According to ChainCatcher, zkSync published a detailed account on Twitter of yesterday's zkSync Era network outage. Block production stopped because the block queue database failed. The server API was not affected: transactions continued to be added to the memory pool, and query services worked normally. Although all components have comprehensive monitoring, logging, and alerting, no alerts were triggered because the API was operating normally. The fix was implemented within 5 minutes. To prevent similar problems, zkSync gives its database monitoring agents a special role that lets them connect to the database and continuously collect metrics. An alert will be issued if a database monitoring agent fails or cannot establish a connection to the database to collect metrics.
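The alerting rule described above (alert when the agent fails, or when it cannot reach the database to collect metrics) can be illustrated with a minimal sketch. It is not zkSync's implementation: the class `AgentWatchdog`, the helper `failing_collect`, the `queue_depth` metric, and the 30-second threshold are all hypothetical names and values chosen for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AgentWatchdog:
    """Hypothetical watchdog: alerts when metric collection fails or goes silent."""

    max_silence_s: float                  # assumed threshold, not from the announcement
    last_success: Optional[float] = None  # timestamp of the last successful collection
    alerts: list = field(default_factory=list)

    def poll(self, collect: Callable[[], dict], now: Optional[float] = None) -> Optional[dict]:
        """Run one collection attempt; alert if the database cannot be reached."""
        now = time.time() if now is None else now
        try:
            metrics = collect()
        except Exception as exc:
            self.alerts.append(f"agent could not collect metrics: {exc}")
            return None
        self.last_success = now
        return metrics

    def check_silence(self, now: Optional[float] = None) -> bool:
        """Alert if the agent itself has stopped reporting (for example, it crashed)."""
        now = time.time() if now is None else now
        silent = self.last_success is None or now - self.last_success > self.max_silence_s
        if silent:
            self.alerts.append("no metrics received from agent within threshold")
        return silent


def failing_collect() -> dict:
    """Stand-in for a collector whose database connection fails."""
    raise ConnectionError("database unreachable")
```

The point of the design is that the alert depends on the database being reachable rather than on the API being healthy, which is the gap the outage exposed. A real deployment would presumably use an existing monitoring stack rather than hand-written checks like these.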

Additionally, if the situation escalates, the on-call team will be notified immediately through multiple channels. However, zkSync says the only long-term solution for liveness and availability is decentralization: decentralized systems are inherently more resilient, and decentralizing sequencers (and subsequently provers) is a top priority for the zkSync engineering team.