SUMMARY
On February 26, 2024, from 17:50 UTC to 20:52 UTC, a small subset of customers on Pod 29 experienced an issue that caused ticket emails to fail to be processed.
Timeline
18:24 UTC | 10:24 PT
We are investigating an issue that is impacting a subset of customers on Pod 29, causing access issues and preventing use of the product. We will provide further updates soon.
18:38 UTC | 10:38 PT
We have confirmed an issue impacting a subset of customers on Pod 29, causing green screen access errors and preventing use of all products. We will continue to provide updates as the investigation progresses.
19:03 UTC | 11:03 PT
Our team continues to investigate an issue impacting a subset of customers on Pod 29, causing green screen errors and preventing access to all products. We will post any new information as soon as we find it.
19:44 UTC | 11:44 PT
Our team is still working towards a root cause for the issue impacting a subset of customers on Pod 29, causing access issues and green screen errors. Further updates will be posted as we learn more.
20:25 UTC | 12:25 PT
Our engineers are continuing to work with our cloud service provider to identify the root cause and work towards recovery. The next update will be posted in 1 hour or as soon as we have new information.
21:10 UTC | 13:10 PT
We are now seeing recovery and will continue to monitor performance until the issue is fully resolved. The next update will be posted once the issue is fully resolved.
22:26 UTC | 14:26 PT
We're happy to report that the issue is now fully resolved. Please let us know if you continue to experience issues.
POST-MORTEM
Root Cause Analysis
A defect in a particular feature of the database storage system caused the cluster to go offline.
Resolution
The problem was rectified by turning off the malfunctioning feature, after which the storage system regained its normal operational state.
Remediation Items
- Set up additional alerts. [Scheduled]
- Increase retry window in case of failures. [Scheduled]
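
As an illustration only, the retry-window item above could look something like the following sketch. It is not the actual implementation: the function name `retry_within_window`, its parameters, and the backoff values are all hypothetical. The idea is that a failing operation (for example, handing an inbound email to the processing pipeline) is retried with exponential backoff until a configurable window elapses, so a longer window lets work survive a longer storage outage.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_within_window(
    operation: Callable[[], T],
    window_seconds: float,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `operation` with exponential backoff until it succeeds
    or `window_seconds` have elapsed (hypothetical helper)."""
    deadline = clock() + window_seconds
    delay = base_delay
    while True:
        try:
            return operation()
        except Exception:
            remaining = deadline - clock()
            if remaining <= 0:
                # Retry window exhausted: surface the last failure.
                raise
            # Never sleep past the end of the window.
            sleep(min(delay, remaining))
            # Double the delay for the next attempt, up to max_delay.
            delay = min(delay * 2, max_delay)
```

In this sketch, increasing `window_seconds` is the only change needed to tolerate a longer outage. The `clock` and `sleep` parameters exist so the behaviour can be tested without real waiting.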
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us via ZBot Messaging within the Widget.