Summary
On July 28, 2022, from 16:57 UTC to 18:27 UTC, customers in Pods 26 and 27 experienced increased error rates and access issues in all Zendesk products.
Timeline
21:18 UTC | 14:18 PT
The access issues and errors affecting Pods 26 and 27 are now resolved. Platform speed and functionality should be fully restored. Please let us know if you continue to experience any issues.
19:09 UTC | 12:09 PT
Our team continues to monitor recovering error rates on Pods 26 and 27. Additional updates will be posted when we have new information to share.
18:38 UTC | 11:38 PT
We are beginning to see some improvement in the error rates affecting Pods 26 and 27. Our team is monitoring and we will post another update in the next 30 minutes.
18:01 UTC | 11:01 PT
Our team continues to investigate elevated error rates and access issues in the US-East region. We will post another update within the next 30 minutes.
17:33 UTC | 10:33 PT
We have confirmed access issues and high error rates in the US-East region. Further updates to come shortly.
17:18 UTC | 10:18 PT
We are investigating reports of errors across US-based accounts. More information to follow.
Root Cause Analysis
This incident was caused by a loss of power within a single Data Center in one of our hosting provider’s availability zones where Zendesk services are hosted. This issue resulted in the service outage experienced by customers on Pods 26 and 27.
Resolution
To fix this issue, Zendesk completed a failover to a different availability zone at 18:17 UTC for Pod 27 and at 18:27 UTC for Pod 26, which restored service for our customers. In the background, the hosting provider restored power to the affected Data Center, which resolved the root cause of the issue.
Remediation Items
- Hosting Provider: Replace failed components in Data Center power distribution lineups [COMPLETED]
- Hosting Provider: Improve failure detection systems [IN PROGRESS]
- Zendesk: Improve business continuity plan for third-party vendor failures [IN PROGRESS]
For More Information
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us via ZBot Messaging within the Widget.