On May 27, 2020, from 20:57 UTC to 21:07 UTC, customers on Pod 17 were unable to load Support, Guide, Talk, and Sunshine, and instead received 502 Bad Gateway errors with the message: “Your request experienced a server error.”
22:19 UTC | 15:19 PT
Our team has resolved the issues affecting Pod 17 products. We’re now seeing consistent and expected performance.
21:30 UTC | 14:30 PT
We have confirmed that our services were restored to a stable state as of 21:07 UTC. We are continuing to monitor the performance issue on Pod 17. Thank you for your understanding.
21:16 UTC | 14:16 PT
We are currently investigating a performance issue on Pod 17 affecting the Support, Guide, and Talk products. We will provide an update shortly.
Root Cause Analysis
This incident was caused by a change that produced certificate errors on nodes attempting to contact the API, which eventually caused some nodes to fail.
To restore service, we reverted the change.
- Updated the existing process and documentation.
- Added a task to future changes to verify the certificate bundle on all nodes before starting.
- Investigating further automation and alerts to prevent this in the future.
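As an illustration of the pre-change certificate bundle check mentioned above, the following is a minimal sketch, not the tooling actually used. It assumes each node's bundle has already been collected as raw bytes and that a known-good SHA-256 fingerprint is available. The function names and the `bundles_by_node` mapping are hypothetical.

```python
from __future__ import annotations

import hashlib


def bundle_fingerprint(bundle_pem: bytes) -> str:
    """Return the SHA-256 hex digest of a certificate bundle's raw bytes."""
    return hashlib.sha256(bundle_pem).hexdigest()


def nodes_with_unexpected_bundle(
    bundles_by_node: dict[str, bytes], expected_fingerprint: str
) -> list[str]:
    """Return the sorted names of nodes whose bundle differs from the expected one.

    `bundles_by_node` is a hypothetical mapping of node name to bundle contents;
    how the bundles are gathered from the nodes is outside this sketch.
    """
    return sorted(
        node
        for node, bundle in bundles_by_node.items()
        if bundle_fingerprint(bundle) != expected_fingerprint
    )


def safe_to_start_change(
    bundles_by_node: dict[str, bytes], expected_fingerprint: str
) -> bool:
    """A change may start only when every node carries the expected bundle."""
    return not nodes_with_unexpected_bundle(bundles_by_node, expected_fingerprint)
```

Because the comparison is byte-for-byte, a bundle that differs only in whitespace or certificate order would also be flagged. For a gate that runs before a change, that strictness is arguably the safer default.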
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.