On January 30, 2020, from 09:00 UTC to 09:30 UTC, some accounts on Pod 25 could not open Zendesk Guide and instead received a 500 “Server Error” page.
09:00 UTC | 01:00 PT
We deployed a change to one of our load balancers that was meant to increase routing speed for requests.
09:12 UTC | 01:12 PT
Our developers noticed an increase in 5xx errors in our monitoring tools.
09:28 UTC | 01:28 PT
The change was rolled back.
09:30 UTC | 01:30 PT
Traffic to Zendesk Guide resumed as normal on Pod 25.
Root Cause Analysis
A DNS cache configuration change caused 5xx errors for Guide customers in Pod 25. Once the change was identified as the cause, it was rolled back, after which Guide traffic returned to normal and all Guide accounts were accessible again.
Our team will work on improvements to the DNS configuration grouping methods for pods in the same geolocation, so that future load balancer changes do not impact routing.
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.