On October 4, 2019, from 07:40 UTC to 08:09 UTC, customers using Zendesk Support, Guide, and Talk on Pod 19 experienced slow loading, errors, and degraded performance in Support and Guide, as well as dropped calls in Talk.
08:21 UTC | 01:21 PT
We have successfully mitigated an issue in which customers were briefly unable to log in to their Zendesk Support, Guide and Chat accounts on Pod 19. We are monitoring the issue. Please reach out to email@example.com if you are still experiencing issues.
Root Cause Analysis
Multiple similar Guide-originated queries caused query pileups on the database reader nodes. The high number of long-running queries resulted in high CPU load on those nodes. Services that used the reader nodes of the 19.8 cluster were affected by the resulting high CPU load and latency. Implementing query killers resolved the issue.
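For readers unfamiliar with the term, a query killer is a job that periodically inspects the running queries on a database node and terminates those that have exceeded a time budget. The sketch below is illustrative only and is not the tooling used during this incident. The `RunningQuery` type, the `select_queries_to_kill` function, and the 30-second default are all hypothetical, and the input is assumed to resemble rows from a database process list.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class RunningQuery:
    """Hypothetical snapshot of one entry in a database process list."""
    query_id: int
    command: str   # e.g. "Query" or "Sleep"
    seconds: int   # how long the statement has been running
    sql: str


def select_queries_to_kill(processes: Iterable[RunningQuery],
                           max_seconds: int = 30) -> List[int]:
    """Return ids of read-only queries running longer than max_seconds.

    Only SELECT statements are considered, so writes are never interrupted.
    """
    victims = []
    for p in processes:
        is_select = p.sql.lstrip().upper().startswith("SELECT")
        if p.command == "Query" and is_select and p.seconds > max_seconds:
            victims.append(p.query_id)
    return victims
```

In a real deployment, the returned ids would be passed to whatever kill command the database provides, and the threshold would be tuned per workload.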
- Add an index to optimize slow-running queries
- Improve logging and monitoring for queries
- Investigate the use of a circuit breaker when queries run slow
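To illustrate the circuit breaker idea in the last item, here is a minimal sketch, assuming a breaker that opens after several consecutive slow queries and rejects further calls until a cooldown has passed. It is not Zendesk's implementation. The class name, thresholds, and the injectable `clock` parameter are hypothetical choices made for the example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a query is rejected because the circuit is open."""


class SlowQueryCircuitBreaker:
    """Hypothetical breaker that trips on consecutive slow queries."""

    def __init__(self, slow_threshold_s: float = 1.0, max_slow_calls: int = 3,
                 cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.slow_threshold_s = slow_threshold_s
        self.max_slow_calls = max_slow_calls
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._slow_calls = 0
        self._opened_at: Optional[float] = None

    def call(self, run_query: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.cooldown_s:
                raise CircuitOpenError("circuit open; query rejected")
            # Cooldown elapsed: allow a trial call. One more slow call
            # re-opens the circuit immediately.
            self._opened_at = None
            self._slow_calls = self.max_slow_calls - 1

        start = self._clock()
        result = run_query()
        elapsed = self._clock() - start

        if elapsed > self.slow_threshold_s:
            self._slow_calls += 1
            if self._slow_calls >= self.max_slow_calls:
                self._opened_at = self._clock()
        else:
            # A fast call resets the count of consecutive slow calls.
            self._slow_calls = 0
        return result
```

The design trades availability of one query path for protection of the shared reader nodes: callers see a fast failure instead of adding to a pileup. This sketch measures only completed calls and does not handle exceptions or timeouts inside `run_query`, which a production breaker would need to do.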
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.