SUMMARY
On June 4, 2020, from 17:01 UTC to 17:50 UTC, Support customers on Pods 15 and 25 experienced increased error rates and degraded performance.
18:22 UTC | 11:22 PT
We have identified and resolved the issue causing elevated error rates. Pods 15 and 25 are now stable and error rates have dropped.
17:53 UTC | 10:53 PT
We are currently investigating an issue causing elevated error rates impacting Support on Pods 15 and 25. We will update you with more information shortly.
Root Cause Analysis
This incident was caused by a change to a database query and to how it was executed, which made the query slower and more CPU-intensive. The excess load on the database caused cascading performance degradation and elevated error rates.
Resolution
Once the query was identified, we were able to roll back the change and restore database performance.
Remediation Items
- Investigate new analysis methods for query changes in production.
- Update the process for similar deploys to ensure swift rollback.
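As an illustration of what query-change analysis could look like, here is a minimal sketch that compares the execution plan of a query before and after a change and flags a switch to a full scan. It is not the tooling used in this incident: SQLite stands in for the production database, and the `tickets` table, the index, both queries, and the helper names are hypothetical.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the plan detail strings SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # Each row has four columns; the last one is the human-readable detail.
    return [row[3] for row in rows]


def uses_full_scan(conn, sql):
    """True if any plan step scans a whole table or index instead of seeking."""
    return any(detail.startswith("SCAN") for detail in query_plan(conn, sql))


def build_demo_db():
    """Create a hypothetical schema; SQLite is an assumption, not the real database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tickets (id INTEGER PRIMARY KEY, account_id INTEGER, status TEXT)"
    )
    conn.execute("CREATE INDEX idx_tickets_account ON tickets (account_id)")
    return conn


# Hypothetical "before" query: the index on account_id can be used for a seek.
OLD_QUERY = "SELECT id FROM tickets WHERE account_id = 42"
# Hypothetical "after" query: the expression on the column prevents an index seek.
NEW_QUERY = "SELECT id FROM tickets WHERE account_id + 0 = 42"

if __name__ == "__main__":
    conn = build_demo_db()
    for name, sql in (("old", OLD_QUERY), ("new", NEW_QUERY)):
        print(name, "full scan:", uses_full_scan(conn, sql))
```

A check like this could run before a deploy and block a query change whose plan degrades from a seek to a scan. Plan output differs between database engines, so the string match on `SCAN` applies to SQLite only.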
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.
Postmortem published June 10, 2020.