On November 4, 2019, from 13:50 UTC to 16:13 UTC, customers on Pod 18 experienced issues with Agent Collision in Support. In addition, for agents using Talk, tickets were not created or did not auto-update with recordings.
17:18 UTC | 09:18 PT
We are pleased to report the issue is now resolved. Please let us know if you’re still seeing issues so that we can look into your specific case more closely.
16:59 UTC | 08:59 PT
Our team has taken steps to resolve the issues with agent collision and Talk. We are seeing signs of improvement and will continue to monitor until those systems no longer show the errors that indicated these issues.
16:00 UTC | 08:00 PT
We are currently investigating an issue affecting Talk functionality and Play button usage on Pod 18. More updates to follow.
Root Cause Analysis
This incident was caused by insufficient capacity on Pod 18 during a deploy to our agent collision service.
To fix this issue, we scaled clusters to increase capacity and rolled back the change.
Remediation Items
- Add additional monitoring and alerting for memory usage
- Reevaluate deploy process to include relevant metrics review
- Investigate other pods for similar risk
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.