On September 18, 2019, from 16:01 UTC to 19:13 UTC, customers using Zendesk Talk on Pod 23 experienced dropped calls or were unable to answer calls. Call recordings may not have appeared or been available immediately.
19:40 UTC | 12:40 PT
We've confirmed that the issues causing a low percentage of dropped calls on Pod 23 are now resolved. We apologize for the disruption this caused.
19:03 UTC | 12:03 PT
Our team continues to investigate the cause of the low percentage of dropped calls on Pod 23. Most dropped calls are still going to voicemail to mitigate impact. We will provide another update in an hour.
18:29 UTC | 11:29 PT
We are continuing to investigate the cause of the low percentage of dropped calls on Pod 23. We will provide another update in 30 minutes.
17:56 UTC | 10:56 PT
We are currently investigating a small percentage of dropped calls on Pod 23. The majority of impacted calls are being sent to voicemail to mitigate impact. We will provide further information shortly.
Root Cause Analysis
This incident was caused by stale records in our infrastructure configuration datastore. A redeploy of our Talk product during the incident exacerbated the issue.
To fix this issue, we removed the saved state of the stale configuration data from the filesystem of our infrastructure configuration datastore. Restarting the affected service provided immediate improvement.
Remediation Items
- Improve monitors and alerts on infrastructure configuration datastore errors
- Investigate dependencies and instrumentation as needed
- Fix logging gaps to add more visibility to Talk services
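To illustrate the first item, a monitor for stale records might look like the minimal sketch below. Everything in it is hypothetical: the record shape, the key names, and the 30-minute threshold are assumptions for illustration and are not taken from the actual datastore or monitoring setup, which this report does not describe.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ConfigRecord:
    """Hypothetical record shape; the real datastore schema is not described."""
    key: str
    last_refreshed: datetime


def find_stale_records(records, now, max_age):
    """Return the records whose last refresh is older than max_age."""
    return [r for r in records if now - r.last_refreshed > max_age]


# Illustrative data only: one fresh record and one stale record.
now = datetime(2019, 9, 18, 16, 0, tzinfo=timezone.utc)
records = [
    ConfigRecord("talk/service-a", now - timedelta(minutes=2)),
    ConfigRecord("talk/service-b", now - timedelta(hours=6)),
]

# Assumed threshold: anything not refreshed within 30 minutes is stale.
stale = find_stale_records(records, now, max_age=timedelta(minutes=30))
for record in stale:
    print(f"ALERT: stale config record {record.key}")
```

A check of this kind would typically run on a schedule and feed an alerting system, so that stale records are flagged before a redeploy picks them up.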
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.