SUMMARY
From 22:57 UTC on September 10, 2024, to 04:09 UTC on September 11, 2024, customers using Zendesk Talk across all Pods experienced significant disruptions that prevented them from making and receiving calls. The incident affected agents across various locations and networks, resulting in missed calls and interruptions to business operations. Many agents saw the message “Sorry, Talk is unavailable right now” in the call console, along with missed-call notifications.
Timeline
September 11, 2024 01:20 AM UTC | September 10, 2024 06:20 PM PT
We are currently investigating issues with Talk calls across multiple regions. More information to come.
September 11, 2024 01:41 AM UTC | September 10, 2024 06:41 PM PT
We are working with our Talk partner to investigate the issues impacting the ability to make or receive calls. We will provide another update in the next 30 minutes or if we have more to share.
September 11, 2024 02:03 AM UTC | September 10, 2024 07:03 PM PT
Our Talk partner has acknowledged an issue and is investigating further on their end. We are continuing to work with them and will provide an update in another 30 minutes or when more information becomes available.
September 11, 2024 02:33 AM UTC | September 10, 2024 07:33 PM PT
Our Talk partner has identified a potential issue with the Voice SDK, and their engineers are actively working to resolve it. We have also begun to receive feedback on recovery. More details will be provided within the next 60 minutes.
September 11, 2024 03:01 AM UTC | September 10, 2024 08:01 PM PT
We can confirm that Talk calls should now be completing successfully for our customers. We will continue to monitor and provide a final update upon full resolution. Thank you.
September 11, 2024 04:10 AM UTC | September 10, 2024 09:10 PM PT
We are happy to report that the Talk call issues are now resolved. Thank you for your patience and understanding while our teams worked through today's issue.
POST-MORTEM
Root Cause Analysis
This incident was caused by a production deployment of code from our Talk partner that incorrectly parsed the Base64 encoding of Capability Tokens when these tokens contained a '-' or '_' character in specific positions. As a result, a subset of customers using our Talk partner’s Voice SDK Capability Tokens could not connect and were therefore unable to make or receive Voice SDK calls through Zendesk Talk.
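To illustrate this class of failure, here is a minimal Python sketch. It is not the Talk partner's code, which is not public in this report. It assumes the token segments use the URL-safe Base64 alphabet (the variant in which '-' and '_' replace '+' and '/') with padding stripped. The function names and the sample bytes are hypothetical and chosen only for illustration.

```python
import base64
import binascii


def _pad(segment: str) -> str:
    # Restore the '=' padding that URL-safe encoders commonly strip.
    return segment + "=" * (-len(segment) % 4)


def decode_segment_strict(segment: str) -> bytes:
    # Hypothetical faulty parser: accepts only the standard Base64
    # alphabet, so any '-' or '_' in the segment raises binascii.Error.
    return base64.b64decode(_pad(segment), validate=True)


def decode_segment_urlsafe(segment: str) -> bytes:
    # Parser that accepts the URL-safe alphabet, including '-' and '_'.
    return base64.urlsafe_b64decode(_pad(segment))


if __name__ == "__main__":
    # Sample bytes (an assumption) whose URL-safe encoding contains both
    # '-' and '_'.
    segment = base64.urlsafe_b64encode(b"\xfb\xff").decode().rstrip("=")
    print("segment:", segment)
    print("urlsafe decode:", decode_segment_urlsafe(segment))
    try:
        decode_segment_strict(segment)
    except binascii.Error as exc:
        print("strict decode failed:", exc)
```

Only segments whose bytes happen to encode to '-' or '_' trip the strict parser, which is consistent with the report that a subset of tokens, rather than all of them, failed to parse.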
Resolution
To resolve this issue, the engineering team collaborated closely with our Talk partner to identify the faulty code deployment. Once the root cause was established, the team implemented a fix to ensure that Capability Tokens would be correctly parsed, allowing affected customers to successfully make and receive calls again.
Remediation Items
- Review the voice service authentication strategy and migrate to an alternative authentication method.
- Create additional alerts specifically for Talk partner device errors to facilitate quicker response times.
- Review and improve monitoring metrics to capture potential issues before they escalate into incidents.
Note: Our Talk partner will also have their own remediation items, which include better testing and monitoring.
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, contact Zendesk customer support.
2 comments
Jessica G.
Post-mortem published September 27, 2024.
Dan Beirouty
Update: Remediation item #1 required a correction and has been changed.