SUMMARY
On December 15, 2023, between 14:10 and 15:30 UTC, customers reported an increase in dropped calls, along with difficulty picking up calls, placing calls on hold, and making outbound calls. These issues peaked around 14:00 and 15:00 UTC.
Timeline
16:33 UTC | 08:33 PT (Dec 15)
We are investigating reports of dropped calls in Talk and other Talk feature degradation across multiple pods. We will post additional information shortly.
16:49 UTC | 08:49 PT (Dec 15)
We have confirmed an issue that is causing dropped calls and other Talk feature degradation across multiple pods. Our team is investigating and we will provide further updates as we learn more.
17:16 UTC | 09:16 PT (Dec 15)
Our team continues to investigate the Talk issue causing dropped calls and general Talk feature degradation across multiple pods. We will provide new information as the investigation progresses.
17:49 UTC | 09:49 PT (Dec 15)
We are working with our provider to continue investigating the Talk issue that is causing dropped calls and general Talk feature degradation across multiple pods. We will continue to post updates as we dig in further.
18:51 UTC | 10:51 PT (Dec 15)
We are still working with our Talk provider to help address the agent-side Talk feature degradation seen today. Agents may have issues connecting to calls; however, fully dropped call volume is relatively low, and calls should still reach other available agents. Our investigation continues and we will provide further updates as we learn more.
20:01 UTC | 12:01 PT (Dec 15)
We continue to work with our Talk provider to address agent-side Talk feature degradation that some agents have seen today. We will send further updates by end of day or when we have new information to share.
19:51 UTC | 11:51 PT (Dec 18)
After monitoring through the weekend we are not seeing any errors in agent-side Talk-related functionality. Continued investigation with our provider has not uncovered a common root cause among reports and findings indicate that the timing of reports may have been coincidental. As such we do not anticipate completing a formal RCA for this event. If you are experiencing any issues please contact our Support team to assist in troubleshooting.
POST-MORTEM
Root Cause Analysis
This incident was caused by an issue on our Talk service provider's end. The provider did not publicly acknowledge and share details about the previous day's disruption to their Programmable Voice Services until December 16, and this information did not reach Zendesk until December 21. The timeline of the issues our customers faced aligns with the period of issues disclosed by our Talk service provider. The provider's problems caused our Call Console to switch into Radarless Mode, a mode activated in response to poor network connectivity. This was directly linked to the Talk service provider's autoscaling failures, which began at 12:05 UTC on December 15.
Resolution
To fix this issue, engineers from our Talk service provider updated their voice processing fleet to use the latest configurations, thereby circumventing the incompatibility with the new data being processed. No additional action was required from Zendesk's side.
Remediation Items
From Zendesk
- Display an error message in the Call Console if the Talk service provider connection is not active, or at least a connection icon that indicates connection issues.
- Review and improve current error messages/in-product messaging for clarity.
- Improve existing implementation tools.
- Explore the development of additional alert systems for improved incident detection.
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us via ZBot Messaging within the Widget.