SUMMARY
On May 18, 2023, from 23:06 UTC to 23:57 UTC, Talk customers across all Pods experienced problems making outbound calls and accepting inbound calls.
Timeline
23:29 UTC | 16:29 PT
We are aware of issues with inbound and outbound calls in Talk. We are investigating this with our service provider and will update you when we have more information.
23:41 UTC | 16:41 PT
Our service provider has acknowledged that there is an ongoing call connectivity issue and they are working to fix it. We will update you in the next 60 minutes or as information becomes available.
19 May 2023, 00:07 UTC | 17:07 PT
We are seeing recovery on our end: test inbound and outbound calls are working, and we are receiving positive customer reports. We are awaiting updates from our service provider. Next update in 60 minutes or when we have more to share.
19 May 2023, 00:59 UTC | 17:59 PT
Inbound and outbound calls should be working for many customers now. We continue to await updates from our service provider. Thank you for your ongoing patience. Next update when we have more information to share.
19 May 2023, 01:35 UTC | 18:35 PT
Our service provider has reported recovery on their end and is monitoring until full recovery. We will provide another update once we have more to share. We appreciate your support and patience.
19 May 2023, 02:40 UTC | 19:40 PT
As reported by our service provider, the issue affecting inbound and outbound calls has been resolved, and the Talk service should now be working as intended. Thank you for your patience and partnership.
POST-MORTEM
Root Cause Analysis
This incident was caused by a scheduled maintenance deployment by our Talk service provider, which unexpectedly tore down all existing connections. As clients attempted to reconnect, the backend services became overloaded, which in turn prevented new connections from being established.
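The overload described above is a reconnection surge: when every connection is dropped at once, every client retries at roughly the same moment. The sketch below is illustrative only. The client count, the delay window, and the function name are hypothetical and do not come from the service provider. It shows how the peak number of reconnect attempts in a single second changes when retries are spread over a random delay, which is included purely for contrast and is not a fix this report says was applied.

```python
import random
from collections import Counter


def peak_reconnects_per_second(num_clients, max_delay_s, seed=0):
    """Return the largest number of reconnect attempts landing in any one second.

    Each client waits a random whole number of seconds in [0, max_delay_s]
    before reconnecting. max_delay_s=0 models every client retrying
    immediately after the teardown.
    """
    rng = random.Random(seed)
    attempts = Counter(rng.randint(0, max_delay_s) for _ in range(num_clients))
    return max(attempts.values())


# Hypothetical numbers, for illustration only.
clients = 10_000
print("immediate retry peak:", peak_reconnects_per_second(clients, max_delay_s=0))
print("spread over 60 s peak:", peak_reconnects_per_second(clients, max_delay_s=59))
```

With no delay, all 10,000 hypothetical clients arrive in the same second. Spreading the retries over a minute brings the peak much closer to the average of about 167 per second.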
Resolution
To fix this issue, our service provider rolled back the change and scaled up resources to optimize the handling of the surge of reconnections.
Remediation Items
- Review monitoring and add alerts for events that would have indicated this issue.
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us via ZBot Messaging within the Widget.
Post-mortem published 29 May, 2023