On October 23, 2023, from 14:26 UTC to 18:14 UTC, some Zendesk Support, Chat, and Talk customers on Pod 13 experienced multiple issues, including omni-channel routing errors, SLA update delays, ticket view errors, and chat assignment delays.
15:14 UTC | 08:14 PT
We are investigating reports of errors and delays assigning new incoming queued Chats on Pod 13, resulting in some missed chats, as well as delays in some SLA calculations on tickets. We will provide further updates shortly.
15:28 UTC | 08:28 PT
We have confirmed an issue causing delays in chat assignment, SLA calculation in tickets, ticket views, and user deletion on Pod 13. Our team is investigating and we will continue to provide updates as we learn more.
15:51 UTC | 08:51 PT
In addition to the above, customers with Talk and omni-channel routing enabled on Pod 13 may see issues with incoming calls not routing to available agents. We will post further updates as the investigation progresses.
16:06 UTC | 09:06 PT
We are beginning to see some recovery from the varied symptoms seen in Pod 13 today. SLAs are beginning to process normally, views are starting to load properly, and chats and Talk calls are beginning to route as expected. Please let us know if you continue to experience any issues.
18:31 UTC | 11:31 PT
We have resolved the issue presenting a number of symptoms on Pod 13. Calls and chats through omni-channel routing are now routing as expected, SLAs are applying and updating accurately in tickets, views are loading as expected, and user deletion is working properly. Thank you for your patience during our investigation.
Root Cause Analysis
This incident was caused by a hardware failure affecting a cloud storage volume in our vendor’s infrastructure. The failure prevented the impacted volume from recovering automatically.
To fix this issue, our engineers manually restarted the affected volume, and service recovered soon afterward.
Please note: the backfill/restoration of data that was run to resolve broken SLAs on Open tickets had the side effect of completely removing SLA data on Closed tickets. This data now appears as ‘Null’ SLA data in Explore.
Remediation Items
- Monitoring and additional alerts to reduce error identification time [Scheduled]
- Investigate runbook improvements to resolve volume errors more quickly [Scheduled]
- Review on-call engineering resources to reduce investigation time [Scheduled]
FOR MORE INFORMATION
For current system status information about Zendesk and specific impacts to your account, visit our system status page. You can follow this article to be notified when our post-mortem report is published. If you have additional questions about this incident, contact Zendesk customer support.