SUMMARY
On February 2nd, 2021, from 21:15 UTC to 22:32 UTC, Chat customers using Agent Workspace were unable to receive new chats, and existing conversations did not create tickets until the issue was resolved.
Timeline
23:20 UTC | 15:20 PT
Our teams have resolved the chat routing issues that were present across all pods. Thank you for your patience.
22:33 UTC | 14:33 PT
We have identified the cause of chat routing failures across all pods and are working to resolve them. We are beginning to see improvement. We will provide more information as it becomes available.
22:16 UTC | 14:16 PT
We are investigating chat routing failures across all pods. We will provide additional information as soon as it is available.
Root Cause Analysis
This incident was caused by a deployed change to the DNS system associated with certain legacy elements of our Chat products infrastructure. As this update was rolled out, connections to parts of the service began to fail, in turn blocking new chats from being surfaced in the Agent Workspace dashboard.
Resolution
To fix this issue, our engineers reverted the change and processed the queued jobs by 22:08 UTC.
Once the backlog was cleared, performance returned to normal as customers on Agent Workspace were able to receive new chats and create tickets from completed chats again.
Remediation Items
- Restructure legacy elements of our Chat system to avoid similar issues in the future
- Enhance our change management process and investigate failover options
- Improve our internal documentation
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.