SUMMARY
On May 17, 2022, from 17:21 UTC to 19:38 UTC, Zendesk Chat customers (both Agent Workspace and non-Agent Workspace) may have experienced the following:
- Agents were unable to see new chats assigned to them during the impact window;
- Agents were unable to serve chats waiting in queue;
- Ongoing chats were not impacted.
Timeline
18:55 UTC | 11:55 PT
We are investigating reports of issues affecting Chat routing. We will provide another update as soon as we can.
19:13 UTC | 12:13 PT
We have confirmed an issue affecting Chat routing, causing delays and errors serving Chats. Our team is investigating and we will provide further updates as soon as we can.
19:59 UTC | 12:59 PT
Our engineers have identified the cause of the missed chats and Chat routing issues and taken steps to mitigate the issue. We are seeing improvement and continuing to monitor the situation. Please refresh your browser and let us know if the issue persists.
20:13 UTC | 13:13 PT
We are seeing sustained recovery and are now confident this issue is resolved. Please reach out if you have any further issues.
POST-MORTEM
Root Cause Analysis
This incident was caused by a sudden surge of chat requests from a single customer, which overwhelmed the system. This led to severe lag in the chat service that routes chat requests to online Chat agents.
Resolution
To fix this issue, we applied a rate limit to the account in question, reduced the number of visitors waiting in queue for that account, and then restarted the chat service. We started to see recovery and continued to monitor until the system had fully recovered.
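As an illustration of what a per-account rate limit can look like, the following is a minimal token-bucket sketch. It is not the mechanism actually used in this incident, which is not described here. The class name `PerAccountRateLimiter`, its parameters, and the account identifiers are all hypothetical.

```python
import time


class PerAccountRateLimiter:
    """Hypothetical token-bucket limiter keyed by account ID.

    Each account gets its own bucket, so a surge from one account
    is throttled without affecting requests from other accounts.
    """

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec   # tokens refilled per second
        self.burst = burst         # maximum tokens a bucket can hold
        self.clock = clock         # injectable clock, useful for testing
        self._buckets = {}         # account_id -> (tokens, last_refill_time)

    def allow(self, account_id):
        """Return True if a new chat request from this account may proceed."""
        now = self.clock()
        # A previously unseen account starts with a full bucket.
        tokens, last = self._buckets.get(account_id, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[account_id] = (tokens - 1.0, now)
            return True
        self._buckets[account_id] = (tokens, now)
        return False
```

In this sketch, a request from the surging account is rejected once its bucket is empty, while other accounts keep their own full buckets and continue to be served.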
Remediation Items
- As part of incident resolution, we implemented a specific alert for Chat routing latency to provide an early warning of similar problems. [Done]
- Improve the capacity of the Chat Routing service so it can scale better to handle traffic surges. [In Progress]
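A routing latency alert of the kind mentioned in the first item could, under some assumptions, be sketched as a sliding-window percentile check. The class `RoutingLatencyAlert`, the threshold, the window size, and the quantile below are all hypothetical and chosen only for illustration. They do not reflect the actual alert configuration.

```python
from collections import deque


class RoutingLatencyAlert:
    """Hypothetical alert that fires when a latency quantile over the
    most recent samples exceeds a threshold."""

    def __init__(self, threshold_ms, window_size=100, min_samples=20, quantile=0.95):
        self.threshold_ms = threshold_ms
        self.min_samples = min_samples
        self.quantile = quantile
        # Only the most recent `window_size` samples are kept.
        self._samples = deque(maxlen=window_size)

    def record(self, latency_ms):
        """Record one routing latency sample; return True if the alert should fire."""
        self._samples.append(latency_ms)
        if len(self._samples) < self.min_samples:
            return False  # too little data to judge
        ordered = sorted(self._samples)
        # Nearest-rank style index for the chosen quantile.
        idx = min(len(ordered) - 1, int(self.quantile * len(ordered)))
        return ordered[idx] > self.threshold_ms
```

Using a quantile over a window rather than a single sample avoids alerting on one slow request, at the cost of reacting a little later to a genuine surge.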
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us via ZBot Messaging within the Widget.
Post-mortem published May 24th, 2022.