On July 27, 2021, from 05:24 UTC to 08:59 UTC, Zendesk Chat customers experienced slowness, and conversations in Agent Workspace were intermittently disconnected.
08:20 UTC | 01:20 PT
We are receiving reports of issues with conversations not loading and chats being disconnected in Agent Workspace. Investigation is underway.
08:49 UTC | 01:49 PT
Our engineering team continues to investigate issues impacting Agent Workspace customers. Customers may experience issues with conversations not loading and chats being disconnected. Full impact is being assessed. We will provide a further update when additional information is known.
09:36 UTC | 02:36 PT
We continue to work on an incident that is causing performance issues for Zendesk Chat customers and intermittent disconnections for Agent Workspace customers. If you encounter these issues, please try refreshing your browser.
10:25 UTC | 03:25 PT
We are happy to report that our team resolved the latency and Chat disconnection issues impacting our customers. We apologise for the inconvenience and thank you for your patience.
Root Cause Analysis
The incident occurred while we were performing scheduled maintenance on our Chat GQL servers. It was caused by a session caching issue in which the Chat websocket gateway kept connecting to a previously removed Chat GQL instance. As a result, ongoing traffic requests hit unreachable IPs and failed with timeout errors.
To fix this issue, we flushed the cache of the websocket gateway, and recovery was observed thereafter.
Remediation Items
- Implement purging of IP addresses once a session has ended on the websocket server [Scheduled]
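The failure mode and the remediation described above can be illustrated with a minimal sketch. It is not the actual gateway code, whose internals are not described in this report. The class name `SessionBackendCache`, its methods, and the `live_backends` set are hypothetical names chosen for illustration, and the liveness check in `resolve` is an additional assumption beyond the stated remediation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SessionBackendCache:
    """Hypothetical session-to-backend-IP cache for a websocket gateway."""

    # IPs of backend instances currently in service (assumed to be kept
    # up to date by whatever adds and removes instances).
    live_backends: set[str]
    _by_session: dict[str, str] = field(default_factory=dict)

    def assign(self, session_id: str, backend_ip: str) -> None:
        # Remember which backend instance serves this session.
        self._by_session[session_id] = backend_ip

    def resolve(self, session_id: str) -> str | None:
        # Return the cached IP only if that backend is still in service.
        # A stale entry is dropped so the caller can pick a new backend
        # instead of timing out against an unreachable IP.
        ip = self._by_session.get(session_id)
        if ip is not None and ip not in self.live_backends:
            del self._by_session[session_id]
            return None
        return ip

    def end_session(self, session_id: str) -> None:
        # Remediation item: purge the cached IP once the session has ended.
        self._by_session.pop(session_id, None)

    def flush(self) -> None:
        # Incident fix: drop every cached entry at once.
        self._by_session.clear()
```

In this sketch, `end_session` corresponds to the scheduled remediation item and `flush` to the manual fix applied during the incident. After either call, `resolve` returns `None` for the affected sessions, so the caller has to select a reachable instance again.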
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.