SUMMARY
On January 25, 2022 (from 18:56 to 19:28 UTC) and January 26, 2022 (from 04:11 to 09:15 UTC), Zendesk Chat customers using Agent Workspace on Pod 23 (January 25) and Pod 17 (January 26) were unable to receive and accept incoming chats while agents were online and available.
Timeline
21:16 UTC | 13:16 PT
We are investigating reports of Chats not routing in Agent Workspace on Pod 23. More information to follow.
21:42 UTC | 13:42 PT
From 18:56 to 19:18 UTC, we experienced an issue that caused Agent Workspace chats to not be routed on Pod 23. The issue is now resolved and service restored.
09:47 UTC | 01:47 PT
While customers on Pod 23 will no longer see an issue with Chat routing, the same issue has arisen for customers on Pod 17 today (Jan 26).
Our engineers continue their efforts to address this across all of our Pods to avoid further disruption.
Because of this work, between 05:05 and 07:15 UTC and after 09:06 UTC, some of our customers on Pod 17 using Agent Workspace may have experienced chats that appeared stuck and for which tickets were created only when the chat ended or the visitor disconnected. Our engineers continue their work on Pod 17, and we will provide further updates once available.
11:00 UTC | 03:00 PT
Thank you for your patience. Our engineering team has now confirmed that the Chat routing issue impacting our Agent Workspace customers on Pod 17 was resolved at 09:15 UTC, and no new chats should have become stuck since then.
Root Cause Analysis
This incident was caused by Zendesk Chat middleware unexpectedly setting a session cookie on an error response. The client side stored this cookie and applied it to subsequent requests, leading to the behavior customers observed during the incident.
Resolution
To fix this issue, we restarted the affected service to clear the cookie store, and recovery was observed shortly after.
Remediation Items
- Temporarily disabled use of client-side cookie store in HTTP client library [Completed]
- Fix middleware authentication behavior [In Progress]
- Retune the existing monitors and alerting for the affected service [In Progress]
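
The failure mode, the restart, and the first remediation item above can be illustrated with a small, purely hypothetical simulation. It is a sketch under stated assumptions, not Zendesk's actual code: the `server` function, the `Client` class, the cookie name `session`, and the status codes are all invented for illustration, and the assumption that a stored cookie causes later requests to be rejected is ours, since the report does not say exactly how the cookie disrupted routing.

```python
from http.cookies import SimpleCookie


def server(cookie_header, transient_error=False):
    """Hypothetical middleware. Returns (status, response_headers)."""
    cookies = SimpleCookie(cookie_header or "")
    if "session" in cookies:
        # Assumption: these requests are not expected to carry a session
        # cookie, so its presence makes the request fail.
        return 401, {}
    if transient_error:
        # The behavior being illustrated: a cookie set on an error response.
        return 500, {"Set-Cookie": "session=bad; Path=/"}
    return 200, {}


class Client:
    """Hypothetical HTTP client with an optional in-memory cookie store."""

    def __init__(self, use_cookie_store=True):
        self.use_cookie_store = use_cookie_store
        self.jar = SimpleCookie()

    def request(self, transient_error=False):
        # Replay every stored cookie on the outgoing request.
        cookie_header = "; ".join(
            f"{name}={morsel.value}" for name, morsel in self.jar.items()
        )
        status, headers = server(cookie_header, transient_error)
        # Store any cookie the server sends, regardless of response status.
        if self.use_cookie_store and "Set-Cookie" in headers:
            self.jar.load(headers["Set-Cookie"])
        return status

    def restart(self):
        # A process restart discards the in-memory cookie store.
        self.jar = SimpleCookie()


if __name__ == "__main__":
    with_store = Client(use_cookie_store=True)
    print([with_store.request(),
           with_store.request(transient_error=True),
           with_store.request(),
           with_store.request()])
    with_store.restart()
    print(with_store.request())

    without_store = Client(use_cookie_store=False)
    print([without_store.request(),
           without_store.request(transient_error=True),
           without_store.request()])
```

In this sketch, a single error response is enough to make every later request from the cookie-storing client fail until `restart()` clears the store, while the client created with `use_cookie_store=False` recovers as soon as the transient error passes.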
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us via ZBot Messaging within the Widget.