Summary
On January 27, 2022, from 08:34 UTC to 10:15 UTC, Zendesk Chat and Messaging customers may have experienced failed visitor registrations, widget loading failures, delays in incoming messaging conversations, and errors when sending messages.
Timeline
10:00 UTC | 02:00 PT
We are currently investigating reports of issues with incoming messages for Chat and Messaging, visitor registration failing, and the Widget not appearing for some customers. Your patience is appreciated.
10:22 UTC | 02:22 PT
Our engineers have identified and addressed the root cause of the issues with incoming messages for Chat and Messaging, visitor registration failing and Widget not appearing for some customers. We are seeing improvements and continue to monitor performance.
11:14 UTC | 03:14 PT
We continue to monitor Chat & Messaging performance and are seeing improvements on incoming messages. However, some agents might see delays in Chat History updates. These should clear within the next few hours.
12:14 UTC | 04:14 PT
We are happy to report that the issues impacting Chat and Messaging have now been fully resolved. Thank you for your patience!
Root Cause Analysis
This incident was triggered by a code change that caused our Chat Visitor API to be migrated to an incorrect datastore. The incorrect datastore then ran out of capacity, resulting in the symptoms our customers experienced during this incident.
Resolution
To fix this issue, our team reverted the code change and migrated the affected visitors to the regular datastore.
Remediation Items
- Add monitoring and alerts to help us respond more quickly to similar events in the future [Scheduled]
- Refactor the impacted service code so that onboarding nomenclature is clearer, enabling better testing for each type of onboarding (visitor level vs. account level) [Scheduled]
- Add new negative tests to datastore migration test suite [Scheduled]
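As an illustration only, a negative test for a datastore migration could look like the minimal sketch below. It is not Zendesk's actual code: the names `OnboardingLevel`, `ALLOWED_TARGETS`, `plan_migration`, `rejects`, and the datastore names `visitor_store` and `account_store` are all hypothetical, chosen to mirror the visitor vs. account distinction described above. The idea is that the test passes only when a bad migration is refused.

```python
from enum import Enum


class OnboardingLevel(Enum):
    # Hypothetical onboarding types, mirroring "visitor vs account level".
    VISITOR = "visitor"
    ACCOUNT = "account"


class MigrationError(Exception):
    """Raised when a migration request must be refused."""


# Hypothetical mapping of each onboarding level to its permitted datastores.
ALLOWED_TARGETS = {
    OnboardingLevel.VISITOR: {"visitor_store"},
    OnboardingLevel.ACCOUNT: {"account_store"},
}


def plan_migration(level, target, used, capacity, incoming):
    """Validate a migration and return the projected usage of the target."""
    if target not in ALLOWED_TARGETS[level]:
        raise MigrationError(f"{level.value} data may not migrate to {target}")
    projected = used + incoming
    if projected > capacity:
        raise MigrationError(
            f"{target} would exceed capacity: {projected} > {capacity}"
        )
    return projected


def rejects(**kwargs):
    """Negative-test helper: True only if the migration is refused."""
    try:
        plan_migration(**kwargs)
    except MigrationError:
        return True
    return False


# Negative cases: a wrong target and an over-capacity target must both fail.
assert rejects(level=OnboardingLevel.VISITOR, target="account_store",
               used=0, capacity=100, incoming=10)
assert rejects(level=OnboardingLevel.VISITOR, target="visitor_store",
               used=95, capacity=100, incoming=10)
```

The sketch checks two separate failure paths, wrong datastore and insufficient capacity, because a test suite that only exercises successful migrations would not catch either one.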
For More Information
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.