SUMMARY
On September 27, 2021, from 14:15 UTC to 16:48 UTC, some Chat customers may have experienced delays in chat history, real-time monitoring, and push notification updates.
Timeline
16:48 UTC | 09:48 PT
We have resolved the issue impacting Chat history, notifications, and real-time monitoring. Thank you for your patience and understanding.
16:00 UTC | 09:00 PT
We are currently investigating an issue in our Chat product. Symptoms include delays in Chat history, real-time monitoring, and push notifications.
POST-MORTEM
Root Cause Analysis
This incident was caused by a drop in connections, which recovered on its own. Some query latencies on the framework software bus, which uses stream-processing services, became so high that other services stopped consuming new events. As a result, lag built up in the consumer group.
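To illustrate what lag in a consumer group means, here is a minimal sketch. It assumes an offset-based event log split into partitions, which this post-mortem does not confirm. The function names and the offset dictionaries are hypothetical and do not describe the actual Chat implementation.

```python
from typing import Dict


def consumer_group_lag(
    latest_offsets: Dict[int, int],
    committed_offsets: Dict[int, int],
) -> Dict[int, int]:
    """Return per-partition lag: events written but not yet consumed.

    Both arguments map a partition id to an offset (hypothetical shape).
    A partition with no committed offset is treated as fully unconsumed.
    """
    return {
        partition: max(latest - committed_offsets.get(partition, 0), 0)
        for partition, latest in latest_offsets.items()
    }


def total_lag(
    latest_offsets: Dict[int, int],
    committed_offsets: Dict[int, int],
) -> int:
    """Sum the lag over all partitions."""
    return sum(consumer_group_lag(latest_offsets, committed_offsets).values())
```

When consumers stop reading, the committed offsets stay flat while the latest offsets keep growing, so the total rises until consumption resumes.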
Resolution
The service auto-recovered without any manual intervention.
Remediation Items
- Add a configuration to discard old events for faster recovery.
- Create additional, more robust monitoring for sudden drops in incoming traffic.
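The two remediation items above could be sketched roughly as follows. This is an illustration under stated assumptions, not the configuration or monitoring that was actually deployed. The event shape, the `max_age_seconds` setting, and the `TrafficDropDetector` class are all hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

# Hypothetical event shape: (timestamp in seconds, payload).
Event = Tuple[float, str]


def discard_stale(
    events: Iterable[Event], now: float, max_age_seconds: float
) -> List[Event]:
    """Keep only events no older than max_age_seconds, so a backlogged
    consumer can skip stale work and catch up faster."""
    return [e for e in events if now - e[0] <= max_age_seconds]


class TrafficDropDetector:
    """Flag an interval whose event count falls below a fraction of the
    average count over the previous `window` intervals."""

    def __init__(self, window: int, drop_ratio: float) -> None:
        self.window = window
        self.drop_ratio = drop_ratio
        self.history: Deque[int] = deque(maxlen=window)

    def observe(self, count: int) -> bool:
        """Record one interval's count; return True on a sudden drop."""
        alert = False
        # Only compare once a full window of history is available.
        if len(self.history) == self.window:
            baseline = sum(self.history) / self.window
            alert = baseline > 0 and count < baseline * self.drop_ratio
        self.history.append(count)
        return alert
```

With `window=3` and `drop_ratio=0.5`, three intervals of 100 events followed by one of 40 would raise an alert, because 40 is below half of the 100-event baseline. The age cutoff trades completeness for recovery speed, so it would only suit events that lose their value once they are late.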
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.
Post-mortem published October 21, 2021.