SUMMARY
On November 27, 2020, from 13:24 UTC to 18:26 UTC, Zendesk Chat customers experienced delays in Chat Monitor metrics (the “Wait Time” and “Response Time” real-time metrics).
TIMELINE
17:27 UTC | 09:27 PT
We have temporarily disabled the “Wait Time” & “Response Time” real-time metrics in Zendesk Chat due to high traffic volumes. We will provide an update once the metrics are restored.
18:33 UTC | 10:33 PT
We have restored access to the “Wait Time” & “Response Time” real-time metrics in Zendesk Chat by reallocating resources. We will continue to monitor the situation to ensure stability.
POST-MORTEM
Root Cause Analysis
This incident was caused by insufficient capacity in a Chat real-time messaging node. Elevated Black Friday traffic caused real-time Chat data to back up, which delayed updates to vital Chat Monitor metrics.
Resolution
To mitigate the issue, our team suspended processing of some less vital metrics, which eased the lag and restored the Chat wait time metrics in Chat Monitor. At the same time, our operations team increased node capacity to fully resolve the issue.
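For readers curious about the general technique, suspending less vital work while a backlog is high is often called load shedding. The sketch below is illustrative only and is not Zendesk's implementation: the priority tiers, the metric names, and the `capacity` and `shed_threshold` parameters are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical priority tiers; the post-mortem does not say which
# metrics were treated as less vital.
VITAL = 0
LESS_VITAL = 1


@dataclass
class MetricEvent:
    name: str       # hypothetical metric name
    priority: int   # VITAL or LESS_VITAL


def drain(queue, capacity, shed_threshold):
    """Process up to `capacity` events from the front of `queue`.

    While the backlog is larger than `shed_threshold`, less vital events
    are dropped instead of processed, so vital metrics catch up sooner.
    Returns the names of processed and shed events.
    """
    processed, shed = [], []
    while queue and len(processed) < capacity:
        event = queue.popleft()
        backlog = len(queue) + 1  # count the event just dequeued
        if backlog > shed_threshold and event.priority != VITAL:
            shed.append(event.name)
            continue
        processed.append(event.name)
    return processed, shed


if __name__ == "__main__":
    backlog = deque([
        MetricEvent("wait_time", VITAL),
        MetricEvent("hypothetical_metric_a", LESS_VITAL),
        MetricEvent("response_time", VITAL),
        MetricEvent("hypothetical_metric_b", LESS_VITAL),
    ])
    done, dropped = drain(backlog, capacity=2, shed_threshold=1)
    print("processed:", done)
    print("shed:", dropped)
```

Shedding trades completeness of the less vital metrics for freshness of the vital ones. It eases the lag but does not remove the underlying capacity limit, which is why the node capacity was also increased.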
Remediation Items
- Increase Chat Real-Time Monitoring capacity in Production environment [Scheduled].
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.