SUMMARY
On February 8, 2024, from 14:30 UTC to 15:41 UTC, all Agent Workspace (AW) Chat and Messaging customers on Pod 27 with the Chat routing feature enabled experienced issues with chats not being assigned and with their agents being placed into idle timeout.
Timeline
15:37 UTC | 07:37 PT
We are currently investigating reports of customers on Pod 27 experiencing issues with chat assignments. We will provide another update shortly.
15:54 UTC | 07:54 PT
Our team has deployed a potential fix for the issue with chat assignments. Please refresh your browser and try again. We will continue to work on this until full resolution and will provide another update in 30 minutes.
16:26 UTC | 08:26 PT
We are seeing some improvement following the recently deployed fix for the issues with chat assignments; however, our team continues their work to ensure that we have recovered fully. Please let us know if you continue to experience any issues.
16:45 UTC | 08:45 PT
The fix applied to address the issues with chat assignment is proving effective, and as a result this incident is now resolved. Thank you for your patience during our investigation.
POST-MORTEM
Root Cause Analysis
This incident was caused by data message broker services going down on Pod 27.
Resolution
To fix this issue, we redeployed the services and observed message latency recover in the logs. Customers confirmed recovery thereafter.
Remediation Items
- Update paging policies to page the team for quicker incident response [Done]
- Fine-tune backend monitoring and alerts [Done]
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us via ZBot Messaging within the Widget.