SUMMARY
On July 02, 2024, from 01:44 UTC to 16:30 UTC, customers on multiple pods received a “Couldn’t connect to the server” error when trying to open tickets while using the Chat and Messaging feature in Agent Workspace.
Timeline
July 02, 2024 01:09 PM UTC | July 02, 2024 06:09 AM PT
We are investigating several issues with Chat and Messaging in Agent Workspace on Pods 17, 18, 28, and 29. For a subset of customers, chats are not being offered to agents, agents cannot join chats, and messages are remaining unassigned. More information to come.
July 02, 2024 02:07 PM UTC | July 02, 2024 07:07 AM PT
We continue to investigate several issues affecting Chat, Messaging, and Agent Workspace. We appreciate your patience.
July 02, 2024 02:46 PM UTC | July 02, 2024 07:46 AM PT
We continue to address issues affecting Agent Workspace customers across multiple Pods. The main issue involves “Couldn’t connect to the server” errors, which impact products connected to Omnichannel interface tickets. We are exploring fixes and testing options to fully resolve this issue.
July 02, 2024 03:23 PM UTC | July 02, 2024 08:23 AM PT
We are still investigating potential fixes for the issue affecting Agent Workspace customers across multiple pods, which primarily causes a "Couldn't connect to the server" error in Omnichannel interface tickets. We will post further updates in one hour or when we have new information to share.
July 02, 2024 04:16 PM UTC | July 02, 2024 09:16 AM PT
We are rolling out a potential fix for the issue affecting Agent Workspace customers across multiple pods, primarily causing the error "Couldn't connect to the server" in Omnichannel tickets. Some customers may be seeing improvement already as the fix rolls out, and we will provide additional information within the next hour or when the fix is rolled out to all pods.
July 02, 2024 04:46 PM UTC | July 02, 2024 09:46 AM PT
We are happy to report that we have resolved the issue that affected Agent Workspace customers across all pods and caused server connection errors in Omnichannel tickets. Thank you for your patience during our investigation.
POST-MORTEM
Root Cause Analysis
A routine backend upgrade inadvertently degraded performance in managing connections and subscriptions, especially during high-traffic periods.
Resolution
A software fix was identified and rolled out gradually. Once stability was confirmed, the fix was rolled out to all pods, and further monitoring confirmed recovery.
Remediation Items
- Optimise the software code to improve performance with the upgraded system [Scheduled]
- Strengthen system testing procedures to improve the robustness of future upgrades [Scheduled]
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, contact Zendesk customer support.