SUMMARY
On March 10, 2021, from 17:36 UTC to 21:21 UTC, some of our Social Messaging customers using Agent Workspace experienced delayed or failed inbound and outbound messages.
Timeline
20:33 UTC | 12:33 PT
Zendesk is currently experiencing an issue with Social Messaging on Agent Workspace. Our teams are working on a resolution and will continue to provide updates as more information becomes available.
20:59 UTC | 12:59 PT
We are continuing to investigate an issue with Social Messaging on Agent Workspace that is resulting in delayed messages for customers. We will continue to update as more information becomes available.
21:29 UTC | 13:29 PT
We have rolled back a recent deploy that we believe is the root cause of the delayed messages for our customers on Agent Workspace and Social Messaging. We are seeing improvement in stability and will continue to monitor the situation.
POST-MORTEM
Root Cause Analysis
This incident was caused by a recent deploy for Sunshine Conversations. The deploy contained a race condition in one of its new features. The resulting exceptions were not handled correctly and terminated an application that handles part of the messaging process.
Resolution
To fix this issue, our developers released a hotfix correcting the behaviour and restarted the service. The backlogged messages were then sent, and functionality returned to normal.
Remediation Items
- We will re-examine our codebase for similar patterns.
- We will raise the rate limits for our internal services.
- We have created additional monitoring for Social Communications.
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.