SUMMARY
On June 07, 2022 from 08:10 UTC to 11:18 UTC, customers on Pod 17 using the Zendesk Support product experienced delays in sending and receiving side conversations.
Timeline
10:07 UTC | 03:07 PT
We are currently working on a fix for reported issues with Side Conversations in Pod 17 regarding delays and deliverability. Please bear with us as we continue working on this.
10:36 UTC | 03:36 PT
We’re happy to confirm that the fix has been applied successfully, and we already see a decrease in errors related to the Side Conversations issues affecting Pod 17 customers. Thank you for your patience while we monitor this until resolution.
11:33 UTC | 04:33 PT
We’d like to confirm the full resolution of the issue causing delays in Side Conversations, as well as integration issues for this service, for customers in Pod 17. There should be no loss of data because of this [1/2]
11:34 UTC | 04:34 PT
and all messages have been processed now. We appreciate your patience. [2/2]
POST-MORTEM
Root Cause Analysis
This incident was caused by a misconfiguration of a backend service. The misconfiguration led to an accumulation of garbage data, which gridlocked the job system and caused delays and delivery issues for side conversations.
Resolution
To fix this issue, we stopped jobs from continuing to enqueue, cleared the garbage data left at the backend by the failures, and raised the memory allocation for the queue service to address the backlog. Once the queue backlog was cleared, normal operations resumed.
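The recovery sequence described above (pause enqueueing, clear the garbage data, work through the backlog) can be illustrated with a minimal in-memory sketch. This is not Zendesk's actual job system: the `Job`, `JobQueue`, and `recover` names are hypothetical, a payload of `None` stands in for "garbage data", and the memory allocation change is an infrastructure setting that is not modeled here.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:
    """Hypothetical job record; a payload of None stands in for garbage data."""
    job_id: int
    payload: Optional[dict]


@dataclass
class JobQueue:
    """Hypothetical in-memory queue used only to illustrate the recovery order."""
    jobs: deque = field(default_factory=deque)
    accepting: bool = True

    def enqueue(self, job: Job) -> bool:
        """Add a job unless enqueueing has been paused."""
        if not self.accepting:
            return False
        self.jobs.append(job)
        return True

    def pause(self) -> None:
        """Stop new jobs from being enqueued."""
        self.accepting = False

    def resume(self) -> None:
        """Allow new jobs to be enqueued again."""
        self.accepting = True

    def purge_garbage(self) -> int:
        """Drop jobs with no usable payload; return how many were removed."""
        before = len(self.jobs)
        self.jobs = deque(j for j in self.jobs if j.payload is not None)
        return before - len(self.jobs)

    def drain(self) -> list[int]:
        """Process the remaining backlog in order; return the processed job ids."""
        processed = []
        while self.jobs:
            processed.append(self.jobs.popleft().job_id)
        return processed


def recover(queue: JobQueue) -> tuple[int, list[int]]:
    """Pause enqueueing, purge garbage, drain the backlog, then resume."""
    queue.pause()
    removed = queue.purge_garbage()
    processed = queue.drain()
    queue.resume()
    return removed, processed
```

In this sketch, pausing first matters: if enqueueing continued during the purge, new jobs could keep arriving faster than the backlog is cleared.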
Remediation Items
- Fix erroneous configuration [Completed]
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us via ZBot Messaging within the Widget.