SUMMARY
On May 24, 2023, from 16:29 UTC to 22:23 UTC, some Zendesk Support customers on Pod 20 experienced delays or failures in triggers and webhooks firing.
Timeline
18:54 UTC | 11:54 PT
We are investigating reports of latency and delays in Support webhooks firing. We will provide further updates soon.
19:07 UTC | 12:07 PT
We have confirmed an issue causing latency and delays in Support webhooks firing, and our team is working to restore regular functionality. We will continue to post updates as we learn more.
19:39 UTC | 12:39 PT
We're still investigating an issue causing latency and delays in Support webhooks firing, and our team is working to restore regular functionality. We will continue to post updates as we learn more.
20:21 UTC | 13:21 PT
Thank you for your patience as we continue to investigate the issue causing latency and delays in webhooks firing. We will provide another update when we have new information to share.
22:31 UTC | 15:31 PT
Hello again! Our engineers have rolled out a fix for the issue causing latency and delays in webhooks firing, and we are now seeing recovery. We will continue to monitor performance until the issue is fully resolved.
22:48 UTC | 15:48 PT
The issue causing latency and delays in webhooks firing is now fully resolved. Please let us know if you continue to experience issues.
POST-MORTEM
Root Cause Analysis
This incident was primarily due to a defect that caused a message size limit to be exceeded in our internal event streaming service. A contributing factor was that a retry mechanism failed to abort the message; instead, the message was retried, which exhausted resource capacity in our infrastructure.
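The contributing factor above, a retry that should have aborted, can be illustrated with a minimal sketch. This is not Zendesk's implementation, which the report does not describe. The exception classes, the `publish` callable, and the function name `publish_with_retry` are all hypothetical. The sketch only shows the general idea of separating permanent errors, which are never retried, from transient ones.

```python
import time


class MessageTooLargeError(Exception):
    """Hypothetical permanent error: resending the same payload cannot succeed."""


class TransientPublishError(Exception):
    """Hypothetical temporary error, such as a brief network failure."""


def publish_with_retry(publish, message, max_attempts=3, base_delay=0.0):
    """Call publish(message), retrying only failures that may succeed later.

    `publish` is an assumed callable supplied by the caller; it stands in
    for whatever client actually sends the message.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return publish(message)
        except MessageTooLargeError:
            # Permanent failure: abort at once so the oversized message
            # does not keep consuming capacity through retries.
            raise
        except TransientPublishError:
            if attempt == max_attempts:
                raise
            # Exponential backoff between attempts.
            time.sleep(base_delay * 2 ** (attempt - 1))
```

With this shape, an oversized message fails after a single attempt, while a transient failure is retried up to `max_attempts` times.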
Resolution
Our team identified the message size limit error during the incident and deployed a fix to the affected service.
Remediation Items
- Prevent internal service messages from exceeding size limits [Completed]
- Improve error handling for the identified error events [Scheduled]
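As an illustration of the first remediation item, a size check before publishing might look like the sketch below. The 1,000,000-byte limit, the `encode_event` function, and the `MessageTooLargeError` class are assumptions made for the example; the report does not state the actual limit or how the check was implemented.

```python
import json

# Assumed limit for illustration only; the real limit is not given in the report.
MAX_MESSAGE_BYTES = 1_000_000


class MessageTooLargeError(ValueError):
    """Hypothetical error raised when an encoded event exceeds the limit."""


def encode_event(event, limit=MAX_MESSAGE_BYTES):
    """Serialize an event to JSON bytes and reject it if it is over the limit."""
    payload = json.dumps(event).encode("utf-8")
    if len(payload) > limit:
        raise MessageTooLargeError(
            f"encoded event is {len(payload)} bytes; limit is {limit}"
        )
    return payload
```

Rejecting the message at the producer keeps an oversized payload from reaching the streaming service at all, so no downstream retry logic has to handle it.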
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us via ZBot Messaging within the Widget.
Post-mortem published May 31, 2023.