SUMMARY
On May 03, 2023 from 15:03 UTC to 20:30 UTC, Sunshine Conversations and Messaging customers on multiple pods experienced delays in sending and receiving messages.
Timeline
15:20 UTC | 08:20 PT
We are investigating reports of high latency affecting messaging through Sunshine Conversations. More information to follow shortly.
15:35 UTC | 08:35 PT
We have confirmed an issue affecting Sunshine Conversations, including the ZBot Widget used to contact Zendesk Support. We are making a Web Form available for contacting our Support team until further notice. Further updates will be provided soon.
15:59 UTC | 08:59 PT
Our team continues to investigate an issue causing long delays in sending and receiving messages through Sunshine Conversations across all pods. We will provide additional updates as we learn more.
16:46 UTC | 09:46 PT
We are still looking into root causes for the issue causing significant delays in sending and receiving messages in Sunshine Conversations across all pods. Additional information will be posted as the investigation progresses.
18:09 UTC | 11:09 PT
Our team continues to investigate the issue causing significant delays in sending and receiving messages in Sunshine Conversations and for a large percentage of Messaging customers. We will provide another update when we have new information to share.
19:03 UTC | 12:03 PT
We have made a change and are seeing improvements in Sunshine Conversations and Messaging latency. We will continue to monitor performance and provide another update once the issue is fully resolved.
20:51 UTC | 13:51 PT
We continue to see improvements with Sunshine Conversations and Messaging but have received reports of delayed ticket creation from Messaging. Our engineers are looking into it. We will provide another update once we have more information.
22:11 UTC | 15:11 PT
The issue causing delayed ticket creation from Messaging is now resolved, and our engineers will continue to monitor Sunshine Conversations and Messaging latency to ensure continued stability. We will provide another update tomorrow during PT business hours.
20:42 UTC | 13:42 PT
Sunshine Conversations and Messaging latency remains stable and our monitoring shows the fix was effective. This issue is fully resolved. Thank you for your patience during our investigation and monitoring period.
POST-MORTEM
Root Cause Analysis
This incident was caused by degradation of the database's read/write logic. Two notable changes may have led to this situation:
- The database driver was upgraded.
- A recently deployed code change, which had been running without any issues for several weeks, could have prevented the database from recovering.
Resolution
To fix this issue, we rolled back the previously deployed code and scaled up the main data store. We observed recovery once those steps had been taken.
Remediation Items
- Remove recently introduced multi-document transaction logic [Done]
- Update runbook for scaling resources and rate limits [Scheduled]
- Review disaster recovery playbook for similar scenarios [Scheduled]
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us via ZBot Messaging within the Widget.
Post-mortem published May 16, 2023.