SUMMARY
On October 31, 2022, from 17:22 to 19:02 UTC, Support customers experienced issues with agent collision, and Talk customers experienced issues with several Talk functions (hold, transfers, and wrap-up mode).
Timeline
17:59 UTC | 10:59 PT
We are investigating reports of issues with routing in Talk and agent collision within tickets in Support. Further updates will be posted shortly.
18:16 UTC | 11:16 PT
We have confirmed an issue affecting call routing and agent transfers in Talk and agent collision in tickets in Support. Our engineers are investigating and we will post additional information as soon as we can.
18:45 UTC | 11:45 PT
Our team continues to investigate an issue affecting agent transfers and ticket creation in Talk, as well as agent collision in tickets in Support. We will post additional updates as we learn more.
19:09 UTC | 12:09 PT
We are beginning to see some improvement in the issues affecting Talk and agent collision within Support. We will monitor the situation until full resolution. Please let us know if you continue to experience any issues.
20:01 UTC | 13:01 PT
We are happy to report that the issues affecting several Talk functions and agent collision in Support are now resolved. Thank you for your patience during our investigation.
POST-MORTEM
Root Cause Analysis
This incident was caused by a configuration error that made an internal service unreachable. This impacted the agent presence functionality, resulting in the issues experienced by our customers.
Resolution
To fix this issue, we reverted to the previously deployed version and confirmed that recovery had begun. We continued to monitor until full recovery was observed.
Remediation Items
- Improve logging and monitoring to more quickly and accurately identify the source of similar issues.
- Add testing scenarios and improve alerting for errors.
- Investigate whether smoke tests in staging can be improved to catch these issues before they are released.
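As an illustration only, the kind of staging smoke test described in the last item could be sketched as a reachability check run after a configuration change is deployed. This is a minimal sketch under stated assumptions, not a description of Zendesk's actual tooling: the function names `is_reachable` and `smoke_test`, and any endpoint names or hosts passed to them, are hypothetical.

```python
import socket
from typing import Dict, List, Tuple


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def smoke_test(endpoints: Dict[str, Tuple[str, int]]) -> List[str]:
    """Return the names of the endpoints that could not be reached.

    `endpoints` maps a (hypothetical) service name to its (host, port) pair.
    An empty result means every listed service accepted a connection.
    """
    return [
        name
        for name, (host, port) in endpoints.items()
        if not is_reachable(host, port)
    ]
```

A deployment pipeline could call `smoke_test` with the staging addresses of the internal services a release depends on and block the release if the returned list is not empty. A TCP connection check only shows that a service is reachable, not that it behaves correctly, so it would complement the additional testing scenarios rather than replace them.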
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us via ZBot Messaging within the Widget.
Post-mortem published November 14, 2022