SUMMARY
Between 23:30 UTC on November 12, 2024, and 11:26 UTC on November 15, 2024, Support customers using SLAs in Pods 25 and 30 experienced delayed SLA calculations, and SLA badges did not appear on their tickets as expected after applicable ticket updates.
TIMELINE
November 15, 2024 01:00 PM UTC | November 15, 2024 05:00 AM PT
We are pleased to report that the issues impacting Metrics SLA performance on Pods 25 and 30 have now been resolved. Thank you for your patience.
November 15, 2024 12:16 PM UTC | November 15, 2024 04:16 AM PT
We are now seeing improvements to the issue impacting Metrics SLA performance on Pods 25 and 30. We continue to monitor and will provide further updates as soon as we have them.
POST-MORTEM
Root Cause Analysis
This incident was caused by a misconfigured secret for the metric event service. When Zendesk deployed an update that added validation, the misconfigured secret caused the service to fail to initialize in Asia-Pacific deployments, which delayed SLA processing.
Resolution
To fix this issue, a "default" value was added for the affected secret on November 15, 2024. This allowed the metric event service to initialize properly and resume normal operations. Zendesk also identified a secret used by the Talk transcription service and set a default value for it to reduce the risk of a similar failure in the future.
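The failure mode and fix described above can be illustrated with a minimal Python sketch, assuming a service that reads its secrets from a per-region mapping at startup and refuses to start if any required secret is empty. The names `REQUIRED_SECRETS`, `metric_event_token`, and `load_secrets` are hypothetical illustrations; this is not Zendesk's actual configuration or code.

```python
from typing import Mapping, Optional


class SecretValidationError(RuntimeError):
    """Raised at startup when a required secret has no usable value."""


# Hypothetical secret name, for illustration only.
REQUIRED_SECRETS = ("metric_event_token",)


def load_secrets(
    region_values: Mapping[str, Optional[str]],
    defaults: Mapping[str, str],
) -> dict[str, str]:
    """Resolve each required secret, falling back to a default if one exists.

    Models the stricter validation: any secret that is still empty after
    applying defaults aborts initialization instead of failing later.
    """
    resolved: dict[str, str] = {}
    missing: list[str] = []
    for name in REQUIRED_SECRETS:
        value = region_values.get(name) or defaults.get(name)
        if value:
            resolved[name] = value
        else:
            missing.append(name)
    if missing:
        raise SecretValidationError(f"missing secrets: {', '.join(missing)}")
    return resolved


# A region whose secret was never populated fails to initialize...
apac_values: dict[str, Optional[str]] = {"metric_event_token": None}
try:
    load_secrets(apac_values, defaults={})
except SecretValidationError as err:
    print(f"startup aborted: {err}")

# ...while adding a "default" value lets the same region start.
print(load_secrets(apac_values, defaults={"metric_event_token": "default"}))
```

In this sketch, a region-specific value always takes precedence, and the default only applies when the region has nothing set.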
Remediation Items
- Conduct a thorough audit of all secrets to ensure that values are set for all localities, especially in Asia-Pacific regions.
- Improve existing implementation tools to prevent similar misconfigurations in the future.
- Create additional alerts to notify relevant teams of service initialization failures and related issues.
- Review how failure metrics are tracked so that incidents like this one trigger alerts and can be resolved promptly.
By implementing these remediations, we aim to enhance the resilience of our services and prevent similar incidents in the future.
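The first remediation item, auditing secrets across regions, could be approached with a check like the following minimal Python sketch. It assumes the secret inventory can be exported as a mapping from region to secret values; the region names, secret names, and `find_missing_secrets` helper are hypothetical and do not describe Zendesk's tooling.

```python
from typing import Iterable, Mapping, Optional


def find_missing_secrets(
    secrets_by_region: Mapping[str, Mapping[str, Optional[str]]],
    required: Iterable[str],
) -> list[tuple[str, str]]:
    """Return sorted (region, secret) pairs that have no non-empty value."""
    names = list(required)
    gaps = [
        (region, name)
        for region, values in secrets_by_region.items()
        for name in names
        if not values.get(name)  # missing key, None, and "" all count as gaps
    ]
    return sorted(gaps)


# Hypothetical inventory: region and secret names are illustrative only.
inventory = {
    "us-east": {"metric_event_token": "x", "transcription_token": "y"},
    "apac": {"metric_event_token": "", "transcription_token": None},
}
print(find_missing_secrets(inventory, ["metric_event_token", "transcription_token"]))
```

Running a check like this before each deployment, and alerting on a non-empty result, is one way to surface a regional gap before stricter validation exposes it in production.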
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, contact Zendesk customer support.
Post-mortem published Dec 4, 2024, by Bob Novak.