Summary
On October 28, 2024, from 17:37 UTC to 19:19 UTC, a small subset of Zendesk Explore customers on Pods 13, 15, 19, 20, 23, 25, 26, and 27 experienced errors, including 502 responses and increased latency, when performing tasks such as generating reports and loading dashboards.
Timeline
October 28, 2024 06:51 PM UTC | October 28, 2024 11:51 AM PT
We are receiving reports of issues loading reports and dashboards in Explore across multiple pods and our team is investigating. More updates to follow shortly.
October 28, 2024 07:02 PM UTC | October 28, 2024 12:02 PM PT
We have confirmed an issue affecting Explore customers causing 502 errors and latency when attempting to load default and custom dashboards and reports. Our team is investigating and we will provide further updates within the next 30 minutes.
October 28, 2024 07:21 PM UTC | October 28, 2024 12:21 PM PT
We have rolled back a recent update and are beginning to see improvement in the issue that was causing 502 errors and latency for Explore customers when loading dashboards and reports. We will continue to monitor until full recovery. Please let us know if you continue to experience any issues.
October 28, 2024 07:38 PM UTC | October 28, 2024 12:38 PM PT
We are happy to report that the issue affecting Explore dashboard and report loading has been resolved. Thank you for your patience during our investigation.
Root Cause Analysis
This incident was caused by a network configuration error that resulted in connectivity timeouts between network infrastructure components. This led to customers receiving HTTP request errors in the Explore product.
Resolution
To fix this issue, our team rolled back the network configuration changes on the affected nodes, which restored the service.
Remediation Items
The following work items have been scheduled:
- Update Explore startup / readiness probes to prevent rollouts in some scenarios.
- Investigate automating some elements of the affected network configuration paths.
- Review monitoring and alerting.
- Update infrastructure change runbooks.
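As an illustration of the first remediation item, a readiness check can refuse to report "ready" while a network dependency is unreachable, which gives a rollout a reason to pause. The sketch below is a generic example, not Zendesk's actual probe: the function names `can_connect` and `is_ready`, the `(host, port)` dependency format, and the 2-second timeout are all assumptions made for illustration.

```python
import socket
from typing import Callable, Iterable, Tuple

# Hypothetical dependency format: (host, port) pairs supplied by the caller.
Dependency = Tuple[str, int]


def can_connect(dep: Dependency, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to dep succeeds within timeout_s."""
    host, port = dep
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name-resolution failures.
        return False


def is_ready(
    deps: Iterable[Dependency],
    check: Callable[[Dependency], bool] = can_connect,
) -> bool:
    """Report ready only when every dependency passes the check."""
    return all(check(dep) for dep in deps)
```

The `check` parameter is injectable so the readiness logic can be exercised without real network access. A probe endpoint would typically return a non-success status whenever `is_ready` returns False.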
For More Information
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, contact Zendesk customer support.
Dan Beirouty
Post-mortem published November 14, 2024.