SUMMARY
On September 16, 2024, from 03:55 UTC to 06:00 UTC, some customers located in or around India received 499 and 520 response codes on their requests. Users reported various issues with Zendesk, including outages, login errors, slow loading times, latency, and server errors. These issues were caused by a regional outage at our content delivery provider.
Timeline
September 16, 2024 10:37 AM UTC | September 16, 2024 03:37 AM PT
We are pleased to inform you that the recent service disruption has been resolved and all affected services are now fully operational. We appreciate your patience and understanding throughout this incident.
September 16, 2024 06:36 AM UTC | September 15, 2024 11:36 PM PT
Our content delivery partner has implemented a fix, and we have reports confirming recovery from some customers. We will continue to monitor and will provide the next update upon full recovery.
September 16, 2024 05:59 AM UTC | September 15, 2024 10:59 PM PT
While our content delivery provider is still working on the fix, we are no longer able to reproduce the previous errors. Please refresh your Zendesk instance and let us know if you are seeing any improvement. We continue to work with our CDN provider and will provide another update in 30 minutes.
September 16, 2024 05:33 AM UTC | September 15, 2024 10:33 PM PT
Our content delivery provider has identified an issue causing elevated errors at their end and they are implementing a fix. We are continuing to work with them and will provide another update in 30 minutes or when we have more to share.
September 16, 2024 05:12 AM UTC | September 15, 2024 10:12 PM PT
We are aware of an issue that may be causing customers to be unable to log in, to experience latency when accessing various parts of the Zendesk product, or to see API calls fail. We are reaching out to our content delivery provider and are working with them. More information will follow in the next 30 minutes.
POST-MORTEM
Root Cause Analysis
This incident occurred when several Route Origin Authorizations (ROAs) for prefixes owned by our content provider, issued by the American Registry for Internet Numbers (ARIN), expired.
Standard failover mechanisms were not effective due to restrictions put in place on our content-provider IPs in India.
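As an illustration of how this class of failure can be caught ahead of time, the following is a minimal sketch of an expiry check over an inventory of ROAs. It is not the tooling used by Zendesk or its content provider. The `Roa` record, the `expiring_roas` helper, the 30-day warning window, and the sample prefixes, AS numbers, and dates (documentation ranges and made-up dates) are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Roa:
    """Hypothetical, simplified view of a Route Origin Authorization."""
    prefix: str          # e.g. "192.0.2.0/24" (documentation range)
    origin_asn: int      # AS authorized to originate the prefix
    not_after: datetime  # end of the validity window (UTC)


def expiring_roas(roas, now, warn_within=timedelta(days=30)):
    """Return ROAs already expired or expiring within `warn_within`, soonest first."""
    cutoff = now + warn_within
    flagged = [r for r in roas if r.not_after <= cutoff]
    return sorted(flagged, key=lambda r: r.not_after)


if __name__ == "__main__":
    # Illustrative inventory only; these are not the ROAs involved in the incident.
    now = datetime(2030, 1, 1, tzinfo=timezone.utc)
    inventory = [
        Roa("192.0.2.0/24", 64500, datetime(2030, 1, 16, tzinfo=timezone.utc)),
        Roa("198.51.100.0/24", 64500, datetime(2030, 10, 1, tzinfo=timezone.utc)),
    ]
    for roa in expiring_roas(inventory, now):
        days_left = (roa.not_after - now).days
        print(f"{roa.prefix} AS{roa.origin_asn}: {days_left} days left")
```

In practice the inventory would come from the registry or from the signed objects themselves rather than a hard-coded list; the sketch only shows the comparison against the validity end date.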
Resolution
To fix this issue, our content provider updated their ROAs and the errors subsided.
Remediation Items
In the coming weeks, we will set up completely new, non-blocked IP ranges with our content provider. This will allow fully automated failover and significantly reduce the duration of impact from such disruptions.
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, contact Zendesk customer support.
Bob Novak
Postmortem published October 10, 2024