On May 11, 2020, from 22:00 to 22:25 UTC, our CDN provider experienced a regional outage. This affected a subset of Zendesk customer traffic routed through a data center in Dallas-Fort Worth, Texas, and impacted all Zendesk products except Sunshine Conversations and Sell.
14:35 UTC | 07:35 PT
After investigating, we have determined there was a brief regional outage impacting our downstream networking provider on May 11 between 22:00 and 22:25 UTC.
22:54 UTC | 15:54 PT
We experienced a brief period of connectivity issues impacting some of our customers. We are currently monitoring the situation.
Root Cause Analysis
A core network device hardware failure in the provider’s data center caused all HTTP requests passing through that data center to fail. This represented less than 1% of total Zendesk traffic, but users in that geographical area may have seen all their requests fail during the incident.
Zendesk monitoring systems alerted as soon as the impact began, and we notified our cloud vendor, who was already responding to the outage. By 22:25 UTC, all traffic had been rerouted to bypass this data center and customer impact was resolved.
While Zendesk backends remained operational during the outage, the impact on each Zendesk customer depended on the location of the agents or end users attempting to access that customer's Zendesk instance. Email systems were not affected during this time.
Remediation Items
1. Improve CDN regional monitoring and alerts
2. Follow up with provider on reducing time to reroute traffic
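To illustrate why regional monitoring matters for an incident like this one, the following is a minimal sketch, not a description of Zendesk's actual monitoring. It shows how a failure confined to one data center can stay under 1% of global traffic while every request in that region fails. The function names, the region labels "dfw" and "other", the sample request counts, and the 50% alert threshold are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def regional_failure_rates(requests):
    """Return {region: failure_rate} for an iterable of (region, ok) pairs."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for region, ok in requests:
        totals[region] += 1
        if not ok:
            failures[region] += 1
    return {region: failures[region] / totals[region] for region in totals}


def regions_to_alert(requests, threshold=0.5):
    """List regions whose failure rate meets the (hypothetical) threshold."""
    rates = regional_failure_rates(requests)
    return sorted(region for region, rate in rates.items() if rate >= threshold)


# Hypothetical sample: 8 failed requests in one region, 992 successes elsewhere.
sample = [("dfw", False)] * 8 + [("other", True)] * 992

# The global failure rate looks small (8 of 1000 requests) ...
global_failure_rate = sum(1 for _, ok in sample if not ok) / len(sample)

# ... while the per-region view shows one region failing completely.
alerting_regions = regions_to_alert(sample)
```

In this sketch, an alert keyed only to the global failure rate could miss the outage, whereas a per-region threshold flags the affected region immediately.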
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.