SUMMARY
When our CDN provider’s Gothenburg data center becomes overloaded due to traffic failing over from Copenhagen, an automated system sends traffic on to specific other locations. During the time of impact, the Gothenburg data center was not performing this correctly, causing routing loops for all prefixes that were failed over. End users of customers on the affected prefixes would have been unable to connect to those websites in Gothenburg for as long as the failover was active. Rebooting the servers at the affected data center remedied the broken forwarding state.
14:36 UTC | 07:36 PT
Our Internet security provider has identified a network performance issue at their Gothenburg, Sweden data center and is investigating. In the meantime, we have received confirmation from our customers that the issue is now resolved.
13:48 UTC | 06:48 PT
We have received reports of some customers in Northern Europe not being able to access Zendesk. We are currently looking into this issue.
POST-MORTEM
Root Cause
Our CDN provider’s edge servers use GRE encapsulation to send traffic to other edge data centers, so that traffic is redistributed in a more predictable fashion. It appears that GRE is unstable on some data center switches. Instead of being encapsulated and forwarded to a different data center, the packets were sent straight out without encapsulation, causing a loop in the network.
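The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the two-site topology, the name "other-dc", the example prefix, and the TTL value are all hypothetical and do not describe our CDN provider's actual network or software. It shows why a packet that leaves without its GRE outer header is routed straight back to the overloaded data center until its TTL runs out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical names for illustration only.
OVERLOADED_DC = "gothenburg"
OVERFLOW_DC = "other-dc"


@dataclass
class Packet:
    dst_prefix: str
    ttl: int = 8
    outer_dst: Optional[str] = None  # GRE outer destination, if encapsulated


def next_location(packet: Packet) -> str:
    """Where the network delivers the packet next in this toy model."""
    if packet.outer_dst is not None:
        # An encapsulated packet follows its outer header to the tunnel endpoint.
        return packet.outer_dst
    # A bare packet is routed on its original destination, and routing for the
    # failed-over prefix still points at the overloaded data center.
    return OVERLOADED_DC


def forward(packet: Packet, encapsulation_works: bool) -> Tuple[List[str], bool]:
    """Return the locations visited and whether the packet was served."""
    hops: List[str] = []
    location = OVERLOADED_DC
    while packet.ttl > 0:
        hops.append(location)
        if location != OVERLOADED_DC:
            return hops, True  # reached the overflow data center
        if encapsulation_works:
            packet.outer_dst = OVERFLOW_DC
        packet.ttl -= 1
        location = next_location(packet)
    return hops, False  # TTL expired while looping


if __name__ == "__main__":
    print(forward(Packet("198.51.100.0/24"), encapsulation_works=True))
    print(forward(Packet("198.51.100.0/24"), encapsulation_works=False))
```

With working encapsulation the packet takes one extra hop and is served. Without it, every hop is the overloaded data center again, which matches the routing loop seen for the failed-over prefixes.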
Remediations
- Our CDN provider’s engineers are actively working to implement additional monitoring that checks the reachability of prefixes that are failed over in this manner.
- Our CDN provider’s engineers have escalated this to their router vendor.
- Our CDN provider’s engineers have disabled this type of failover for now.
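As an illustration of the kind of check that additional monitoring might include, the sketch below flags a forwarding loop in a list of hop addresses, such as those collected by a traceroute toward a failed-over prefix. Our CDN provider has not described how its monitoring works, so the function name, the input format, and the use of "*" for unanswered probes are assumptions made for this example.

```python
from typing import List, Optional


def find_loop(hops: List[str]) -> Optional[str]:
    """Return the first hop address seen more than once, or None.

    `hops` is an ordered list of responding addresses, with "*" standing in
    for probes that got no reply (an assumed convention for this sketch).
    """
    seen = set()
    for hop in hops:
        if hop == "*":
            continue  # unanswered probes say nothing about loops
        if hop in seen:
            return hop  # the path revisited an address: likely a loop
        seen.add(hop)
    return None


if __name__ == "__main__":
    print(find_loop(["192.0.2.1", "192.0.2.9", "192.0.2.1", "192.0.2.9"]))
    print(find_loop(["192.0.2.1", "*", "192.0.2.9"]))
```

A repeated address is a hint rather than proof of a loop, so a real monitor would likely confirm with repeated probes before alerting.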
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.