On June 24, 2019, from 09:00 UTC to 12:03 UTC, Zendesk Talk and Chat customers experienced degraded performance. Talk customers were unable to answer incoming calls. Chat agents and visitors could not resolve DNS for Chat mediators or load our cached CDN assets, so the Chat widget and agent dashboard failed to load. Chat agents who were already logged in could continue working, but saw fewer visitors than usual.
12:18 UTC | 05:18 PT
We are investigating reports that some of our Zendesk Talk customers are unable to accept calls. We will provide an update shortly.
12:58 UTC | 05:58 PT
We are happy to report that the CDN issues impacting multiple Zendesk products are now resolved. If you continue to experience issues with Zendesk Talk, please log out and log back in to refresh your session.
Root Cause Analysis
This incident was caused by a large US network provider improperly advertising routes (a BGP route leak) to our third-party CDN provider, which serves Zendesk site assets to local customers. During the investigation, our team identified that the issue was occurring at the CDN level and raised it with the CDN provider, who confirmed that the improper route advertisement was the cause. The leak caused a major Internet outage impacting multiple network providers, including our CDN provider.
Once the network provider corrected the issue, service to our CDN provider was restored, and Zendesk Talk and Chat recovered as well.
To prevent this issue from occurring in the future, our CDN provider has strongly recommended that the network provider configure hard BGP rate limits, implement IRR-based filtering, and adopt the RPKI framework.
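To illustrate how RPKI-based filtering can reject a leaked route, here is a minimal Python sketch of route origin validation in the style of RFC 6811. It is not the configuration used by our CDN provider or the network provider. The Roa class, the validate_origin function, and the sample prefixes and AS numbers (taken from the ranges reserved for documentation) are all hypothetical and chosen for illustration only.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Roa:
    """Hypothetical, simplified Route Origin Authorization record."""
    prefix: object      # ipaddress.IPv4Network or ipaddress.IPv6Network
    max_length: int     # longest prefix length the holder authorizes
    origin_asn: int     # AS number allowed to originate the prefix


def validate_origin(announced_prefix, origin_asn, roas):
    """Classify an announcement as 'valid', 'invalid', or 'not-found'."""
    route = ipaddress.ip_network(announced_prefix)
    covered = False
    for roa in roas:
        # subnet_of() raises TypeError across IP versions, so check first.
        if route.version != roa.prefix.version:
            continue
        if not route.subnet_of(roa.prefix):
            continue
        covered = True  # at least one ROA covers this route
        if route.prefixlen <= roa.max_length and origin_asn == roa.origin_asn:
            return "valid"
    # Covered by a ROA but matched by none: treat as invalid.
    return "invalid" if covered else "not-found"


# Illustrative data only: documentation prefix and documentation AS numbers.
ROAS = [Roa(ipaddress.ip_network("198.51.100.0/24"), 24, 64500)]

if __name__ == "__main__":
    for prefix, asn in [
        ("198.51.100.0/24", 64500),  # authorized origin and length
        ("198.51.100.0/25", 64500),  # more specific than max_length allows
        ("198.51.100.0/24", 64501),  # wrong origin AS
        ("192.0.2.0/24", 64500),     # no covering ROA
    ]:
        print(prefix, asn, validate_origin(prefix, asn, ROAS))
```

A router that drops routes classified as invalid would discard the second and third announcements above. Routes classified as not-found are typically still accepted, which is one reason IRR-based filtering and session limits are recommended alongside RPKI.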
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.