Service Incident - February 15th, 2022 - Support | Multiple Pods - CDN Latency issues

SUMMARY

On February 15, 2022, from 19:08 UTC to 23:00 UTC, Support customers whose traffic is routed through the US West Coast infrastructure of our CDN provider experienced server errors and performance issues when loading the Support interface.

Timeline

20:11 UTC | 12:11 PT

We have reports of errors and issues loading the Support interface. We are investigating and will provide further updates as we receive additional information.

20:37 UTC | 12:37 PT

We have seen some signs of improvement but our team continues to investigate errors and issues loading the Support interface. We will provide additional updates as we learn more.

21:03 UTC | 13:03 PT

We are seeing recovery, but our team is still monitoring the issue while we investigate the root cause. If you are still experiencing issues, please let us know how your agents are affected.

21:57 UTC | 13:57 PT

We have new reports of server errors and performance degradation with the Support interface for customers whose traffic is routed through the US West Coast. We are actively working with our CDN provider to mitigate the issue.

23:02 UTC | 15:02 PT

We are seeing recovery for access to and performance in the Support interface. If you are still experiencing issues, please let us know.

01:24 UTC | 17:24 PT

Our service provider has resolved the issue that resulted in errors in Support on multiple Pods. If you still experience issues, please perform a hard refresh of your browser, then try again. Thanks so much for your patience.

POST-MORTEM

Root Cause Analysis

This incident was caused by an issue on our partner's end: return-path latency coming from one origin region on the East Coast of the United States.
This was further compounded by maintenance the partner was running at the time, which caused even more traffic to be funneled through the affected path.

Resolution

Our CDN provider resolved this issue by remediating it on their end.

Remediation Items

While this issue was resolved by our partner, our own teams have committed to the following items to prevent a recurrence or to speed up its resolution.

Review and update our internal documentation on GeoDNS details.
Create new alerting for these types of errors.

FOR MORE INFORMATION

For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.