SUMMARY
On December 22, 2021, from 12:09 to 13:26 UTC, Support customers on Pods 19 and 23 and Talk customers on all Pods may have experienced service degradation and connection issues.
Timeline
18:54 UTC | 10:54 PT
We are happy to report that we have fully recovered from the impact to Pods 19 and 23.
15:28 UTC | 07:28 PT
We no longer see any impact to Pods 19 and 23, but we will continue to work with our cloud services partner until we can confirm full resolution. We will post a final update once this happens.
14:31 UTC | 06:31 PT
We continue to monitor the recovery of our services on Pods 19 and 23. We will provide another update in an hour.
13:51 UTC | 05:51 PT
We are seeing recovery on Pods 19 and 23. Our teams continue to monitor.
13:14 UTC | 05:14 PT
Please note that this incident may impact all customers on Pods 19 and 23, as well as Talk customers hosted on other Pods.
13:04 UTC | 05:04 PT
Our team continues to investigate the issue causing connection errors in Talk for some of our customers, as well as email delays and search issues on Pods 19 and 23.
12:42 UTC | 04:42 PT
We are investigating performance issues impacting Pods 19 and 23; more details to follow.
POST-MORTEM
Root Cause Analysis
This incident was caused by an outage in part of the US-based infrastructure of our cloud services partner. This in turn delayed our processes running on the impacted servers.
Resolution
While our engineering team did their best to minimise the impact of this outage on our customers, the incident was ultimately resolved by our partner.
Remediation Items
- Add monitoring and alerting to improve response time.
- Explore options for redirecting traffic in similar scenarios in the future.
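As a purely illustrative sketch of the first remediation item, the snippet below shows one common shape for latency-based alerting: fire when a rolling average of recent probe samples stays above a threshold, so a single slow request does not page anyone. The threshold, window size, and function names here are hypothetical assumptions for illustration, not Zendesk's actual tooling.

```python
# Hypothetical sketch of threshold-based alerting on service latency.
# The 500 ms threshold and 5-sample window are illustrative assumptions.
from statistics import mean

LATENCY_THRESHOLD_MS = 500  # hypothetical alerting threshold
WINDOW = 5                  # number of recent samples to average

def should_alert(samples_ms):
    """Return True when the rolling average of the last WINDOW
    latency samples exceeds the threshold."""
    recent = samples_ms[-WINDOW:]
    return len(recent) == WINDOW and mean(recent) > LATENCY_THRESHOLD_MS

# Normal operation: average stays well under the threshold, no alert.
print(should_alert([120, 130, 110, 140, 125]))        # False
# Sustained degradation: the rolling average crosses the threshold.
print(should_alert([120, 900, 950, 1100, 980, 1200])) # True
```

Averaging over a window trades a little detection latency for far fewer false alarms, which is the usual reason monitoring systems alert on sustained rather than instantaneous breaches.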
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.