Service Incident - December 7th, 2021 - Sell, Explore and Sunshine Conversations issues (U.S. region)

SUMMARY

From 15:02 UTC on December 7, 2021 to 03:30 UTC on December 8, 2021, customers using some of our products experienced issues due to an outage affecting part of our cloud services partner's infrastructure.

Explore customers whose accounts are hosted on our US-based infrastructure were unable to create queries, received error messages, or were unable to see up-to-date data in their accounts.
Some of our Sell customers were unable to send emails through the product or experienced slow overall performance.
A group of our Sunshine Conversations (SunCo) customers were able to receive messages but encountered issues sending messages via WhatsApp.
Customers using the AWS Connector encountered delays in sending messages to AWS EventBridge.

TIMELINE

16:58 UTC | 08:58 PT

We are aware of an issue, which began at 16:30 UTC, that is preventing Explore data from being refreshed. Real-time dashboards are not impacted. We will provide an update when we have more information.

18:38 UTC | 10:38 PT

We are continuing to investigate Explore issues with our hosting provider. In addition to data refresh delays, Support data on real-time dashboards may be inaccurate. Other customers may be unable to see any data in Explore dashboards.

19:16 UTC | 11:16 PT

We are continuing to investigate the Explore issues and are also investigating Sell degradation, including an inability to send communications and performance issues.

19:56 UTC | 11:56 PT

In addition to Explore and Sell issues, some customers may experience issues sending and receiving WhatsApp messages. We are continuing to investigate these issues with our hosting provider.

21:15 UTC | 13:15 PT

We are continuing to work with our platform partner to resolve the Explore, Sell, and WhatsApp degradation. We will provide the next update when we have more information to share.

23:35 UTC | 15:35 PT

The issue impacting the sending and receiving of WhatsApp messages is now fully resolved. We continue to work with our platform partner on the Sell and Explore degradation.

01:38 UTC | 17:38 PT

WhatsApp messaging has fully recovered and we are observing recovery for Sell, but Explore remains degraded for some customers. We continue to work with our platform partner towards complete recovery and will provide another update when more information is available.

03:08 UTC | 19:08 PT

The issues with Explore are subsiding as we process the data backlog. All other Zendesk products are stable. A final update will be provided once Explore fully recovers. Thanks for your patience today.

10:32 UTC | 02:32 PT

We are pleased to report that the issues with Explore, Sell and WhatsApp messaging have been fully resolved. We greatly appreciate your patience during today's issues and apologize for the inconvenience caused.

POST-MORTEM

Root Cause Analysis

This incident was caused by an outage in part of the US-based infrastructure of our cloud services partner. This in turn caused our processes running on the impacted servers to fail or be delayed.

Resolution

While our engineering team did their best to minimise the impact of this outage on our customers, this incident was resolved by our partner.

Remediation Items

Investigate potential Explore infrastructure changes to increase resilience.
Investigate potential Sell infrastructure changes to increase resilience.
Improve recovery behaviour and time for SunCo.

FOR MORE INFORMATION

For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.