SUMMARY
On July 16, 2021, from 11:46 UTC to 14:07 UTC, Zendesk Sell customers experienced a platform outage and received the error message "Sorry, something went wrong" when trying to access the product.
TIMELINE
16:55 UTC | 09:55 PT
Zendesk Sell performance has completely recovered. Our teams have also completed the final backend mitigations related to the outage, and we can now declare the incident resolved. We apologize for the inconvenience.
14:42 UTC | 07:42 PT
Zendesk Sell performance has started to recover. Our teams continue working on mitigating the issues related to this service incident. We will provide further updates as they become available.
14:11 UTC | 07:11 PT
We are seeing improvements in the performance of Zendesk Sell as our engineers continue to work on this issue. We will provide further updates as they become available.
13:43 UTC | 06:43 PT
We apologize for the disruption to your Zendesk Sell service. Our team continues to work on resolving this incident. We will provide an update in 30 minutes.
13:14 UTC | 06:14 PT
Our team continues to work on mitigating the incident impacting Zendesk Sell. We will provide another update in 30 minutes.
12:45 UTC | 05:45 PT
We have identified the root cause of the ongoing issues with Zendesk Sell and are working on a resolution. We will provide more information as it becomes available.
12:24 UTC | 05:24 PT
We continue to investigate Zendesk Sell failing to load, with customers receiving "Sorry, something went wrong" messages. More updates to follow.
12:10 UTC | 05:10 PT
We are investigating an outage that is impacting Zendesk Sell. More info shortly.
POST-MORTEM
Root Cause Analysis
This incident was caused by a manual error: a routine customer support task was run with higher permission levels than it required, which temporarily removed one of the Sell databases.
As a result, incoming connections failed and customers landed on a "Sorry, something went wrong" error page when using the Sell web application.
Resolution
To fix this issue, our engineering team restored the database to its previous state, which then brought all web traffic back to normal operations.
Our team continued working and monitoring in the background to make sure the system remained stable and that all data was restored to its latest state from before the outage.
Remediation Items
- Create a mechanism that blocks a command from running unless its full set of parameters is specified, to prevent accidental execution of unintended commands.
- Limit service and console database user rights to prevent unintended database changes through code. Perform any such database changes only after additional peer review.
- Review and improve the database restoration workflow to shorten the time to recover after an incident.
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. A summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.