On the 30th of May 2020, between 10:48 UTC and 16:21 UTC, a small subset of our customers received “Server Error” messages when trying to access Insights. In addition, between 10:48 UTC and 22:22 UTC, a small subset of Chat customers encountered server errors (response code 500) when using the product.
16:37 UTC | 09:37 PT
The issue of a “Server Error” message encountered when trying to access Insights has been resolved. Thank you for your patience.
15:54 UTC | 08:54 PT
We are continuing our implementation of a fix for the “Server Error” message encountered when trying to access Insights and will provide a status update in roughly 1 hour.
15:13 UTC | 08:13 PT
We are implementing a fix for the “Server Error” message encountered when trying to access Insights.
14:26 UTC | 07:26 PT
We continue to investigate the “Server Error” message encountered when trying to access Insights. We will send another update as soon as possible.
13:39 UTC | 06:39 PT
We are investigating reports of our customers receiving “Server Error” messages when accessing Insights.
Root Cause Analysis
Zendesk uses SSL certificates to secure all our public-facing services as well as traffic between our internal services. On May 30, 2020, the AddTrust External CA Root certificate that was used to sign our SSL certificates expired. Root certificates are a cornerstone of trust online and typically have a long lifetime: 20 years in the case of the AddTrust certificate. Once that root expired, otherwise valid certificates used throughout Zendesk and many other web properties across the world were detected as invalid.
Customers with modern web browsers, operating systems and other client software would have been largely unaffected by this behaviour because those systems have been engineered to accommodate alternative chains of trust and can therefore deal with expired certificates in the chain.
Nevertheless, some legacy client systems, such as older SSL clients or web browsers, may have experienced a loss of service due to their outdated store of root certificates and their inability to deal with expired root certificates.
In addition to the root cause above, an influencing factor was that, even though our internal processes and systems do monitor and alert us to upcoming SSL expiry issues, we did not sufficiently monitor root certificate expirations. Our SSL vendor also did not expect the impact experienced in this incident, leading to a narrower proactive notification strategy (see Remediation section for further details).
Further information on the scope of this issue can be found here.
As soon as our team identified the root certificate issue, we obtained a new set of SSL certificates for all our affected services and applied them across the Zendesk product suite. This led to full resolution of the issues in Insights and Chat. At the same time, we initiated an internal review of all our secured services to identify any remaining outliers.
Remediation
- Investigate and implement improved internal processes to better monitor the expiry of CA root certificates in our systems [In Progress].
- Our SSL provider has committed to better proactive notification of expiring certificates to ensure we have wider awareness of any certificate changes [Completed].
- Initiate a process to consolidate SSL certificate management across Zendesk, sharpening our focus so we can achieve faster resolution and continue to improve the reliability of our systems for our customers [In Progress].
- Identify and update internal systems that could have been affected by this issue [Completed].
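As an illustration of the first item above, monitoring the expiry of CA root certificates, the following is a minimal Python sketch and not a description of Zendesk's actual tooling. It assumes certificate entries in the dictionary format returned by the standard library's `ssl.SSLContext.get_ca_certs()`, each with a `notAfter` field. The function name `expiring_roots` and the 90-day default window are hypothetical choices made for this example.

```python
import ssl
import time


def expiring_roots(ca_certs, within_days=90, now=None):
    """Return (subject, days_left) pairs for CA certificates that expire
    within `within_days` days, soonest first.

    `ca_certs` is a list of dicts shaped like the entries returned by
    ssl.SSLContext.get_ca_certs(); only "notAfter" and "subject" are read.
    Already expired certificates are included with a negative days_left.
    """
    now = time.time() if now is None else now
    flagged = []
    for cert in ca_certs:
        # Convert the certificate's "notAfter" string to seconds since epoch.
        not_after = ssl.cert_time_to_seconds(cert["notAfter"])
        days_left = (not_after - now) / 86400
        if days_left <= within_days:
            flagged.append((cert.get("subject"), days_left))
    # Sort by remaining lifetime only, so the most urgent entry comes first.
    return sorted(flagged, key=lambda item: item[1])
```

A check like this could be run on a schedule against a local trust store, for example with `expiring_roots(ssl.create_default_context().get_ca_certs())`. Depending on the platform, `get_ca_certs()` may not list certificates that are loaded lazily from a directory, so a real monitor would also need to inspect the certificate chains that servers actually present.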
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.