SUMMARY
On September 22, 2021, from 02:35 UTC to 05:55 UTC, a group of our Explore customers experienced timeouts while trying to load their Explore dashboards.
TIMELINE
07:15 UTC | 00:15 PT
Between 02:39 and 05:50 UTC, customers using Zendesk Explore may have experienced issues obtaining data on their dashboards. Once the root cause was addressed, recovery was observed. Apologies for the inconvenience caused.
POST-MORTEM
Root Cause Analysis
The incident was caused by a defect in software provided by our cloud computing partner. The defect affected the Explore services responsible for querying our customers' data.
Following its investigation, our cloud partner acknowledged that a defective version of the software prevented memory from being released in a timely manner when the Explore infrastructure was under pressure. The partner has provided us with an updated version that contains the fix.
Resolution
To mitigate the issue, Zendesk provisioned additional capacity to reduce the load on the infrastructure while our partner identified the root cause. Once these actions were completed, our services returned to normal.
Our partner has since updated their software to improve memory management when resource usage is high.
Remediation Items
- Our partner completed an upgrade to their software responsible for memory caching.
- Zendesk to complete the rollout of the updated software. [Planned Q4]
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.