SUMMARY
On August 19, 2021, from 14:52 UTC to 16:52 UTC, customers on multiple Pods using Explore experienced dashboards timing out and data failing to load.
Timeline
16:18 UTC | 09:18 PT
We are investigating reports of dashboard loading errors in Explore across multiple pods. We will provide additional information as soon as it's available.
16:52 UTC | 09:52 PT
The issues affecting Explore dashboard loading are now resolved. Thank you for your patience during our investigation.
POST-MORTEM
Root Cause Analysis
This incident was caused by a lock that an internal background process placed on an internal shared table. The process runs on our transactional and analytics processing databases. While it held the lock on the system catalog table that stores statistical data, the reader could not acquire a session, and all subsequent queries failed.
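The failure mode above can be illustrated with a toy model: one holder takes an exclusive lock, and every query that needs the same resource gives up after a timeout until the lock is released. This is a minimal sketch that uses a Python mutex as a stand-in for the catalog-table lock. The function `run_query` and the 0.05-second timeout are hypothetical, and the model ignores the fact that ordinary database readers can share a lock with one another. It shows only the conflict between an exclusive lock and the queries waiting behind it.

```python
import threading


def run_query(catalog_lock: threading.Lock, timeout_s: float) -> str:
    """Toy query: take the (hypothetical) catalog lock before reading, fail on timeout."""
    if not catalog_lock.acquire(timeout=timeout_s):
        # The lock is held elsewhere, so the query gives up, like the failing dashboard queries.
        return "failed"
    try:
        # A real query would read the catalog table here.
        return "ok"
    finally:
        catalog_lock.release()


if __name__ == "__main__":
    lock = threading.Lock()

    # The background process takes the exclusive lock...
    lock.acquire()
    print("while locked:", [run_query(lock, 0.05) for _ in range(3)])

    # ...and queries succeed again only once it finishes and releases the lock.
    lock.release()
    print("after release:", [run_query(lock, 0.05) for _ in range(3)])
```

This also mirrors the resolution below: nothing about the queries changed. They started succeeding once the lock holder finished.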
Resolution
To resolve the issue, the team waited for the internal background process to finish. When it completed, it released the exclusive lock on the system catalog table, queries stopped failing, and data loaded fully in Explore again.
Remediation Items
- Decrease the minimum delay between runs of the background process that analyses tables and reclaims unused space for reuse on any given database.
- Disable the attempt to truncate empty pages at the end of the system catalog table that stores statistical data about the contents of the database.
- Add alerting to monitor locked sessions.
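The last remediation item, alerting on locked sessions, could take roughly the following shape. This is a minimal sketch under stated assumptions: the `Session` record, its fields, and the 30-second threshold are all hypothetical. A real monitor would populate these records from the database's own activity and lock views. For example, if the databases are PostgreSQL (the post-mortem does not say), the data could come from the `pg_stat_activity` and `pg_locks` views.

```python
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical session record; real values would come from the database's
    # activity/lock views.
    session_id: int
    query: str
    waiting_on_lock: bool
    wait_seconds: float


def sessions_to_alert(sessions, threshold_seconds=30.0):
    """Return IDs of sessions that have waited on a lock longer than the threshold."""
    return [
        s.session_id
        for s in sessions
        if s.waiting_on_lock and s.wait_seconds > threshold_seconds
    ]


def format_alert(blocked_ids, threshold_seconds=30.0):
    """Build one alert message, or return None when no session is blocked."""
    if not blocked_ids:
        return None
    ids = ", ".join(str(i) for i in blocked_ids)
    return f"{len(blocked_ids)} session(s) waiting on locks > {threshold_seconds:.0f}s: {ids}"


if __name__ == "__main__":
    sample = [
        Session(101, "SELECT ... FROM dashboard_data", True, 95.0),
        Session(102, "SELECT ... FROM dashboard_data", True, 4.0),
        Session(103, "SELECT 1", False, 0.0),
    ]
    print(format_alert(sessions_to_alert(sample)))
```

Filtering on lock waits rather than on query duration alone is deliberate. A long wait on a lock points at a blocking holder, such as the background process in this incident, rather than at a merely slow query.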
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.
Post-Mortem published September 3, 2021.