SUMMARY
On February 1, 2025, from 00:13 UTC to 00:59 UTC, customers on POD 26 experienced issues accessing archived tickets. During this time, multiple database reader nodes were unable to open a table due to a defect in the database system. As a result, queries for archived tickets failed.
TIMELINE
February 01, 2025 01:13 AM UTC | January 31, 2025 05:13 PM PT
We are happy to report that the issue causing errors for a group of our Support customers on POD 26 has now been resolved. Thank you for your patience during our investigation.
February 01, 2025 12:57 AM UTC | January 31, 2025 04:57 PM PT
Our engineers believe they have identified the root cause of the errors impacting a group of our Support customers on POD 26 and are working to address the issue.
February 01, 2025 12:57 AM UTC | January 31, 2025 04:57 PM PT
We are investigating potential errors for our Support customers hosted on POD 26.
POST-MORTEM
Root Cause Analysis
This incident was caused by a defect in the database system that prevented cluster reader nodes from accessing an archived tickets table. The defect was confirmed by our vendor technical support and was specific to the database version installed at the time.
Resolution
To resolve this issue, our engineers halted a deployment to other shards and allowed the ongoing modifications to complete on the impacted shards, after which the database table became accessible. The team plans to upgrade to a new version of our database system that includes a patch for the identified defect.
Remediation Items
- Upgrade to the patched version or later before resuming schema changes.
- Split column additions and index drops into separate actions to minimize risk during deployments.
- Update the runbook to require that large migrations be deployed to a single cluster first before expanding to others.
- Implement a regular review process (at least annually) of database system patches and establish an upgrade cadence.
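As an illustration of two of the items above (separating column additions from index drops, and deploying to one cluster before the rest), here is a minimal Python sketch. It is not the tooling used in this incident. The names `SchemaChange`, `split_into_actions`, and `rollout_waves`, the change-kind labels, and the sample statements are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SchemaChange:
    kind: str       # hypothetical label, e.g. "add_column" or "drop_index"
    statement: str  # illustrative DDL text; never executed in this sketch


def split_into_actions(changes: List[SchemaChange]) -> List[List[SchemaChange]]:
    """Group changes so that no single action mixes different kinds.

    Order is preserved: consecutive changes of the same kind stay together,
    and a change of a different kind starts a new action.
    """
    actions: List[List[SchemaChange]] = []
    for change in changes:
        if actions and actions[-1][0].kind == change.kind:
            actions[-1].append(change)
        else:
            actions.append([change])
    return actions


def rollout_waves(clusters: List[str]) -> List[List[str]]:
    """Return deployment waves: the first cluster alone, then all the others."""
    if not clusters:
        return []
    first, rest = clusters[:1], clusters[1:]
    return [first] + ([rest] if rest else [])


if __name__ == "__main__":
    # Hypothetical migration that mixes a column addition with an index drop.
    migration = [
        SchemaChange("add_column", "ALTER TABLE archived_tickets ADD COLUMN note TEXT"),
        SchemaChange("drop_index", "DROP INDEX idx_old ON archived_tickets"),
    ]
    for wave in rollout_waves(["cluster-a", "cluster-b", "cluster-c"]):
        for action in split_into_actions(migration):
            print(wave, [c.statement for c in action])
```

In this sketch, each action would be applied and verified on the first wave before the next wave begins. How verification is performed is left out because the report does not describe it.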
FOR MORE INFORMATION
For current system status information about Zendesk and specific impacts to your account, visit our system status page. You can follow this article to be notified when our post-mortem report is published. If you have additional questions about this incident, contact Zendesk customer support.
1 comment
Bob Novak
Post-mortem published Feb 6, 2025