SUMMARY
From November 22, 2021 14:20 UTC to November 26, 2021 14:46 UTC, some Explore customers across multiple pods experienced problems when filtering by “Ticket by group” name: queries using that filter returned blank values or no data at all.
Timeline
14:46 UTC (Nov 26) | 06:46 PT (Nov 26)
We are happy to report that our team has finished restoring all Ticket Group historical data in Explore for all impacted customers. We apologize for any inconvenience and thank you for your patience.
02:37 UTC (Nov 25) | 18:37 PT (Nov 24)
We are still working on restoring historical data for the Ticket Group attribute in Explore. It is estimated that this data will begin reappearing in affected Zendesk accounts from late afternoon November 26, to be completed by end of day November 27 (UTC). A final update will be provided when the backfill is completed.
23:41 UTC | 15:41 PT
The Ticket Group attribute is displaying as expected in queries and filters as of 15:25 UTC, and we are working to backfill any impacted data. Please let us know if you continue to see any issues.
17:11 UTC | 09:11 PT
We are seeing slight improvement in the behavior affecting group data in Explore. Explore data synced after 15:25 UTC is displaying correctly; however, we continue to investigate a solution for historical data. We will provide another update as soon as we have additional information.
15:48 UTC | 07:48 PT
We continue our investigation into the issue with Explore customers not seeing groups in their queries or being unable to filter by group. We will provide further updates as soon as possible.
14:42 UTC | 06:42 PT
We are investigating reports of some of our Explore customers not seeing groups in their queries or being unable to filter by group.
POST-MORTEM
Root Cause Analysis
This incident was caused by an escaped defect deployed to production on November 22nd.
Resolution
To fix this issue, we reverted the faulty deployment and confirmed that group name denormalisation then started working again as expected. However, any ticket processed by Explore between Nov 22nd 2:00 PM UTC and Nov 23rd 3:25 PM UTC could contain blank group names, so we began manual interventions, which were completed according to the timeline above. At that point, the data in all affected customers' reports had been fixed.
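As an illustration only, the selection of records for such a backfill could be sketched as follows. The time window comes from the paragraph above; the record layout (`ticket_id`, `processed_at`, `group_name`) and the function name are hypothetical and are not taken from the Explore codebase.

```python
from datetime import datetime, timezone

# Affected processing window (UTC), as stated in the resolution notes.
WINDOW_START = datetime(2021, 11, 22, 14, 0, tzinfo=timezone.utc)
WINDOW_END = datetime(2021, 11, 23, 15, 25, tzinfo=timezone.utc)


def needs_backfill(record):
    """Return True if a record was processed in the window and has a blank group name.

    `record` is a hypothetical dict with a timezone-aware `processed_at`
    datetime and a `group_name` that may be a string or None.
    """
    in_window = WINDOW_START <= record["processed_at"] <= WINDOW_END
    name = record.get("group_name")
    is_blank = name is None or name.strip() == ""
    return in_window and is_blank


# Made-up sample records, for illustration only.
records = [
    {"ticket_id": 1, "group_name": "Support",
     "processed_at": datetime(2021, 11, 22, 18, 0, tzinfo=timezone.utc)},
    {"ticket_id": 2, "group_name": "",
     "processed_at": datetime(2021, 11, 23, 9, 30, tzinfo=timezone.utc)},
    {"ticket_id": 3, "group_name": None,
     "processed_at": datetime(2021, 11, 24, 8, 0, tzinfo=timezone.utc)},
]

to_backfill = [r["ticket_id"] for r in records if needs_backfill(r)]
```

Ticket 3 is excluded because it was processed after the revert, so a blank group name there would point to a different cause.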
Remediation Items
- Add a test for denormalized data in the Explore ETL.
- Implement stronger production data integrity monitoring and alerting.
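
A minimal sketch of what these two items might look like, assuming a simplified denormalisation step that copies group names onto ticket rows. All function names, field names, and the 1% threshold are hypothetical and chosen for illustration; they do not describe the actual Explore ETL.

```python
def denormalize_group_names(tickets, groups_by_id):
    """Hypothetical ETL step: copy each ticket's group name onto its row."""
    return [
        {**t, "group_name": groups_by_id.get(t["group_id"], "")}
        for t in tickets
    ]


def blank_group_name_ratio(rows):
    """Share of rows that have a group_id but a blank group_name."""
    with_group = [r for r in rows if r.get("group_id") is not None]
    if not with_group:
        return 0.0
    blank = [r for r in with_group if not (r.get("group_name") or "").strip()]
    return len(blank) / len(with_group)


def check_integrity(rows, threshold=0.01):
    """Return an alert message if the blank ratio exceeds the threshold, else None."""
    ratio = blank_group_name_ratio(rows)
    if ratio > threshold:
        return f"blank group_name ratio {ratio:.1%} exceeds {threshold:.1%}"
    return None


# Made-up sample data, for illustration only.
tickets = [{"ticket_id": 1, "group_id": 10}, {"ticket_id": 2, "group_id": 20}]
groups_by_id = {10: "Support", 20: "Billing"}

good_rows = denormalize_group_names(tickets, groups_by_id)
# Simulate the defect: the group lookup comes back empty.
bad_rows = denormalize_group_names(tickets, {})
```

The first function is the kind of unit a pre-deployment test would assert on, while `check_integrity` stands in for a monitor run against production output; a ratio check tolerates tickets that legitimately have no group.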
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.