SUMMARY
From December 13, 2021 at 17:33 UTC to December 17, 2021 at 02:20 UTC, Talk and Support customers experienced search and data delays in the Talk dashboard and Side Conversations features.
Timeline
18:35 UTC | 10:35 PT
We are investigating issues with Talk dashboard data refresh delays and Side Conversation search result delays. We will provide more information in the next 30 minutes.
18:58 UTC | 10:58 PT
Our engineers are in the process of re-indexing Side Conversation search results to resolve the data refresh issue. The Talk Dashboard will show data from the last hour, while the last 24 hours of data should be refreshed by tomorrow.
20:06 UTC | 12:06 PT
Side Conversation search results are now being re-indexed and we expect the process to complete in the next 48-72 hours. The Talk Dashboard should be refreshed by tomorrow. We will provide an update when each process completes in the next 72 hours.
23:06 UTC | 15:06 PT
The Talk Dashboard data is now fully refreshed. We will post a final update once the Side Conversation search re-indexing is complete in the next 24-48 hours.
POST-MORTEM
Root Cause Analysis
This incident was caused by a bug in a node recycling tool. The bug caused search service nodes to restart prematurely, leading to temporary data degradation.
Resolution
To fix this issue, our engineers re-indexed the impacted indices from the source of truth. Talk Dashboards recovered relatively quickly, while Side Conversations data recovered after a few days.
Remediation Items
- Deprecate the node recycling tool and replace it with the main deployment strategy (Completed)
- Allow Side Conversations to backfill to a single cluster (Completed)
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.