14:23 UTC | 07:23 PT
The issues affecting Search on POD6 have been resolved. Thank you for your patience.
14:15 UTC | 07:15 PT
The Search issue on POD6 has improved. Thank you for your patience while we continue to monitor the situation.
13:58 UTC | 06:58 PT
We are progressing with our investigation into the Search issues on POD6. Only Ticket search is currently failing. More info to follow.
13:44 UTC | 06:44 PT
We are currently experiencing Search issues on POD6 accounts. Updates to follow.
At 13:07 UTC we lost a disk in the production search cluster in POD6. The cluster is designed to survive disk failures without customer impact. Unfortunately, the disk failed while the tags index was being reindexed. This combination caused the cluster to become unhealthy, which in turn caused our indexes to stop updating (this is by design, to avoid data loss).
To mitigate the situation immediately, at 13:14 UTC the Search dev team made the correct decision to fail over to our backup cluster to reduce customer impact. Unfortunately, the search service code on the backup cluster was not yet running the latest version, which caused customers' ticket search results not to be returned properly. After the main cluster recovered, we quickly switched traffic back to it and service returned to normal.
To prevent this from happening again, we will consolidate all search service code into one project so that all clusters stay up to date and failover works as expected.
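The version gap described above can be illustrated with a minimal sketch of a pre-failover check. It assumes each cluster can report the version of the search service code it is running. All names and version strings below (`Cluster`, `safe_to_fail_over`, "2.4.0") are hypothetical and are not taken from our actual tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical description of a search cluster."""
    name: str
    code_version: str  # version of the search service code deployed on it
    healthy: bool


def safe_to_fail_over(primary: Cluster, backup: Cluster) -> bool:
    """Return True only if the backup is healthy and runs the same
    search service code version as the primary."""
    return backup.healthy and backup.code_version == primary.code_version


# Illustrative values only: an unhealthy primary and two possible backups.
primary = Cluster("main", "2.4.0", healthy=False)
stale_backup = Cluster("backup", "2.3.1", healthy=True)
current_backup = Cluster("backup", "2.4.0", healthy=True)

print(safe_to_fail_over(primary, stale_backup))    # False: versions differ
print(safe_to_fail_over(primary, current_backup))  # True: versions match
```

A check like this would only flag the mismatch at failover time. Keeping every cluster on the same code version in the first place, as planned, removes the mismatch itself.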
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.