SUMMARY
From November 9, 2020 16:40 UTC to November 11, 2020 01:15 UTC, Sell customers may have experienced data delays for the Firehose API and the Sell services relying on it, including Dashboard, Deduplication, Object Distribution, Engage and Mentions.
17:12 UTC | 09:12 PT
Thank you for your patience. We are pleased to report that the data synchronisation issues affecting multiple services in Sell have been resolved. Please let us know if this does not appear to be the case for your account.
02:28 UTC | 18:28 PT
While Sell remains fully operational, we will require additional time to rectify any outstanding data discrepancies. We anticipate this work will take a further 24 hours and will provide an update then.
02:02 UTC | 18:02 PT
We’re happy to report that data synchronisation delays have been resolved across all Sell services. The team is actively working to ensure that there are no data inconsistencies. Thank you for your patience.
00:32 UTC | 16:32 PT
We are seeing improvements in all Sell services with the exception of Reach. Any small data inconsistencies should be cleared up within the next day or two.
19:34 UTC | 11:34 PT
Our teams have identified potential causes of the synchronisation issues impacting Sell and are continuing to move forward with resolution steps. Another update will be posted when more information is available.
17:45 UTC | 09:45 PT
We have identified potential causes of synchronisation issues that are impacting Sell, and are moving forward with implementing resolutions.
14:24 UTC | 06:24 PT
We are continuing to work on resolving data synchronisation issues that are impacting Sell. We will follow up with more information when available.
13:31 UTC | 05:31 PT
We are still working on resolving data synchronisation issues that are impacting Sell. We will follow up with more information when available.
10:47 UTC | 02:47 PT
We are continuing our investigation into data synchronisation issues affecting the Sell product. More updates to follow.
09:27 UTC | 01:27 PT
We are investigating data synchronisation issues affecting multiple services in Sell. We will keep you updated.
Root Cause Analysis
This incident was caused by a corrupted file system on our data platform, which led to a broker failure. As a result, a critical part of our infrastructure was unable to support the API used directly by customers and within Zendesk Sell product experiences.
Resolution
To fix this issue, engineering teams rebuilt the entire cluster on our data platform. By November 10 at 17:16 UTC, all brokers had been rebuilt and had fully recovered.
Once the cluster was fully operational, backfilling the cache took 48 hours. During that time, customers were able to use the API but may have experienced delays.
On November 12 at 17:32 UTC the data load completed successfully and we resumed normal operations.
Remediation Items
- Investigate options for increased stability to prevent corruption in our data platform.
- Improve the availability characteristics of the Firehose API for Sell.
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.