Service Incident - February 11th, 2022 - Explore | EU region - Data Sync delay

SUMMARY

On February 11, 2022, we received multiple reports that some EU Explore customers were experiencing data sync delays for their queries and dashboards. Those customers may have been affected from the day before, starting at 02:34 UTC, until 12:39 UTC on the date of this Service Incident.

TIMELINE

08:36 UTC | 00:36 PT

We are currently aware of an issue with Explore data sync delays for some EU customers. We are working to understand the cause of this and will keep you updated as we find out more. Explore continues to work for queries and other core features. Thank you for your patience!

09:42 UTC | 01:42 PT

Our investigation of the Explore data sync delay issue for some EU customers continues. We will provide a further update in one hour or as soon as we have further information.

10:32 UTC | 02:32 PT

Our team has applied some fixes and we are seeing improvements in the performance of the Explore data sync for some EU customers. We continue to monitor.

11:44 UTC | 03:44 PT

We continue to see improvements in the data sync and load for a subset of Explore customers in the EU region. Thank you for your patience. A final update will be shared once we have confirmation that the tool is fully functional.

12:50 UTC | 04:50 PT

We are happy to report that the data sync and load for a subset of Explore customers in the EU region is fully functional. We apologise for the inconvenience!

POST-MORTEM

Root Cause Analysis

This incident was caused by the write pipelines of all Explore accounts hosted in one of the EU clusters becoming jammed. Because the external system's clusters limit the number of concurrent operations, the jammed pipelines prevented new pipelines from writing to these clusters.
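The blocking effect described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class name `ClusterSlots`, the limit of two concurrent writes, and the use of a semaphore are hypothetical and are not taken from the actual external system.

```python
import threading


class ClusterSlots:
    """Toy model of a cluster that allows a fixed number of concurrent writes."""

    def __init__(self, max_concurrent_writes: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent_writes)

    def try_start_write(self) -> bool:
        # Non-blocking: returns False when every slot is already taken.
        return self._slots.acquire(blocking=False)

    def finish_write(self) -> None:
        # Called when a pipeline completes or is terminated.
        self._slots.release()


cluster = ClusterSlots(max_concurrent_writes=2)  # hypothetical limit

# Two pipelines start and then jam: they never call finish_write().
jammed = [cluster.try_start_write() for _ in range(2)]

# A new pipeline cannot get a slot while the jammed ones hold them.
blocked = cluster.try_start_write()

# Terminating one jammed pipeline frees its slot, so a new write can start.
cluster.finish_write()
unblocked = cluster.try_start_write()
```

In this model, a jammed pipeline holds its slot indefinitely, so once all slots are held by jammed pipelines no new write can begin until one of them is terminated.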

Resolution

To fix this issue, the team terminated the jammed write pipelines on all clusters. We then saw immediate progress on the pending queries, and after a few hours all of them had completed and data sync resumed correctly.

Remediation Items

Work with the external system to determine why the queries were blocked in the EU [In Progress]
Add a timeout to the data warehouse service's copy queries [To Do]
Implement alerts on jammed queries in the data warehouse service [To Do]
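
As a rough illustration of the timeout and alerting items above, the sketch below flags queries that have been running longer than a threshold. It is a minimal example under stated assumptions: `RunningQuery`, `find_jammed`, the one-hour threshold, and the sample timestamps are all hypothetical and do not describe the actual data warehouse service.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class RunningQuery:
    query_id: str          # hypothetical identifier
    started_at: datetime   # when the copy query began running


def find_jammed(queries, now, max_runtime):
    """Return the queries that have been running longer than max_runtime."""
    return [q for q in queries if now - q.started_at > max_runtime]


# Illustrative data only; these are not values from the incident.
now = datetime(2022, 1, 1, 12, 0, tzinfo=timezone.utc)
queries = [
    RunningQuery("copy-1", now - timedelta(hours=5)),    # far over the limit
    RunningQuery("copy-2", now - timedelta(minutes=10)),  # still healthy
]

# Hypothetical one-hour threshold for a copy query.
jammed = find_jammed(queries, now, max_runtime=timedelta(hours=1))
jammed_ids = [q.query_id for q in jammed]
```

The same check could feed an alert, or it could trigger cancellation of the offending query so that it releases its concurrency slot.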

FOR MORE INFORMATION

For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.