04:55 UTC | 21:55 PT
We are happy to confirm that the issues with intermittent login and visitor registration impacting Chat have been resolved. If you're still experiencing issues, please reach out to us.
04:36 UTC | 21:36 PT
We are currently investigating intermittent Chat login and visitor registration issues. We will provide an update as soon as we learn more.
On May 20, 2020, from 03:04 UTC to 04:20 UTC, some customers using Zendesk Chat may have experienced agent login issues and visitor registration issues (the Chat widget not appearing on some sites).
Root Cause Analysis
This incident was caused by an issue in a service discovery cluster. A cluster leader was evicted because of a data discrepancy resulting from corrupt Raft data. The data corruption stemmed from a known bug in our service discovery software combined with a bad deployment. The leader eviction led to a “split-brain” condition in which some nodes were inactive, causing some Chat requests to fail.
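For readers unfamiliar with Raft, the majority (quorum) rule it relies on can be sketched as follows. This is a minimal illustration only: the five-node cluster size and the function names are assumptions made for the example, not details of the Zendesk deployment or of any particular service discovery product.

```python
def quorum_size(cluster_size: int) -> int:
    """Return the smallest number of voting nodes that forms a strict majority."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """Return True if a group of mutually reachable nodes is a majority.

    In Raft, only a majority can elect a leader and commit new entries.
    """
    return reachable_nodes >= quorum_size(cluster_size)


# Hypothetical cluster size, chosen only for illustration.
CLUSTER_SIZE = 5

if __name__ == "__main__":
    for active in range(CLUSTER_SIZE + 1):
        print(f"{active} active node(s): quorum = {has_quorum(active, CLUSTER_SIZE)}")
```

Under this rule, a group of nodes that cannot reach a majority of the cluster cannot elect a leader, so requests that depend on that group may fail until enough nodes rejoin.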
To resolve this issue, the first inactive node was restarted, after which it rejoined the cluster. The remaining nodes were then restarted and rejoined the cluster, leading to full recovery of Zendesk Chat.
Remediation Items
- Implement additional monitoring and alerting for critical service discovery error messages.
- Upgrade service discovery software (includes bug fix).
- Correct invalid Raft data.
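As a rough illustration of the monitoring item, alerting on critical error messages could start with a simple pattern match over log lines. The sketch below is hypothetical: the patterns and sample log lines are invented for the example and do not reproduce the output of any real service discovery software or of Zendesk's tooling.

```python
import re
from typing import Iterable, List

# Hypothetical patterns; real software emits its own message formats,
# so these would need to be replaced with the actual critical messages.
CRITICAL_PATTERNS = [
    re.compile(r"no cluster leader", re.IGNORECASE),
    re.compile(r"leader (evicted|lost)", re.IGNORECASE),
    re.compile(r"raft.*(corrupt|mismatch)", re.IGNORECASE),
]


def find_critical_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that match at least one critical pattern."""
    return [
        line
        for line in lines
        if any(pattern.search(line) for pattern in CRITICAL_PATTERNS)
    ]


# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    "[INFO] agent: health check passed",
    "[ERR] raft: log entry mismatch detected",
    "[ERR] agent: No cluster leader",
]

if __name__ == "__main__":
    for line in find_critical_lines(SAMPLE_LOG):
        # A real setup would page an on-call engineer instead of printing.
        print(f"ALERT: {line}")
```

In practice this kind of matching usually lives in a log aggregation or alerting system rather than a standalone script; the sketch only shows the matching step.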
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.