16:00 UTC | 09:00 PT
We have resolved the errors with Zendesk Chat. If you happen to see further errors or have concerns, please let us know.
15:47 UTC | 08:47 PT
We've identified a cause of the Chat issues and put remediation efforts in place. Please let us know if you are still seeing issues with Chat.
15:27 UTC | 08:27 PT
We are still looking to identify the cause of the incident. We apologize for any inconvenience. Please stay tuned for further information.
15:00 UTC | 08:00 PT
We are currently experiencing connectivity issues with Zendesk Chat and the Chat Widget. Updates to follow.
During the chat service incident that occurred on or about July 12th, our proxy servers logged timeouts and exceptions aggressively to disk, which resulted in higher disk utilization for the period. As a result, a few proxies ran out of disk space.
At about the same time, we also had higher than usual network packet loss between the proxies and our backend. This compounded the problem and made disconnections more frequent, adversely impacting the user experience.
For both of these reasons, proxy worker uptime was severely impacted: the worker management code retired workers for high RSS (resident set size) usage, which caused the dashboard frontend to disconnect and reconnect too often.
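To illustrate the mechanism described above, here is a minimal sketch of an RSS-based retirement rule. It is not our actual worker management code: the names `Worker`, `RSS_LIMIT_BYTES`, and `select_workers_to_retire`, as well as the 512 MiB limit, are assumptions made purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical limit; the real threshold is not stated in this report.
RSS_LIMIT_BYTES = 512 * 1024 * 1024


@dataclass
class Worker:
    """Illustrative stand-in for a proxy worker process."""
    worker_id: int
    rss_bytes: int  # resident set size as reported by the OS


def select_workers_to_retire(workers, limit=RSS_LIMIT_BYTES):
    """Return the workers whose resident memory exceeds the limit."""
    return [w for w in workers if w.rss_bytes > limit]


workers = [
    Worker(worker_id=1, rss_bytes=200 * 1024 * 1024),
    Worker(worker_id=2, rss_bytes=700 * 1024 * 1024),
]
retired_ids = [w.worker_id for w in select_workers_to_retire(workers)]
print(retired_ids)  # [2]
```

Every client attached to a retired worker has to reconnect, so when many workers cross the limit in a short period, a rule of this kind fires repeatedly and shows up to users as frequent disconnects.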
The situation resolved as soon as the network recovered and we freed up disk space across all proxies.
To prevent this from happening again, we will add more chat monitoring, reduce logging in the proxies, and improve the proxy failover process during intermittent connectivity issues.
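As one possible way to reduce log volume during an error storm, the sketch below throttles repeated messages with Python's standard `logging` module. It only illustrates the general idea and does not describe the change we will actually make: the `ThrottleFilter` class, the 60-second interval, and the `"proxy"` logger name are all assumptions.

```python
import logging
import time


class ThrottleFilter(logging.Filter):
    """Let each distinct message through at most once per interval (hypothetical helper)."""

    def __init__(self, interval_seconds=60.0, clock=time.monotonic):
        super().__init__()
        self.interval_seconds = interval_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._last_emitted = {}  # message text -> time it was last allowed

    def filter(self, record):
        now = self.clock()
        key = record.getMessage()
        last = self._last_emitted.get(key)
        if last is not None and now - last < self.interval_seconds:
            return False  # drop the repeat so it never reaches disk
        self._last_emitted[key] = now
        return True


logger = logging.getLogger("proxy")  # logger name is an assumption
logger.addFilter(ThrottleFilter(interval_seconds=60.0))
```

With a filter like this, a burst of identical timeout errors logged through `logger` would produce one line per minute instead of one line per failure. The dictionary of seen messages grows with the number of distinct messages, so a production version would need to bound or expire it.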
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.