SUMMARY
On March 15, 2022 from 18:17 to 18:43 UTC, Agent Workspace customers on Pod 19 could have seen delayed messages and missed chats. We have recovered and all backlogged messages have been processed.
Timeline
19:04 UTC | 12:04 PT
Between 18:17 and 18:43 UTC on March 15, 2022, Agent Workspace customers on Pod 19 could have seen delayed messages and missed chats. We have recovered and all backlogged messages have been processed.
POST-MORTEM
Root Cause Analysis
In certain scenarios, when an instance was brought down and then back up, the instance spent too much computing power processing the changelog topics. This results in Kafka thinking the client is dead and kicks it out of the group while it soon rejoins, which then repeats.
Resolution
A manual deploy was executed and service was restored to normal.
Remediation Items
- Create alerts for excessive rebalancing
- Update runbook to not restart kpod directly
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us via ZBot Messaging within the Widget.