SUMMARY
On June 1, 2021, from 06:00 UTC to 15:30 UTC, agent role changes made in the Admin Center by Support customers on Pod 23 were not applied correctly. Customers who changed an agent's role during this window had to repeat the change for it to take effect.
Timeline
16:31 UTC | 09:31 PT
Our team has deployed a fix for the issue impacting agent role changes for some accounts on Pod 23 and will be contacting the impacted accounts directly. This issue has now been resolved.
15:52 UTC | 08:52 PT
Our team has identified an issue impacting agent role changes for some accounts on Pod 23. If an agent's role was changed within your account between 06:00 and 15:30 UTC, it may be necessary to repeat the change for it to take effect.
POST-MORTEM
Root Cause Analysis
This incident was caused by the products record of the affected accounts being in a bad state. When these accounts were subscribed to the new Zendesk Suite Support plan, which includes the light_agents add-on under the new plans and pricing effective February 1, 2021, the Light Agent add-on record was not created. As a result, the affected accounts called the internal server too many times, and the system was unable to complete the processing of role and entitlement changes.
Resolution
To fix this issue, the team manually changed the pattern used for undelivered messages so that the broken, repeated messages could be moved out of the queue of the stream-processing framework implementation. This allowed the system to continue working on other tasks.
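For readers unfamiliar with this kind of fix, the general idea of setting aside messages that keep failing can be sketched as below. This is only an illustration under stated assumptions, not the actual implementation: the names `Message`, `drain`, `dead_letters`, and `max_attempts` are hypothetical, and an in-memory queue stands in for the stream-processing framework.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical message shape; the real message format is not described.
    account_id: int
    payload: str
    attempts: int = 0


def drain(queue, dead_letters, handler, max_attempts=3):
    """Process every queued message once the queue is drained.

    A message whose handler keeps raising is retried up to max_attempts
    times, then moved to dead_letters so it stops blocking other work.
    """
    processed = []
    while queue:
        msg = queue.popleft()
        try:
            handler(msg)
            processed.append(msg)
        except Exception:
            msg.attempts += 1
            if msg.attempts >= max_attempts:
                # Park the broken message instead of retrying forever.
                dead_letters.append(msg)
            else:
                # Put it at the back so healthy messages go first.
                queue.append(msg)
    return processed
```

Messages parked in `dead_letters` are kept rather than discarded, so they can be inspected or replayed once the underlying data problem is corrected.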
Remediation Items
- Create an anomaly monitor for memory usage and fix memory leaks.
- Investigate ways to replay failed transactions from Account Updates so that they take effect.
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.