SUMMARY
On January 31, 2025, from 13:15 UTC to 17:41 UTC, some agents and end-users appeared to be suspended in the UI when they should not have been. These users were not actually suspended. Access was fully restored after we rolled back an update and customers cleared their browser cache.
TIMELINE
January 31, 2025 06:15 PM UTC | January 31, 2025 10:15 AM PT
We are happy to report that we have resolved the issue causing some users to be suspended unexpectedly. Thank you for your patience during our investigation.
January 31, 2025 05:52 PM UTC | January 31, 2025 09:52 AM PT
We have rolled back an update that we believe is responsible for the unexpected agent and end-user suspensions. After you clear your cache and refresh your browser, user access should be restored. Please let us know if you continue to experience any issues.
January 31, 2025 05:48 PM UTC | January 31, 2025 09:48 AM PT
We are still investigating the issue causing unexpected user suspensions across multiple pods. We will provide additional updates in the next hour or when we have new information to share.
January 31, 2025 05:23 PM UTC | January 31, 2025 09:23 AM PT
Our team continues to investigate the issue causing unexpected agent and end-user suspensions across multiple pods. We will post additional updates within the next 30 minutes or as we learn more.
January 31, 2025 04:59 PM UTC | January 31, 2025 08:59 AM PT
We have confirmed an issue causing unexpected agent and end-user suspensions across multiple pods, and our team is investigating. Further updates will be posted within the next 30 minutes.
January 31, 2025 04:46 PM UTC | January 31, 2025 08:46 AM PT
We are receiving reports of unexpected agent and end-user suspensions, and our team is investigating. More information will be posted shortly.
POST-MORTEM
Root Cause Analysis
This incident was caused by a deployment that replaced an implementation in User Settings. The new implementation did not handle NULL values correctly, which led the system to identify users as suspended when they were not. The previous implementation converted NULL to an empty string, which was recognized as a false value. The new implementation did not perform this conversion, so the suspended check returned true when it should have returned false.
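The failure mode described above can be illustrated with a minimal Python sketch. This is not the actual code involved in the incident, which the report does not show. The function names, the `FALSE_VALUES` set, and the string encoding of the setting are all assumptions chosen only to reproduce the described behavior: a NULL that used to be coerced to an empty string (false) is instead evaluated directly and comes out true.

```python
from typing import Optional

# Hypothetical set of stored strings that mean "not suspended".
# The real encoding of the setting is not given in the report.
FALSE_VALUES = {"", "0", "false"}


def is_suspended_legacy(raw: Optional[str]) -> bool:
    """Sketch of the previous behavior: NULL is converted to "" first."""
    normalized = "" if raw is None else raw
    return normalized not in FALSE_VALUES


def is_suspended_new(raw: Optional[str]) -> bool:
    """Sketch of the regression: NULL is compared without conversion.

    None is not a member of FALSE_VALUES, so the check returns True.
    """
    return raw not in FALSE_VALUES


def is_suspended_fixed(raw: Optional[str]) -> bool:
    """One possible fix: treat a missing value as "not suspended" explicitly."""
    if raw is None:
        return False
    return raw not in FALSE_VALUES


if __name__ == "__main__":
    for value in (None, "", "false", "true"):
        print(repr(value),
              is_suspended_legacy(value),
              is_suspended_new(value),
              is_suspended_fixed(value))
```

Under these assumptions, the three functions agree on every input except NULL, which is why the change could pass tests that exercised only explicit values.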
Resolution
To resolve the issue, the team rolled back the problematic deployment. This restored normal functionality and allowed users to regain access to their accounts. Customers were also asked to clear their browser cache and refresh so that the fix took effect.
Remediation Items
- Data Migration: Conduct a migration to correct NULL values in the suspended settings to prevent similar issues in the future.
- Deployment Context: Ensure that future deployment attempts include the proper context and learnings from this incident.
- Unit Test Review: Add unit tests to cover scenarios involving NULL values to ensure robust validation in future changes.
- Data Coverage Check: Review data to ensure all instance values are adequately covered and accounted for.
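As a rough illustration of the data coverage check and the data migration listed above, the following sketch uses an in-memory SQLite database. The `user_settings` table, its `suspended` column, and the choice of `'false'` as the backfill value are hypothetical. The report does not describe the real schema or what value NULL should be corrected to.

```python
import sqlite3


def audit_values(conn: sqlite3.Connection) -> dict:
    """Coverage check: count rows per distinct stored value, NULL included."""
    rows = conn.execute(
        "SELECT suspended, COUNT(*) FROM user_settings GROUP BY suspended"
    ).fetchall()
    return {value: count for value, count in rows}


def backfill_nulls(conn: sqlite3.Connection, default: str = "false") -> int:
    """Migration: replace NULL with an explicit value (assumed default).

    Returns the number of rows that were changed.
    """
    cur = conn.execute(
        "UPDATE user_settings SET suspended = ? WHERE suspended IS NULL",
        (default,),
    )
    conn.commit()
    return cur.rowcount


# Hypothetical sample data: two users have no stored value (NULL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_settings (user_id INTEGER PRIMARY KEY, suspended TEXT)"
)
conn.executemany(
    "INSERT INTO user_settings (user_id, suspended) VALUES (?, ?)",
    [(1, "true"), (2, "false"), (3, None), (4, None)],
)

before = audit_values(conn)   # NULL appears as the key None
changed = backfill_nulls(conn)
after = audit_values(conn)    # no None key should remain
```

Running the audit before a deployment surfaces every value the new code has to handle, and the distinct values it returns are natural inputs for the NULL-handling unit tests.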
FOR MORE INFORMATION
For current system status information about Zendesk and specific impacts to your account, visit our system status page. You can follow this article to be notified when our post-mortem report is published. If you have additional questions about this incident, contact Zendesk customer support.
Bob Novak
Post-mortem published February 12, 2025