On September 30, 2023, from 13:50 UTC to 14:29 UTC, Zendesk Support customers on Pod 20 may have experienced server errors when attempting to log in, create or update tickets, or perform other actions in the product. Less than 3% of requests encountered errors during the disruption period.
14:34 UTC | 07:34 PT
We are receiving reports of customers on Pod 20 not receiving new tickets, being unable to update and save existing tickets, receiving 5xx API errors and being unable to log in. An investigation is underway and we will provide another update shortly.
14:50 UTC | 07:50 PT
Services on Pod 20 have resumed normal function. Customers are able to log in and ticket processing has returned to regular levels. Tickets that were not created during the incident time window were retried and created automatically afterwards. We now consider this incident resolved. Thank you for your patience and we apologise for any inconvenience caused.
Root Cause Analysis
This incident was caused by operator error when applying a vendor-managed security update to a piece of network infrastructure in Pod 20. The error caused multiple connectivity issues with a datastore, which in turn produced the errors experienced by some customers during the incident period.
To fix this issue, our vendor rolled back the erroneous configuration change.
Remediation Items
- Adjust reconnection mechanism for job queuing [Completed]
- The vendor will automate network configuration changes to eliminate operator errors [In Progress]
- Reconfigure ticket creation process to ensure tickets are created when connectivity issues are encountered [In Progress]
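For illustration only, the reconnection and ticket creation items above could follow a pattern such as retrying a failed operation with capped exponential backoff and jitter. This is a minimal sketch under stated assumptions, not a description of the actual implementation: the function name `retry_with_backoff`, its parameters and default values, and the use of Python's built-in `ConnectionError` to represent a datastore connectivity failure are all hypothetical.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on ConnectionError with capped backoff.

    All names and defaults here are hypothetical and chosen for the sketch.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                # Out of attempts: re-raise so the caller can park the work
                # (for example, queue it for later replay).
                raise
            # Full jitter: wait a random time up to the capped exponential
            # delay so that many clients do not reconnect in lockstep.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))

    raise AssertionError("unreachable")
```

In this sketch, a caller would wrap the step that talks to the datastore, for example `retry_with_backoff(lambda: create_ticket(payload))`, where `create_ticket` is likewise a hypothetical function. Injecting `sleep` as a parameter keeps the function easy to test without real delays.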
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us via ZBot Messaging within the Widget.