Summary
On May 2, 2023, from 07:00 UTC to 15:45 UTC, and on May 3, 2023, from 06:05 UTC to 14:55 UTC, Zendesk Support customers on Pod 17 experienced performance issues and were unable to load tickets and chats.
Timeline
10:02 UTC | 03:02 PT
We are currently investigating reports of customers on Pod 17 experiencing issues with loading tickets. We will provide more information shortly.
10:31 UTC | 03:31 PT
We continue to investigate issues with loading tickets for customers on Pod 17. We will provide another update in 30 minutes or when more information becomes available.
11:13 UTC | 04:13 PT
We are seeing improvements for customers on Pod 17. We’re still monitoring the issue and will keep you informed.
15:57 UTC | 08:57 PT
We are seeing a new rise in errors affecting Pod 17 as well as Pod 23, resulting in issues loading tickets. Our team is investigating and we will provide further updates as soon as we can.
16:36 UTC | 09:36 PT
We are happy to report that the errors causing issues loading tickets in Pod 17 have been resolved. The rise in errors on Pod 23 was investigated but found to have no correlation to the ticket loading issues. Thank you for your patience during our investigation.
POST-MORTEM
Root Cause Analysis
This incident was caused by a capacity issue with our application servers coupled with an ungraceful shutdown of the application servers as part of internal infrastructure processes.
Resolution
To fix this issue, our team scaled up our application servers and adjusted memory configurations on those servers.
Remediation items
- Refine existing monitoring and alerts to match customer impact [Completed]
- Implement additional SLOs for the impacted service [Scheduled]
- Scale up application servers [Completed]
- Adjust memory configuration on application servers [Completed]
- Gracefully terminate containers to ensure connections have ended [Completed]
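As an illustration of the last remediation item, the following is a minimal sketch of connection draining, where a server stops accepting new work and waits for in-flight connections to finish before exiting. It is not the implementation used on the affected servers. The class name ConnectionDrainer, its methods, and the timeout values are all hypothetical and chosen only for this example.

```python
import threading


class ConnectionDrainer:
    """Hypothetical helper that tracks in-flight connections during shutdown."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active = 0        # number of connections currently being served
        self._accepting = True  # flips to False once shutdown begins

    def try_acquire(self):
        """Register a new connection. Returns False once shutdown has begun."""
        with self._cond:
            if not self._accepting:
                return False
            self._active += 1
            return True

    def release(self):
        """Mark one connection as finished and wake a waiting shutdown."""
        with self._cond:
            self._active -= 1
            if self._active == 0:
                self._cond.notify_all()

    def shutdown(self, timeout):
        """Stop accepting connections, then wait up to `timeout` seconds.

        Returns True if every in-flight connection finished in time,
        False if the timeout expired with connections still open.
        """
        with self._cond:
            self._accepting = False
            return self._cond.wait_for(lambda: self._active == 0, timeout=timeout)
```

In a containerized service, shutdown() would typically be called from the handler for the termination signal, with a timeout shorter than the grace period the orchestrator allows. A False return value indicates that connections were still open when the wait ended, which is the case an ungraceful shutdown skips over entirely.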
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us via ZBot Messaging within the Widget.
Postmortem published on May 11, 2023.