SUMMARY
On March 28, 2022, from 08:22 UTC to 09:33 UTC, Zendesk Chat customers saw a ‘502 Bad Gateway’ error screen when trying to use the platform.
Timeline
09:48 UTC | 02:48 PT
Our team is investigating reports of Chat customers receiving 502 errors. More information to follow.
10:05 UTC | 03:05 PT
We are no longer seeing 502 errors in Chat for our impacted customers on POD 29. We continue to monitor the situation and will provide a further update within the next 30 minutes.
10:17 UTC | 03:17 PT
We are happy to report that the issue on POD 29 that caused our Chat customers to receive 502 errors has now been resolved. We apologise for the inconvenience.
POST-MORTEM
Root Cause Analysis
This incident was caused by a resource configuration issue following an internal migration, which affected Pod 29’s memory allocation. The service used the default configuration, which provided less memory than the Pod’s expected setup. As a result, the service repeatedly failed and restarted, leaving the Chat platform inaccessible.
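For illustration only, a pre-deploy check for this class of problem could be sketched in Python as below. Everything in the sketch is an assumption made for the example: the ServiceConfig type, the 512 MiB default, the 2048 MiB expected minimum and the service name are hypothetical and do not describe our actual configuration or tooling.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical platform default, applied when a service sets no explicit value.
DEFAULT_MEMORY_MIB = 512


@dataclass
class ServiceConfig:
    """Hypothetical, simplified view of a service's resource configuration."""
    name: str
    pod: int
    memory_mib: Optional[int] = None  # None means "fall back to the default"


def effective_memory_mib(cfg: ServiceConfig) -> int:
    """Return the memory the service would actually be given."""
    return cfg.memory_mib if cfg.memory_mib is not None else DEFAULT_MEMORY_MIB


def check_memory(cfg: ServiceConfig, expected_min_mib: int) -> List[str]:
    """Return a list of problems; an empty list means the check passes."""
    problems = []
    if cfg.memory_mib is None:
        problems.append(f"{cfg.name}: no explicit memory allocation, default applies")
    if effective_memory_mib(cfg) < expected_min_mib:
        problems.append(
            f"{cfg.name}: {effective_memory_mib(cfg)} MiB is below the "
            f"{expected_min_mib} MiB expected for Pod {cfg.pod}"
        )
    return problems


# A migrated service that lost its explicit setting (hypothetical values).
migrated = ServiceConfig(name="chat-service", pod=29)
for problem in check_memory(migrated, expected_min_mib=2048):
    print(problem)
```

Run before a deploy, a check along these lines would flag a service that has silently fallen back to the default allocation, rather than leaving the shortfall to surface as repeated restarts.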
Resolution
To fix this issue, we redeployed the service with a corrected, increased memory allocation. Once the redeploy was complete, we observed service recovery and customers confirmed that the issue was resolved.
Remediation Items
- Fix resource allocation configuration [Done]
- Update runbook to improve efficiency of dealing with similar scenarios [In progress]
- Create an additional page for the responsible team so that the issue is detected and actioned as fast as possible.
- Added connection limits on specific applications regarding the memory allocation.
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us via ZBot Messaging within the Widget.
Post-Mortem published April 6, 2022.