Service Incident - August 9, 2024 - Support | Pod 17 - Issues with ticket opening/viewing

Zendesk Customer Care

Edited Aug 19, 2024

SUMMARY

On August 9, 2024 from 15:46 UTC to 15:57 UTC, Support customers on Pod 17 experienced various issues such as error codes, slow loading times, and inability to open tickets or view messages within the product UI.

Timeline

August 09, 2024 04:13 PM UTC | August 09, 2024 09:13 AM PT
We're investigating reports of users being unable to view Support tickets on Pod 17 and are already seeing recovery. We will provide additional updates in 30 minutes or sooner as we confirm full stability.

August 09, 2024 04:32 PM UTC | August 09, 2024 09:32 AM PT
From 15:46 UTC to 15:57 UTC, Support customers on Pod 17 experienced issues loading tickets. Performance has stabilized and we will continue to monitor it. Next update in an hour or when we have new information.

August 09, 2024 04:51 PM UTC | August 09, 2024 09:51 AM PT
The Support performance issues that occurred on Pod 17 from 15:46 UTC to 15:57 UTC are now fully resolved. We apologize for any inconvenience caused and appreciate your patience.

POST-MORTEM

Root Cause Analysis

This incident was caused by the unexpected reboot of an in-memory caching system that speeds up data retrieval. The Agent-graph component did not handle this failure adequately: it continued to wait up to 60 seconds for a response from the cache, which caused timeout errors and resulted in 503 service errors. Two factors contributed: the system did not switch to an alternative data source in a timely manner, and the monitors in place did not trigger alerts because the issue resolved before the alert thresholds were reached.
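The failure mode described above (a long wait on an unavailable cache, with no timely switch to another data source) can be illustrated with a minimal sketch. This is not Zendesk's implementation: the function name `get_user`, the `cache_lookup` and `primary_lookup` callables, and the 0.2-second budget are all assumptions chosen for illustration. The idea is only that a cache read gets a short, bounded wait, after which the caller falls back to the slower source instead of holding the request open.

```python
import concurrent.futures

# Hypothetical names and values throughout; none come from the incident report.
CACHE_TIMEOUT_SECONDS = 0.2  # assumed short budget, versus the 60-second wait described above

_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def get_user(user_id, cache_lookup, primary_lookup, timeout=CACHE_TIMEOUT_SECONDS):
    """Return (value, source), trying the cache first with a bounded wait."""
    future = _executor.submit(cache_lookup, user_id)
    try:
        value = future.result(timeout=timeout)
        if value is not None:
            return value, "cache"
        # A cache miss (None) falls through to the primary source.
    except concurrent.futures.TimeoutError:
        # The cache did not answer within the budget; stop waiting on it.
        future.cancel()
    except Exception:
        # The cache raised (for example, connection refused during a reboot).
        pass
    # Fall back to the slower, authoritative data source.
    return primary_lookup(user_id), "primary"
```

With a budget like this, a cache outage costs each request a fraction of a second plus the fallback read, rather than a wait long enough to surface as a 503. A production version would also need to protect the fallback source from the extra load, which this sketch does not attempt.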

Resolution

The system recovered automatically once the memory-caching system came back online. We identified the reboot of this system as the cause of the delays and confirmed that the issue was self-resolving, requiring no immediate manual intervention to restore service.

Remediation Items

Reduced timeout for user cache retrieval.
Consider performing chaos testing to simulate such failures in a controlled environment.
Review and adjust alert thresholds to ensure quicker detection and response time.
Reach out to AWS to investigate the unexpected reboot of the memory-caching system to prevent similar future occurrences.
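
The alert-threshold item can be made concrete with a small sketch. The report does not say how the monitors were configured, so everything here is an assumption: the function `first_alert_minute`, the per-minute error rates, and the "sustained for N minutes" rule are hypothetical. The sketch only shows how an incident of roughly 11 minutes (15:46 to 15:57 UTC) can end before a monitor with a long sustain window fires, while a shorter window would have alerted during the incident.

```python
from typing import Optional, Sequence


def first_alert_minute(
    error_rates: Sequence[float], threshold: float, sustain_minutes: int
) -> Optional[int]:
    """Return the index of the minute at which an alert fires, or None.

    The (hypothetical) rule: alert once the error rate has exceeded
    `threshold` for `sustain_minutes` consecutive minutes.
    """
    consecutive = 0
    for minute, rate in enumerate(error_rates):
        consecutive = consecutive + 1 if rate > threshold else 0
        if consecutive >= sustain_minutes:
            return minute
    return None


# Made-up per-minute 503 rates: 5 quiet minutes, 11 degraded minutes, 5 quiet minutes.
rates = [0.0] * 5 + [0.4] * 11 + [0.0] * 5

print(first_alert_minute(rates, threshold=0.1, sustain_minutes=15))  # None: incident ends first
print(first_alert_minute(rates, threshold=0.1, sustain_minutes=5))   # 9: fires 5 minutes in
```

Shorter windows detect brief incidents sooner but are also more prone to firing on transient spikes, which is the trade-off a threshold review would need to weigh.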

FOR MORE INFORMATION

For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, contact Zendesk customer support.

1 comment

Jessica G.

Zendesk Customer Care

Aug 19, 2024

Post-mortem published August 19, 2024.
