SUMMARY
On August 18, 2025, between 19:20 UTC and 21:03 UTC, we received 31 reports from customers experiencing 500 errors when attempting to access the kibbles.klaus.com URL within the Zendesk QA service.
Timeline
August 18, 2025 21:45 UTC | 13:44 PT
We are receiving reports of access issues and 5xx-level errors when reaching the Zendesk QA product across multiple pods. Further information will be posted shortly.
August 18, 2025 22:04 UTC | 14:04 PT
We have confirmed an issue causing 5xx-level errors and preventing access to the Zendesk QA product. Our team is investigating, and we will provide further updates in the next 30 minutes.
August 18, 2025 22:25 UTC | 14:25 PT
We are beginning to see improvement in the issue preventing access and causing 5xx-level errors when reaching the Zendesk QA product. Please let us know if you continue to experience any issues.
August 18, 2025 23:26 UTC | 15:26 PT
We have observed significant improvement in the issue preventing access and causing 5xx-level errors in Zendesk QA, and we are now monitoring to ensure full recovery. We will provide another update once the issue is fully resolved.
August 19, 2025 14:43 UTC | 07:43 PT
We are pleased to inform you that the issue causing 500 errors when accessing the kibbles.klaus.com URL has been resolved. Our engineers have mitigated the problem, and customers should no longer encounter these errors.
We would appreciate your feedback to confirm whether the service is now functioning correctly on your end. Your prompt response will help us provide any further assistance if needed.
Root Cause Analysis
The incident was caused by timeouts on the gateway's connection to a critical security management service. While that connection was timing out, requests to Zendesk QA failed with 5xx errors.
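To illustrate how a dependency timeout can reach customers as a server error, here is a minimal Python sketch. It assumes a hypothetical gateway that must get an answer from an upstream security check before serving a request; the names `check_token` and `handle_request`, the 0.2-second default timeout, and the simulated delays are illustrative only and do not describe Zendesk's actual gateway.

```python
import concurrent.futures
import time

# Hypothetical stand-in for the upstream security management service.
def check_token(token: str, delay_s: float) -> bool:
    time.sleep(delay_s)  # simulate network latency or a hung connection
    return token == "valid"

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def handle_request(token: str, upstream_delay_s: float, timeout_s: float = 0.2) -> int:
    """Return an HTTP status code for a request gated on the upstream check."""
    future = _pool.submit(check_token, token, upstream_delay_s)
    try:
        allowed = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The dependency did not answer in time, so the gateway cannot
        # authorize the request. 500 mirrors the errors customers reported;
        # many gateways would return 502 or 504 here instead.
        return 500
    return 200 if allowed else 403

if __name__ == "__main__":
    print(handle_request("valid", 0.0))                 # fast upstream
    print(handle_request("valid", 0.5, timeout_s=0.1))  # upstream too slow
```

Note that `future.result(timeout=...)` only stops waiting: the slow upstream call keeps occupying a worker thread until it finishes, which is one reason a service that repeatedly hits timeouts can degrade over time.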
Resolution
To address the issue, engineers first restarted the gateway protocol, which brought a temporary improvement but did not fully resolve the problem. The team then restarted the Kubernetes pod, after which the Zendesk QA service fully recovered and stopped returning 5xx errors.
Remediation Items
- Implement regular health checks that automatically restart components when issues are detected.
- Increase the number of service instances to absorb higher traffic and limit the impact if a single instance fails.
- Use multiple connections instead of relying on a single link, so that one failed connection does not interrupt service.
- Update software components regularly to the latest versions to improve performance and reliability.
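The health-check and multiple-connection items above can be sketched together in Python. This is a hypothetical model, not Zendesk's implementation: `Endpoint`, `call_with_failover`, and `HealthChecker` are illustrative names, an unhealthy endpoint is simulated with a flag, and `restart()` stands in for restarting a pod or gateway connection. The checker restarts an endpoint after a configurable number of consecutive failed probes, and the client tries each available link in turn instead of depending on one.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Endpoint:
    """Hypothetical connection target, e.g. one replica of an upstream service."""
    name: str
    healthy: bool = True
    restarts: int = 0

    def call(self) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} timed out")
        return f"ok from {self.name}"

    def restart(self) -> None:
        # Stand-in for restarting a pod or re-establishing a gateway connection.
        self.restarts += 1
        self.healthy = True

def call_with_failover(endpoints: List[Endpoint]) -> str:
    """Try each endpoint in order instead of relying on a single link."""
    errors = []
    for ep in endpoints:
        try:
            return ep.call()
        except ConnectionError as exc:
            errors.append(str(exc))
    raise ConnectionError("all endpoints failed: " + "; ".join(errors))

@dataclass
class HealthChecker:
    """Restart an endpoint after `threshold` consecutive failed probes."""
    threshold: int = 3
    failures: Dict[str, int] = field(default_factory=dict)

    def probe(self, ep: Endpoint) -> None:
        try:
            ep.call()
            self.failures[ep.name] = 0  # a success resets the streak
        except ConnectionError:
            self.failures[ep.name] = self.failures.get(ep.name, 0) + 1
            if self.failures[ep.name] >= self.threshold:
                ep.restart()
                self.failures[ep.name] = 0

if __name__ == "__main__":
    primary = Endpoint("gateway-a", healthy=False)  # simulate a timed-out link
    backup = Endpoint("gateway-b")
    print(call_with_failover([primary, backup]))  # served by gateway-b
    checker = HealthChecker(threshold=3)
    for _ in range(3):
        checker.probe(primary)
    print(primary.healthy, primary.restarts)  # True 1
```

In Kubernetes, comparable restart behavior usually comes from a container's liveness probe, which restarts the container after `failureThreshold` consecutive probe failures.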
FOR MORE INFORMATION
For current system status information about Zendesk and specific impacts to your account, visit our system status page. You can follow this article to be notified when our post-mortem report is published. If you have additional questions about this incident, contact Zendesk customer support.