SUMMARY
On March 11, 2025, from 14:33 UTC to 17:02 UTC, some Agents across all Pods experienced issues creating and updating articles in Guide. Guide Admins were not impacted.
TIMELINE
March 11, 2025 05:10 PM UTC | March 11, 2025 10:10 AM PT
We have identified the issue with Guide article publishing and have successfully rolled back the problematic deploy. Please let us know if you continue to see any issues.
March 11, 2025 05:03 PM UTC | March 11, 2025 10:03 AM PT
We are aware of issues with creating and updating articles in Guide across multiple pods. We will provide more information shortly.
POST-MORTEM
Root Cause Analysis
This incident was caused by a configuration error in the Guide Article Service: the HTTP headers required for queries to the user segment service were incorrectly named. Requests to that service failed as a result, which prevented agents from creating or updating articles.
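The failure mode described above can be illustrated with a minimal sketch. The header names and the validation logic are assumptions made for illustration only; this report does not state the actual headers or how the user segment service validates them.

```python
# Hypothetical header names; the real ones are not stated in this report.
REQUIRED_HEADERS = frozenset({"X-Account-Id", "X-Request-Source"})


def missing_headers(headers):
    """Return the required header names absent from a request.

    HTTP header field names are case-insensitive, so the comparison
    is done on lowercased names.
    """
    present = {name.lower() for name in headers}
    return {name for name in REQUIRED_HEADERS if name.lower() not in present}


def handle_segment_query(headers):
    """Stand-in for the downstream service: reject requests that lack
    a required header, accept everything else."""
    missing = missing_headers(headers)
    if missing:
        return 400, "missing headers: " + ", ".join(sorted(missing))
    return 200, "ok"


# Correctly named headers are accepted.
good = {"X-Account-Id": "42", "X-Request-Source": "guide"}
# A single misnamed header ("X-Acct-Id") causes the request to be rejected.
bad = {"X-Acct-Id": "42", "X-Request-Source": "guide"}
```

In this sketch one renamed header is enough to turn every request into a failure, which matches the report's account of why article creation and updates stopped working until the change was reverted.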
Resolution
To fix this issue, the team reverted the defective change, restoring the previous configuration that allowed for proper communication with the user segment service. The correct headers will be implemented in future changes to avoid similar issues.
Remediation Items
- Implement smoke tests to catch configuration errors before deployment.
- Improve existing monitoring tools to ensure alerts are actionable and not ignored.
- Create additional alerts that specifically monitor for critical header configurations in requests.
- Establish connection limits on specific applications to prevent overload and ensure stability during high traffic.
This structured approach will help ensure that similar incidents do not occur in the future and that the service remains reliable for all customers.
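As an illustration of the smoke-test remediation item, the sketch below sends one probe request with the configured headers and blocks the deploy if the response is not a success. Everything here is hypothetical: the `probe` callable, the stand-in service, and the header name are assumptions for the example, not Zendesk's actual tooling.

```python
from typing import Callable, Mapping

# Hypothetical type: a function that sends one probe request with the
# given headers and returns the HTTP status code of the response.
Probe = Callable[[Mapping[str, str]], int]


def smoke_test(configured_headers: Mapping[str, str], probe: Probe) -> bool:
    """Return True if a single probe request using the configured
    headers receives a 2xx response, False otherwise."""
    status = probe(configured_headers)
    return 200 <= status < 300


def fake_segment_service(headers: Mapping[str, str]) -> int:
    """Stand-in for a staging endpoint (assumed behavior): respond 200
    only when the required header is present, 400 otherwise."""
    required = {"x-account-id"}  # hypothetical header name
    present = {name.lower() for name in headers}
    return 200 if required <= present else 400


def gate_deploy(configured_headers: Mapping[str, str], probe: Probe) -> str:
    """Decide whether a deploy may proceed based on the smoke test."""
    if smoke_test(configured_headers, probe):
        return "proceed"
    return "block"
```

Running a check of this kind against a staging copy of the dependency before rollout would surface a misnamed header as a blocked deploy rather than as a production failure. Passing the probe in as a parameter keeps the check easy to exercise against a stub, as done here.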
FOR MORE INFORMATION
For current system status information about Zendesk and specific impacts to your account, visit our system status page. You can follow this article to be notified when our post-mortem report is published. If you have additional questions about this incident, contact Zendesk customer support.