SUMMARY
On March 13, 2024, from 21:13 UTC to 23:36 UTC, some Guide customers across all Pods may have experienced significant delays when loading ticket forms on the Help Center request page. The forms eventually appeared, but only after several minutes or repeated page refreshes, which hindered customers' ability to create new service requests promptly.
Timeline
21:30 UTC | 14:30 PT
We are investigating reports of Guide ticket forms on the request page not loading properly across all pods. We will provide another update in 30 minutes or when we have new information to share.
22:01 UTC | 15:01 PT
Our engineers continue to investigate the issue causing Guide ticket forms to not load properly. We will provide another update in 1 hour or when we have new information to share.
23:00 UTC | 16:00 PT
The issue causing Guide ticket forms to not load properly continues to be investigated by our technical teams. We will provide another update when we have new information to share.
23:41 UTC | 16:41 PT
Our team is observing recovery in Guide ticket forms following a configuration change. Please instruct end users to hard refresh their browsers and try again. We will provide a final update upon full resolution.
01:24 UTC | 18:24 PT
We are happy to report that the Guide ticket forms issue is now resolved. Thanks for your patience while we worked through this issue.
POST-MORTEM
Root Cause Analysis
This incident was caused by an update that our CDN partner made to a caching rule. The updated rule was missing a parameter in its cache key configuration, so the CDN mistakenly treated multiple distinct pages as identical. As a result, pages were cached randomly and incorrectly, affecting all customers with custom help center forms.
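As a rough illustration of this failure mode, and not the CDN partner's actual configuration, the sketch below builds a cache key from a URL's host, path, and a configured list of query parameters. The function name `cache_key`, the host `help.example.com`, and the parameter name `ticket_form_id` are assumptions chosen for the example; the report does not state which parameter was missing. When the parameter is left out of the key, two different form pages collapse into one cache entry.

```python
from urllib.parse import urlsplit, parse_qs


def cache_key(url, key_params):
    """Build a cache key from host, path, and only the listed query parameters.

    Hypothetical helper for illustration; real CDN rules are configured
    differently, but the collision mechanism is the same.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    kept = tuple((name, tuple(query.get(name, []))) for name in sorted(key_params))
    return (parts.netloc, parts.path, kept)


# Hypothetical request-page URLs for two different ticket forms.
form_a = "https://help.example.com/hc/en-us/requests/new?ticket_form_id=111"
form_b = "https://help.example.com/hc/en-us/requests/new?ticket_form_id=222"

# Parameter included in the key: the two pages are cached separately.
distinct = cache_key(form_a, ["ticket_form_id"]) != cache_key(form_b, ["ticket_form_id"])

# Parameter missing from the key: both pages share one cache entry, so a
# visitor may be served whichever form happened to be cached first.
collided = cache_key(form_a, []) == cache_key(form_b, [])

print(distinct, collided)
```

In this sketch, both comparisons evaluate to true: the pages are kept apart only while the distinguishing parameter is part of the key.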
Resolution
To fix this issue, the faulty cache rule was identified and disabled, which restored functionality. Further investigation revealed a bug in the CDN partner's infrastructure, which prevented a proper rollback of the changes. This was subsequently corrected, and the cache rule was restored to its original, correct configuration.
Remediation Items
- Improve testing prior to implementation.
- Create additional alerts where feasible.
- Discuss cache management issues and review previous occurrences of similar problems.
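One way the first remediation item might be approached is a pre-deployment check that runs a proposed cache key rule against a sample of URLs known to be distinct pages and reports any that would share a cache entry. The sketch below is an assumption about what such a check could look like, not a description of the tooling actually used; `make_key_fn`, `find_collisions`, the sample URLs, and the `ticket_form_id` parameter are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qs


def make_key_fn(key_params):
    """Return a function that maps a URL to a cache key (hypothetical rule model)."""
    def key_fn(url):
        parts = urlsplit(url)
        query = parse_qs(parts.query)
        kept = tuple((name, tuple(query.get(name, []))) for name in sorted(key_params))
        return (parts.netloc, parts.path, kept)
    return key_fn


def find_collisions(urls, key_fn):
    """Group URLs by cache key and return the groups holding more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        groups[key_fn(url)].append(url)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical sample of pages that must never share a cache entry.
SAMPLE_URLS = [
    "https://help.example.com/hc/en-us/requests/new?ticket_form_id=111",
    "https://help.example.com/hc/en-us/requests/new?ticket_form_id=222",
    "https://help.example.com/hc/en-us/articles/1",
]

# A rule that keys on the form parameter keeps every sample page separate.
safe_rule_collisions = find_collisions(SAMPLE_URLS, make_key_fn(["ticket_form_id"]))

# A rule that drops the parameter merges the two form pages into one entry.
faulty_rule_collisions = find_collisions(SAMPLE_URLS, make_key_fn([]))

for group in faulty_rule_collisions:
    print("Would share a cache entry:", group)
```

A check like this could gate a rule change: a non-empty collision list for the proposed rule would block the rollout until it is reviewed.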
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us via ZBot Messaging within the Widget.