Summary
From October 7, 2024 at 22:30 UTC until October 8, 2024 at 06:00 UTC, some customers using Ultimate bots were unable to create tickets. This impacted messaging conversations with end users.
Timeline
October 08, 2024 03:52 AM UTC | October 07, 2024 08:52 PM PT
Our engineering teams are aware of issues that a small subset of customers using Ultimate chat bots have experienced over the last 5 hours. Because of the complexity of the issue, the investigation has been progressing slowly. We will provide another update in 2 hours. Thank you for your patience while we work through this issue.
October 08, 2024 05:52 AM UTC | October 07, 2024 10:52 PM PT
Our engineering team has identified the root cause of the issue impacting some Ultimate chat bots. We are currently testing some options to fix this issue. This will be the final update posted to the Zendesk status page. Please refer to the Ultimate status page for future updates: https://status.ultimate.ai/incidents/hkttfhfgrplq.
Root Cause Analysis
The incident was caused by the Redis cache for Sunshine Automation reaching its memory limit. Redis, which caches frequently accessed data, ran out of memory because keys were not expiring properly, and this prevented bots integrated with Sunshine Automation from processing messages. Our alerting systems failed to notify the on-call team, which delayed the start of the investigation until around 05:00 UTC. After investigating, the team increased the Redis memory allocation, resolving the incident by 06:00 UTC. As a follow-up, we will address the Redis key expiration issue and improve our alerting on Redis memory usage to ensure faster response times in the future.
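To illustrate the kind of key expiration problem described above, here is a minimal sketch of an audit that lists cache keys with no expiration set. It is not the code used in the investigation. It assumes a client object exposing `scan_iter` and `ttl` in the style of the redis-py library, and relies on the Redis convention that `TTL` returns -1 for a key that exists without an expiration and -2 for a key that does not exist. The function name, the `FakeRedis` stand-in, and the `bot:*` key pattern are hypothetical.

```python
import fnmatch


def keys_without_expiry(client, pattern="*", batch=500):
    """Return the keys matching `pattern` that have no TTL set.

    `client` is assumed to expose `scan_iter(match=..., count=...)` and
    `ttl(name)`, as a redis-py client does. SCAN is used instead of KEYS so
    that a large keyspace is walked incrementally.
    """
    missing = []
    for key in client.scan_iter(match=pattern, count=batch):
        # Redis TTL semantics: -1 means "exists, no expiration",
        # -2 means "key does not exist".
        if client.ttl(key) == -1:
            missing.append(key)
    return missing


class FakeRedis:
    """Hypothetical in-memory stand-in so the sketch runs without a server."""

    def __init__(self, ttls):
        # Maps key name to remaining TTL in seconds, or -1 for no expiry.
        self._ttls = dict(ttls)

    def scan_iter(self, match="*", count=None):
        return (k for k in self._ttls if fnmatch.fnmatch(k, match))

    def ttl(self, name):
        return self._ttls.get(name, -2)


if __name__ == "__main__":
    fake = FakeRedis({"bot:1": 300, "bot:2": -1, "other": -1})
    print(keys_without_expiry(fake, "bot:*"))
```

Keys reported by an audit like this never expire on their own, so they stay in memory until they are deleted explicitly or evicted under a configured eviction policy.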
Remediation items
- Address the Redis key expiration issue
- Improve Redis memory usage monitoring and alerts
- Update internal escalation processes to improve Ultimate bot incident response times
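As a rough illustration of the memory monitoring item above, the following sketch classifies Redis memory usage from the `used_memory` and `maxmemory` fields that the Redis `INFO memory` command reports. It is an assumption-laden example, not a description of Zendesk's monitoring: the function names and the 80% and 90% thresholds are hypothetical, and the input is a plain dictionary so the logic can be exercised without a running server.

```python
def memory_usage_ratio(info):
    """Return used_memory / maxmemory, or None when no limit is configured.

    `info` is assumed to be a mapping with the `used_memory` and `maxmemory`
    fields from Redis `INFO memory`. A maxmemory of 0 means no limit is set.
    """
    used = int(info["used_memory"])
    limit = int(info.get("maxmemory", 0))
    if limit <= 0:
        return None
    return used / limit


def memory_alert_level(info, warn_at=0.80, critical_at=0.90):
    """Map memory usage to an alert level (thresholds are hypothetical)."""
    ratio = memory_usage_ratio(info)
    if ratio is None:
        return "unknown"
    if ratio >= critical_at:
        return "critical"
    if ratio >= warn_at:
        return "warning"
    return "ok"


if __name__ == "__main__":
    sample = {"used_memory": 950, "maxmemory": 1000}
    print(memory_alert_level(sample))
```

Alerting at a warning threshold below the limit leaves time to act before the cache is full, rather than after message processing has already started to fail.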
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, contact Zendesk customer support.
Dan Beirouty
Post-mortem published October 25, 2024.