Summary
On January 30, 2026, from 15:10 UTC to 16:54 UTC, some customers experienced delays in receiving the latest updates in certain Explore features. While the overall service remained available, the information displayed during this time was not up to date.
Timeline
January 30, 2026, 04:59:24 PM UTC | January 30, 2026, 08:59:24 AM PST
Our engineers have identified the issue causing some Explore data sets to not receive updated data and have released the fix. Data that was missed during this period will appear upon the next Explore data refresh. Thank you for your patience.
January 30, 2026, 03:32:00 PM UTC | January 30, 2026, 07:32:00 AM PST
The impacted data sets include: Guide Generative Search, Guide Page Efficiency Analytics, Guide User Session Analytics, AI Suggestions, Intelligent Triage, and AI Auto Assist.
January 30, 2026, 03:23:24 PM UTC | January 30, 2026, 07:23:24 AM PST
We are investigating reports that some Explore data sets have not received updated data in the last two hours for customers in all Pods. We will provide additional updates shortly.
Root Cause Analysis
This incident was caused by a recent update that unintentionally stopped the system from properly clearing temporary resources. As a result, unused resources built up, putting extra strain on the infrastructure. The added strain caused some key services to restart repeatedly, which prevented fresh data from being processed and updated.
Resolution
To fix this issue, the team rolled back the recent update to a previous stable version, allowing the system to clear unused resources properly. They also increased memory limits for certain components to prevent crashes and temporarily reduced the load on some supporting services to help the recovery process. After these steps, the system stabilized, the affected service recovered, and data updates resumed as normal.
Remediation Items
Set up monitoring and alerts to quickly identify issues that prevent the system from clearing unused resources.
Added alerts to detect high memory usage and crashes in key services for faster response.
Increased memory limits for certain services to help them handle busy times more reliably.
Improved system logging to make it easier to spot errors and delays.
Introduced basic testing of resource creation and removal processes to catch potential problems before releases.
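As an illustration of the last item, a basic create-and-remove check might look like the sketch below. This report does not describe what the temporary resources are or how they are cleared, so files in a scratch directory stand in for them here. Every name in the sketch (find_stale_resources, cleanup_stale_resources, check_create_and_remove, the 600-second age limit) is hypothetical and is not taken from the actual system.

```python
import os
import tempfile
import time


def find_stale_resources(directory, max_age_seconds, now=None):
    """Return sorted paths in `directory` last modified more than max_age_seconds ago."""
    now = time.time() if now is None else now
    stale = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if now - entry.stat().st_mtime > max_age_seconds:
                stale.append(entry.path)
    return sorted(stale)


def cleanup_stale_resources(directory, max_age_seconds, now=None):
    """Delete stale files and return how many were removed."""
    removed = 0
    for path in find_stale_resources(directory, max_age_seconds, now):
        os.remove(path)
        removed += 1
    return removed


def check_create_and_remove():
    """Create stand-in temporary resources, then confirm that cleanup removes all of them."""
    with tempfile.TemporaryDirectory() as workdir:
        for i in range(3):
            with open(os.path.join(workdir, f"tmp-{i}.dat"), "w") as handle:
                handle.write("scratch")
        # Evaluate "now" one hour ahead so the new files count as stale
        # (hypothetical 600-second age limit).
        later = time.time() + 3600
        before = len(find_stale_resources(workdir, 600, now=later))
        removed = cleanup_stale_resources(workdir, 600, now=later)
        after = len(find_stale_resources(workdir, 600, now=later))
        return before, removed, after
```

A pre-release check could call check_create_and_remove() and fail the build if any resources remain after cleanup. The same find_stale_resources count could also feed an alert when it keeps growing, which relates to the first remediation item.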