SUMMARY
From 14:00 UTC on November 18, 2024 to 09:25 UTC on November 20, 2024, Zendesk WFM customers in all Pods reported that reports configured to group results by Location and Team properties were showing metrics of 0, while reports for the same period grouped by other properties showed results as expected.
This issue impacted only reports that used Location and Team properties.
TIMELINE
November 20, 2024 09:45 AM UTC | November 20, 2024 01:45 AM PT
Some WFM customers may have experienced reports resetting to zero when using team or location filters over the past 36 hours. This issue has now been resolved for all data moving forward. We are actively working to backfill the affected data. We sincerely apologize for any inconvenience this may have caused and appreciate your understanding.
POST-MORTEM
Root Cause Analysis
This incident was caused by a misconfiguration introduced in a recent application release. The release added a new templating system that changed how the application constructed the URL used to collect Location and Team property values. With the changed URL, the HTTP client could not locate the required service, and after five failed retries the application skipped data enrichment. Without the enriched data, reports grouped by these properties could not retrieve the values they needed and displayed zero metrics.
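The failure mode described above can be illustrated with a minimal Python sketch. It is not the actual WFM code: the function names, the `$host` placeholder, the URL, and the stubbed `fake_fetch` client are all hypothetical, and only the five-retry limit comes from this post-mortem. It shows how a template change that leaves a placeholder unrendered can make every request fail, after which enrichment is skipped and the record carries no Location or Team values.

```python
from string import Template
from typing import Callable

MAX_RETRIES = 5  # from the post-mortem; every other name here is an assumption


def build_url(template: str, **params: str) -> str:
    """Render a URL template. safe_substitute leaves unknown placeholders as-is."""
    return Template(template).safe_substitute(**params)


def enrich(record: dict, url: str, fetch: Callable[[str], dict]) -> dict:
    """Attach Location/Team values, skipping enrichment after MAX_RETRIES failures."""
    for _ in range(MAX_RETRIES):
        try:
            return {**record, **fetch(url), "enriched": True}
        except ConnectionError:
            continue  # retry the same (possibly malformed) URL
    # Enrichment skipped: grouping by location/team now finds nothing to group on.
    return {**record, "location": None, "team": None, "enriched": False}


def fake_fetch(url: str) -> dict:
    """Hypothetical stand-in for the HTTP client; fails on an unrendered URL."""
    if "$" in url:
        raise ConnectionError(f"cannot locate service at {url}")
    return {"location": "example-location", "team": "example-team"}


# Parameter name matches the placeholder: the URL renders and enrichment succeeds.
good = enrich({"agent": 1}, build_url("https://$host/props", host="enrich.internal"), fake_fetch)

# Parameter name no longer matches: the placeholder survives and every retry fails.
bad = enrich({"agent": 1}, build_url("https://$host/props", service_host="enrich.internal"), fake_fetch)
```

In this sketch the skipped enrichment is silent: the record is still produced, only without the properties that the Location and Team groupings depend on.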
Resolution
To resolve this issue, we rolled back the recent changes to the templating system, restoring the previous functionality that correctly constructed the URLs for data enrichment. This allowed the service to successfully retrieve Location and Team property values, restoring accurate reporting for affected customers.
Remediation Items
- Implement alerts for increased error logs in the environment to proactively address issues before they impact customers.
- Conduct a thorough review of the templating system and its configurations to ensure no similar misconfigurations occur in the future.
- Enhance the monitoring of the service to catch data enrichment failures earlier in the process.
- Establish a more robust testing protocol for future releases to identify potential issues with data collection mechanisms before deployment.
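As one possible shape for the testing item above, the following Python sketch shows a pre-deployment check that fails fast when a templated URL does not render into a usable address. The function name, placeholder, and URL are hypothetical and do not describe Zendesk's actual tooling.

```python
from string import Template
from urllib.parse import urlsplit


def render_url_strict(template: str, **params: str) -> str:
    """Render a URL template and reject results that are not usable HTTP(S) URLs."""
    # substitute() raises KeyError when a placeholder has no matching parameter,
    # unlike safe_substitute(), which would leave it in the output.
    url = Template(template).substitute(**params)
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"rendered URL is not valid: {url!r}")
    return url


def check_template(template: str, **params: str) -> bool:
    """Return True if the template renders cleanly, False otherwise."""
    try:
        render_url_strict(template, **params)
        return True
    except (KeyError, ValueError):
        return False
```

A check like this could run in a release pipeline against each environment's configuration, so that a mismatch between template placeholders and supplied parameters is caught before deployment rather than after retries are exhausted in production.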
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, contact Zendesk customer support.
Bob Novak
Post-mortem published Dec 4, 2024