SUMMARY
On 2024-07-16 from 10:47 UTC to 13:13 UTC and on 2024-07-17 from 03:40 UTC to 14:50 UTC, some customers experienced slowness and errors while using Zendesk Workforce Management (WFM) due to over-utilization of the communication capacity allocated by our network provider. This affected users trying to connect to core services such as the Agent Activity Tracker, resulting in performance issues.
Timeline
July 17, 2024 12:28 PM UTC | July 17, 2024 05:28 AM PT
Unfortunately, we experienced similar issues again today, resulting in another spike in errors with some of the same symptoms. Clearing your cache or using an incognito window should resolve the issue. We will be monitoring errors for the next 24 hours, and if there are no further occurrences, the new incident will be merged into this one. Thank you for your patience and understanding.
July 16, 2024 04:32 PM UTC | July 16, 2024 09:32 AM PT
We've received multiple customer confirmations that clearing cache and cookies has restored access to Zendesk WFM and will be marking this incident as fully resolved.
July 16, 2024 02:38 PM UTC | July 16, 2024 07:38 AM PT
We have observed that the errors on the backend have subsided. We kindly request that you clear your cache and cookies and then confirm whether things are working as expected.
July 16, 2024 01:59 PM UTC | July 16, 2024 06:59 AM PT
In addition to the general platform errors mentioned previously, some customers are also unable to load the WFM widget correctly within Support and the Chrome extension, across multiple Pods.
July 16, 2024 01:42 PM UTC | July 16, 2024 06:42 AM PT
Customers have reported that the issues are not yet resolved, so we are continuing to investigate the problem with Zendesk WFM (formerly Tymeshift). A subset of customers is still seeing the errors “You don’t have access to this feature” and “Something didn’t work / Give it a moment and try again” when trying to access the organization structure. We will provide more information in 1 hour or as soon as we have additional details to share.
July 16, 2024 01:06 PM UTC | July 16, 2024 06:06 AM PT
We are currently seeing improvements in the backend handling of traffic for the impacted accounts, and these accounts should now be able to start loading Zendesk WFM within Zendesk.
July 16, 2024 12:39 PM UTC | July 16, 2024 05:39 AM PT
We are currently investigating reports of slowness or inability to load Zendesk WFM (formerly known as Tymeshift) for some customers across multiple Pods. We will provide more updates in 30 minutes or as soon as we have more details.
POST-MORTEM
Due to a configuration limit, some of our network traffic exceeded the maximum allowed usage during peak times. This wasn't immediately visible, and the resultant delays impacted the customer experience. Once aware, we worked quickly to increase this limit and adjust our configuration to better handle traffic.
Root Cause Analysis
The root cause was a default setting that limited the number of communication channels available, which we exceeded as our customer base and traffic grew.
Resolution
By increasing the capacity and reconfiguring the appropriate settings, we have resolved the underlying issue. Additional steps were taken to ensure continued stability.
Remediation
To prevent similar issues in the future, we are implementing the following measures:
1. Capacity Monitoring: We are adding proactive monitoring to alert us when we approach capacity limits.
2. Configuration Reviews: We are reviewing and adjusting our configurations across all regions to ensure they are prepared for increased traffic.
3. Improved Response Plans: We are strengthening our incident response procedures to quickly diagnose and resolve any future issues.
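As an illustration of the capacity monitoring described in item 1, the following is a minimal sketch of a threshold check. It is not our production implementation: the names `CapacityReading` and `capacity_alert`, and the 80% and 95% thresholds, are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class CapacityReading:
    """A single sample of channel usage (hypothetical metric names)."""
    in_use: int  # communication channels currently in use
    limit: int   # configured maximum number of channels


def capacity_alert(reading: CapacityReading,
                   warn_at: float = 0.80,
                   critical_at: float = 0.95) -> str:
    """Return "ok", "warning", or "critical" based on utilization.

    The thresholds are assumptions for illustration; real values would
    depend on how quickly traffic can grow relative to response time.
    """
    if reading.limit <= 0:
        raise ValueError("limit must be positive")
    utilization = reading.in_use / reading.limit
    if utilization >= critical_at:
        return "critical"
    if utilization >= warn_at:
        return "warning"
    return "ok"
```

Alerting at a warning level well below the limit gives time to raise the limit before traffic reaches it, rather than discovering the limit only after delays appear.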
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, contact Zendesk customer support.