Summary
On May 13, 2026, from 16:10 UTC to 17:39 UTC, Zendesk Voice customers on Pod 13 experienced failed inbound and outbound calls. Some calls were dropped or diverted to voicemail.
Timeline
May 13, 2026 17:49 UTC | May 13, 2026 10:49 PT
We are experiencing errors with both inbound and outbound calling on Pod 13. Our team is investigating and will provide further information as soon as we can.
May 13, 2026 18:13 UTC | May 13, 2026 11:13 PT
We have confirmed an issue with Pod 13 causing errors for inbound and outbound calls, and our team is looking into potential root causes. We will continue to post updates as we learn more.
May 13, 2026 18:50 UTC | May 13, 2026 11:50 PT
We are happy to report that the issue causing inbound and outbound call errors for Pod 13 has been resolved, and calls are being placed as expected at this time. Thank you for your patience during our investigation.
Root Cause Analysis
This incident was caused by capacity exhaustion in our background job processing system on Pod 13. Over time, the system accumulated more retained work history than it could safely store. Once that capacity was exhausted, call handling requests failed and surfaced as elevated server errors.
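
For illustration only, the failure mode described above can be sketched as follows. This is a minimal model, not the actual job system: the capacity figure, the record counts, and the function name `retained_after` are all hypothetical. It shows that history kept without a retention limit grows with every completed job, while a capped history stays below a fixed capacity.

```python
from collections import deque
from typing import Optional

# Hypothetical storage limit, expressed as a number of history records.
CAPACITY = 1000


def retained_after(jobs: int, max_history: Optional[int]) -> int:
    """Return how many history records remain after `jobs` jobs complete.

    max_history=None models retention with no cleanup (unbounded growth).
    An integer models a retention cap that discards the oldest records.
    """
    history: deque = deque(maxlen=max_history)
    for job_id in range(jobs):
        history.append(job_id)  # one retained record per completed job
    return len(history)


unbounded = retained_after(5000, None)  # no cleanup ever runs
bounded = retained_after(5000, 800)     # oldest records are dropped

print("unbounded exceeds capacity:", unbounded > CAPACITY)
print("bounded exceeds capacity:", bounded > CAPACITY)
```

In this model the unbounded history passes the capacity limit once enough jobs have run, which is the point at which a real datastore would start rejecting work.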
Resolution
To fix the issue, we increased the capacity allocated to background job processing on Pod 13, which restored normal call handling and stopped the call failures.
Remediation Items
- Improve early-warning monitoring and paging so we’re alerted before call impact occurs.
- Review and tune the priority/thresholds of capacity-related alerts so they trigger sooner and more appropriately.
- Update troubleshooting documentation for responding to call-impacting capacity issues.
- Add clear start/finish/success metrics for the scheduled cleanup task that manages retained background-processing data.
- Improve logging/observability for the automation that runs scheduled maintenance tasks, so missed runs are visible.
- Add dedicated monitoring for the datastore used by background job processing.
- Improve how failed background work is stored/retained so it doesn’t accumulate indefinitely.
- Revisit and clarify severity definitions for Voice outages to ensure consistent escalation.
- Evaluate isolating Voice’s background-processing datastore so a capacity issue is less likely to impact calling.
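
As a rough sketch of two of the items above (earlier capacity alerts and visibility into missed cleanup runs), the checks might look like the following. The thresholds, the grace period, and the function names `capacity_alert` and `cleanup_missed` are assumptions made for illustration; they do not describe the actual monitoring configuration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical thresholds: warn well before the datastore is full,
# and page while there is still headroom to react.
WARN_RATIO = 0.70
PAGE_RATIO = 0.85


def capacity_alert(used_bytes: int, max_bytes: int) -> str:
    """Classify datastore usage as 'ok', 'warn', or 'page'."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    ratio = used_bytes / max_bytes
    if ratio >= PAGE_RATIO:
        return "page"
    if ratio >= WARN_RATIO:
        return "warn"
    return "ok"


def cleanup_missed(
    last_success: Optional[datetime],
    now: datetime,
    expected_interval: timedelta,
    grace: timedelta = timedelta(minutes=10),
) -> bool:
    """Return True if the scheduled cleanup has not succeeded recently.

    A task that has never reported success counts as missed.
    """
    if last_success is None:
        return True
    return now - last_success > expected_interval + grace


now = datetime(2026, 5, 13, 16, 0, tzinfo=timezone.utc)
print(capacity_alert(used_bytes=90, max_bytes=100))
print(cleanup_missed(now - timedelta(hours=3), now, timedelta(hours=1)))
```

Pairing the two checks matters because a usage alert alone only fires after data has already accumulated, while a missed-run check can flag the underlying cause earlier.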