On October 29, 2020, from 18:38 UTC to 21:35 UTC, Talk customers experienced a brief period of call performance issues across all pods, with ongoing degradation in Pods 13, 20, and 25. Outbound calls and call status updates, such as placing a call on hold, transferring, consulting, and hanging up, failed intermittently.
21:55 UTC | 14:55 PT
We’re happy to report the issues impacting Zendesk Talk are now resolved. Please let us know if you continue to experience any issues with Talk performance.
20:59 UTC | 13:59 PT
We are still monitoring the issue impacting Zendesk Talk. We have identified the root cause as being upstream with our telephony partner. We are also seeing inconsistencies with Talk reporting dashboards resulting from this issue.
20:00 UTC | 13:00 PT
We are continuing to monitor the issue impacting Zendesk Talk. We apologize for the inconvenience and will provide an update within one hour, or sooner if more information becomes available.
19:16 UTC | 12:16 PT
We are seeing improvements and fewer dropped calls. Pods 13, 20, and 25 may still be impacted. All other pods have recovered.
19:04 UTC | 12:04 PT
We experienced a brief spike in dropped calls and errors impacting Talk customers on all pods. We're investigating the cause and will provide an update when we have more information.
Root Cause Analysis
This incident was caused by a failed update to our telephony provider’s DNS zone file. The effects lingered because our customers’ DNS caches continued to hold the failed DNS record.
To fix this issue, our telephony provider reverted to a backup DNS zone file. Because Talk caches DNS records, Zendesk teams then redeployed Talk to all pods to resolve the issue for customers and restore Talk functionality.
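The caching behavior described above can be illustrated with a minimal sketch. It is not Zendesk's or the provider's actual code: the `TtlDnsCache` class, the `demo` function, the host name, and the addresses are all hypothetical, and the resolver is a plain dictionary standing in for an upstream DNS zone. It shows why a bad record can outlive the upstream fix, and why discarding the in-process cache (as a redeploy would) picks up the corrected record.

```python
import time
from typing import Callable, Dict, List, Tuple


class TtlDnsCache:
    """Hypothetical in-process DNS cache, for illustration only."""

    def __init__(
        self,
        resolve: Callable[[str], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolve = resolve      # upstream lookup (assumed interface)
        self._ttl = ttl_seconds      # how long an answer is trusted
        self._clock = clock          # injectable so the demo needs no sleeping
        self._entries: Dict[str, Tuple[str, float]] = {}

    def lookup(self, host: str) -> str:
        now = self._clock()
        cached = self._entries.get(host)
        if cached is not None and now < cached[1]:
            # Entry is still within its TTL: upstream is not consulted,
            # so an upstream fix is invisible until expiry or a flush.
            return cached[0]
        address = self._resolve(host)
        self._entries[host] = (address, now + self._ttl)
        return address

    def flush(self) -> None:
        # Drops every cached answer, similar in effect to restarting
        # the process that holds the cache.
        self._entries.clear()


def demo() -> List[str]:
    zone = {"sip.example.test": "192.0.2.1"}   # stand-in for the failed record
    now = [0.0]
    cache = TtlDnsCache(lambda h: zone[h], ttl_seconds=300, clock=lambda: now[0])

    seen = [cache.lookup("sip.example.test")]  # bad record is cached
    zone["sip.example.test"] = "198.51.100.7"  # upstream zone is corrected
    now[0] = 60.0
    seen.append(cache.lookup("sip.example.test"))  # still served from cache
    cache.flush()
    seen.append(cache.lookup("sip.example.test"))  # fresh lookup after flush
    return seen
```

In this sketch, a shorter `ttl_seconds` would bound how long a bad record can linger, at the cost of more upstream lookups. Which trade-off suits a real telephony workload is outside what the incident summary states.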
Zendesk teams are investigating new methods of maintaining call performance during disruptions at the provider level.
For More Information
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.