SUMMARY
Any email that could not be reprocessed was bounced back to the sender with instructions on how to resend. Unfortunately, due to the nature of this incident, we are unable to identify all impacted accounts or provide a list of impacted emails.
We apologize for any inconvenience caused by this incident, and we thank you for your patience while we worked through it.
17:28 UTC | 10:28 PT
Our engineering team has processed the majority of the email backlog from the incident starting on 4/23 and is currently working to handle any remaining emails that were not processed. We will provide an update on our progress later today.
23:29 UTC | 16:29 PT
Some batches of emails have been successfully processed. Our engineering team will continue to work on the remaining backlog. We will send another update on Monday morning.
23:08 UTC | 16:08 PT
Our engineers are working diligently to ensure we correctly route emails that were affected by this incident. The process is taking longer than expected. We appreciate your patience.
16:06 UTC | 09:06 PT
Inbound email processing has remained stable for new messages across all Pods. Our engineers are continuing to work through the email backlog from this incident.
02:26 UTC | 19:26 PT
DNS propagation has completed and all new emails should be received with minimal delay. We are now focusing on processing the email backlog. We will provide a final update once the email backlog is processed.
01:53 UTC | 18:53 PT
Our teams continue to see improvements in inbound mail delivery across all Pods as DNS changes propagate to global nameservers. Engineers are working on reprocessing the email backlog from this incident. Thank you for your patience while we continue to work through it.
00:43 UTC | 17:43 PT
We're still seeing issues affecting inbound email via external addresses. We have made a mail configuration change and are currently monitoring for improvements. We will provide an update as we know more.
00:07 UTC | 17:07 PT
We've identified an issue with inbound email processing for external email addresses on multiple pods. Our engineers are working to resolve the issue. More to come.
23:30 UTC | 16:30 PT
We are currently investigating issues with inbound email processing for external email addresses. More to follow.
Root Cause Analysis
This incident was caused by a DNS configuration change for the affected Pods that led some external email servers to rewrite email addresses on inbound emails. As a result, these emails reached Zendesk mail servers, but we were unable to process them into tickets because our mail processor could not identify the destination accounts.
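To make the failure mode concrete, the short Python sketch below (illustrative only, not Zendesk's actual routing code) shows how an address-to-account lookup breaks once an upstream server has rewritten the recipient address; every name and address in it is hypothetical.

    # Illustrative sketch only: a tiny address-to-account lookup that
    # mirrors the failure described above. Names are hypothetical.
    ACCOUNT_BY_ADDRESS = {
        "support@examplecorp.zendesk.com": "examplecorp",
    }

    def route_to_account(recipient):
        """Return the account that owns this support address, or None."""
        return ACCOUNT_BY_ADDRESS.get(recipient.lower())

    # Normal case: the recipient address maps cleanly to an account.
    assert route_to_account("support@examplecorp.zendesk.com") == "examplecorp"

    # Incident case: an external server has rewritten the address, so the
    # destination account cannot be identified and no ticket is created.
    assert route_to_account("fwd=abc123=support@examplecorp.zendesk.com") is None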
Resolution
To fix this issue, our engineers rolled back the DNS changes and forced a DNS zone transfer through our DNS provider. The incident was resolved, and new emails began to reach Zendesk Support accounts once the DNS changes had propagated to global nameservers.
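As a rough illustration of the verification step, a propagation check along the following lines can confirm that rolled-back MX records are visible from public resolvers. This is a hypothetical sketch that assumes the dnspython package; the domain, expected mail host, and resolver IPs are placeholders, not Zendesk's real records.

    # Hypothetical propagation check using the dnspython package.
    # Domain, expected MX host, and resolver IPs are placeholders.
    import dns.resolver

    PUBLIC_RESOLVERS = ["8.8.8.8", "1.1.1.1"]   # Google and Cloudflare resolvers
    DOMAIN = "example.zendesk.com"              # placeholder domain
    EXPECTED_MX = "mail.example.zendesk.com."   # record expected after the rollback

    def mx_hosts(resolver_ip):
        """Return the MX hosts that a given public resolver currently serves."""
        resolver = dns.resolver.Resolver()
        resolver.nameservers = [resolver_ip]
        answer = resolver.resolve(DOMAIN, "MX")
        return {record.exchange.to_text() for record in answer}

    for ip in PUBLIC_RESOLVERS:
        status = "propagated" if EXPECTED_MX in mx_hosts(ip) else "stale"
        print(ip, status)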
Further work was undertaken to reprocess the emails that did not create tickets during the incident. Approximately 25% remained unprocessable; for these emails, we sent a bounce-back notification to the original sender advising them to resend their original email.
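For readers interested in the mechanics, the backfill step can be pictured with the hedged sketch below: each queued email is retried, and anything that still cannot be matched to an account is bounced back to the sender. Function and field names are illustrative, not Zendesk internals.

    # Illustrative backfill loop: retry each queued email and bounce the
    # ones that remain unroutable. All names here are hypothetical.
    def reprocess_backlog(backlog, route_to_account, create_ticket, send_bounce):
        """Reprocess queued emails; bounce any that still cannot be routed."""
        bounced = 0
        for email in backlog:
            account = route_to_account(email["recipient"])
            if account is not None:
                create_ticket(account, email)
            else:
                # Ask the original sender to resend, as described in the summary.
                send_bounce(email["sender"],
                            reason="We could not determine the destination "
                                   "account; please resend your email.")
                bounced += 1
        return bounced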
Remediation Items
- [Scheduled] Improve deployment communication process with key internal stakeholders.
- [Scheduled] Evaluate rollout strategy for similar large scale network-related changes.
- [Scheduled] Investigate options to improve success rate of email backfills.
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.
Post-mortem published April 30, 2020.