SUMMARY
On February 29, 2024, from 08:00 UTC to 16:10 UTC, Sell customers across all Pods experienced delays in the synchronization and storage of new emails.
TIMELINE
11:10 UTC | 03:10 PT
We are currently addressing a service disruption affecting the Sell Email delivery feature for customers across all Pods. This disruption may result in delayed synchronization of incoming emails. However, outbound emails should still be sent without issue at this time. As we work to update and synchronize emails, please note that this process may take a few hours. Our next update will be provided in 4 hours or sooner if we have more information to share. We appreciate your patience.
15:07 UTC | 07:07 PT
We have successfully removed the blockage affecting incoming emails for Sell customers across all Pods and are now processing the email queues. We will continue monitoring the situation until full resolution is achieved, which may take several hours. An update will be shared once this process is finished.
17:34 UTC | 09:34 PT
Our teams have successfully processed remaining emails after removing the blockage affecting incoming emails for Sell customers across all Pods. Email is processing as expected at this time, and as such the incident is now resolved. Thank you for your patience during our investigation.
POST-MORTEM
Root Cause Analysis
This incident was caused by the unexpected removal of a key setting during an update to our email storage system. This setting works like a numbering machine that ensures every email gets a unique number for tracking. Without it, the system couldn't file away new emails properly. The update was intended to let our system handle more emails, but it did not go as planned, which caused the issue.
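As an illustration only, the failure mode described above can be sketched in a few lines of Python. This is a toy model, not our actual email storage system: the `EmailStore` class, its `id_generator` attribute, and the `restore_generator` method are hypothetical names invented for this example. It shows how new emails pile up in a backlog while the numbering setting is missing, and how the backlog drains once the setting is restored.

```python
import itertools
from collections import deque


class EmailStore:
    """Toy store (hypothetical): every saved email needs a unique number."""

    def __init__(self, id_generator=None):
        # Stand-in for the "key setting" that hands out unique numbers.
        self.id_generator = id_generator
        self.saved = {}          # number -> email body
        self.backlog = deque()   # emails waiting because no number was available

    def receive(self, email):
        # Without the numbering setting, the email cannot be filed away,
        # so it waits in the backlog instead of being lost.
        if self.id_generator is None:
            self.backlog.append(email)
            return None
        email_id = next(self.id_generator)
        self.saved[email_id] = email
        return email_id

    def restore_generator(self, start):
        # Put the numbering setting back, then process the waiting emails in order.
        self.id_generator = itertools.count(start)
        while self.backlog:
            self.receive(self.backlog.popleft())


store = EmailStore(itertools.count(1))
store.receive("first")

store.id_generator = None          # simulate the setting being removed
store.receive("second")
store.receive("third")

# Resume numbering just past the highest number already in use.
store.restore_generator(start=max(store.saved, default=0) + 1)
```

In this sketch the delayed emails are stored late rather than dropped, which matches the delayed synchronization customers saw during the incident.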
Resolution
To fix this issue, we had to restore the missing setting so that emails could be numbered correctly again. Our team evaluated two ways to make this fix and chose the more dependable one. Once the fix was started, it took about 4 hours and 44 minutes to get everything working again.
Remediation Items
- Improve our system checks to catch any similar issues faster in the future.
- Update our tools and set up extra warnings to help us stay on top of things.
- Implement limits on certain parts of our system to prevent overloading.
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us via ZBot Messaging within the Widget.
1 comment
Jessica G.
Post-mortem published March 7, 2024.