SUMMARY
From 02:10 UTC to 04:52 UTC on March 27, 2024, Sunshine Conversations customers on all Pods who use Social Messaging channels may have experienced a significant service disruption caused by an outage at Meta. The outage primarily affected WhatsApp, with substantial impact on Facebook Messenger and Instagram as well, and partially interrupted outbound traffic through these channels.
Timeline
03:34 UTC | 20:34 PT
We have received reports of WhatsApp message delivery failures resulting in messaging and ticket creation issues for customers in Agent Workspace (Support). Our service provider is investigating the issue. An update will be provided when we have more to share. Thank you.
04:22 UTC | 21:22 PT
We are observing recovery following our service provider resolving the WhatsApp message delivery issues. Customers should now be able to resend any messages that failed during the incident. We will provide a final update as soon as possible.
04:45 UTC | 21:45 PT
We are happy to report that the WhatsApp message delivery issue is resolved. As previously mentioned, customers should resend any messages that failed during the incident. Thanks for your patience during today's disruption.
POST-MORTEM
Root Cause Analysis
This incident was caused by an issue with Meta's services, specifically impacting the WhatsApp Cloud API. Meta did not disclose the exact root cause, but acknowledged the disruption and worked to restore service. During this time, Sunshine Conversations' monitoring systems detected a high rate of errors when attempting to send messages through the affected channels.
Resolution
Resolution of the outage depended entirely on Meta's recovery efforts. Sunshine Conversations monitored the situation closely and passed updates on to customers as new information became available. Once Meta announced that services were restored, Sunshine Conversations verified that outbound traffic had returned to normal levels and that messages were being delivered successfully through the social messaging channels.
Remediation Items
- Improve monitoring for outbound traffic health.
- Review and update the incident response plan to include alternative communication channels with Meta in case their primary support portal is unavailable.
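As an illustration only of what outbound traffic health monitoring can look like, the sketch below tracks the failure rate of outbound send attempts for one channel over a sliding time window and flags the channel as unhealthy once the rate crosses a threshold. It is not a description of Sunshine Conversations' actual monitoring: the class name OutboundHealthMonitor, its parameters, and the threshold values are all hypothetical and chosen for the example.

```python
from collections import deque


class OutboundHealthMonitor:
    """Hypothetical sliding-window failure-rate tracker for one outbound channel."""

    def __init__(self, window_seconds=300.0, failure_threshold=0.5, min_samples=20):
        # All three defaults are assumptions picked for illustration.
        self.window_seconds = window_seconds
        self.failure_threshold = failure_threshold
        self.min_samples = min_samples
        self._events = deque()  # (timestamp, succeeded) pairs, oldest first

    def record(self, timestamp, succeeded):
        """Record one outbound send attempt and drop attempts outside the window."""
        self._events.append((timestamp, succeeded))
        self._evict(timestamp)

    def _evict(self, now):
        # Remove attempts older than the sliding window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()

    def failure_rate(self, now):
        """Fraction of attempts in the current window that failed (0.0 if none)."""
        self._evict(now)
        if not self._events:
            return 0.0
        failures = sum(1 for _, ok in self._events if not ok)
        return failures / len(self._events)

    def is_unhealthy(self, now):
        """True when there are enough samples and the failure rate meets the threshold."""
        rate = self.failure_rate(now)
        return len(self._events) >= self.min_samples and rate >= self.failure_threshold


if __name__ == "__main__":
    monitor = OutboundHealthMonitor(window_seconds=60.0, failure_threshold=0.5, min_samples=4)
    for t in range(4):
        monitor.record(float(t), True)    # healthy traffic
    for t in range(10, 16):
        monitor.record(float(t), False)   # failures during an upstream outage
    print(monitor.failure_rate(16.0), monitor.is_unhealthy(16.0))
```

The minimum sample count keeps a handful of failures on a quiet channel from raising an alert, and evicting old attempts lets the signal clear on its own once upstream delivery recovers.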
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us via ZBot Messaging within the Widget.
Jeremy R.
Post-Mortem published April 11, 2024