Summary
On April 24, 2022 at 07:00 UTC, Zendesk performed a planned maintenance activity to upgrade Zendesk Chat infrastructure. The maintenance was scheduled to be completed by 08:00 UTC, but complications encountered partway through extended it. The maintenance continued until 08:11 UTC for the majority of customers, and service was restored for all customers by 08:25 UTC. During this period, customers of both Chat and Agent Workspace messaging were unable to use the application.
The maintenance activity involved applying security patches, upgrading database instances and implementing capacity management improvements to our Chat product. For context, our initial estimate of the activity's duration was based on preparatory dry runs performed in advance. During execution, however, we encountered connectivity issues and unexpected errors, which caused our maintenance process to fall behind the benchmarked speed.
Timeline
07:51 UTC
We began bringing the instances online. During this process, we observed that the capacity rebalancing of some Live Chat servers was taking more time than expected.
08:11 UTC
We were able to bring the remaining instances online. At this point, the service was operational for the majority of our customers.
08:25 UTC
We were able to complete the activity. At this point, service was restored for the remaining customers.
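The overrun implied by the timestamps above can be worked out as a quick check. This is a minimal sketch that uses only the times stated in this report; the helper names (`utc`, `minutes_between`) are illustrative and not part of any Zendesk tooling.

```python
from datetime import datetime, timezone


def utc(hhmm: str) -> datetime:
    """Parse an HH:MM time on the maintenance date (April 24, 2022) as UTC."""
    hour, minute = map(int, hhmm.split(":"))
    return datetime(2022, 4, 24, hour, minute, tzinfo=timezone.utc)


def minutes_between(start: datetime, end: datetime) -> int:
    """Return the whole number of minutes from start to end."""
    return int((end - start).total_seconds() // 60)


# Times taken from the summary and timeline in this report.
scheduled_start = utc("07:00")
scheduled_end = utc("08:00")
majority_restored = utc("08:11")
all_restored = utc("08:25")

# Minutes past the scheduled end of the window.
overrun_majority = minutes_between(scheduled_end, majority_restored)
overrun_all = minutes_between(scheduled_end, all_restored)

# Elapsed time from the scheduled start until the last customers were restored.
total_elapsed = minutes_between(scheduled_start, all_restored)

print(overrun_majority, overrun_all, total_elapsed)
```

Note that `total_elapsed` measures the length of the maintenance activity, not necessarily the disruption experienced by any individual customer.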
Remediation Items
- Increase the coverage and accuracy of our maintenance activity tests in our staging environment [In Progress]
- Modularise the Live Chat capacity rebalancing script to separate out validation and migration activities [In Progress]
- Build in more buffer time at the start of maintenance windows to ensure all maintenance work starts on time [In Progress]
Original Scheduled Maintenance Notification
Zendesk will perform critical maintenance which will impact performance for Chat and Agent Workspace customers on April 24, 2022 UTC, during the times listed below.
Date | POD | Start Time | End Time
April 24, 2022 UTC | All | 07:00 UTC | 08:00 UTC
April 24, 2022 PDT | All | 00:00 PDT | 01:00 PDT
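The UTC and PDT times in the schedule can be cross-checked with a short conversion. This is a sketch only, assuming PDT is modelled as a fixed UTC-7 offset; the names `PDT` and `to_pdt` are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: PDT is treated as a fixed offset of UTC-7.
PDT = timezone(timedelta(hours=-7), "PDT")

# Maintenance window as announced, in UTC.
start_utc = datetime(2022, 4, 24, 7, 0, tzinfo=timezone.utc)
end_utc = datetime(2022, 4, 24, 8, 0, tzinfo=timezone.utc)


def to_pdt(moment: datetime) -> str:
    """Format a timezone-aware datetime as a PDT wall-clock string."""
    return moment.astimezone(PDT).strftime("%B %d, %Y %H:%M %Z")


start_pdt = to_pdt(start_utc)
end_pdt = to_pdt(end_utc)

print(start_pdt)
print(end_pdt)
```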
Customers Affected: Zendesk Chat and Agent Workspace customers on all Pods will experience a maximum service disruption of 45 minutes at some time within the 1-hour maintenance window.
Affected products: Chat, Agent Workspace, Social Messaging, Web/Mobile Messaging and SunCo SDK
Expected behavior:
During the maintenance window
- No new chat sessions can be created. Ongoing chats will be disconnected for both agents and visitors
- Chat dashboard will be accessible but most of the functionality will be unavailable
- Agents will not be able to respond to the messaging tickets during the maintenance window
- Customers using Social Messaging, Web/Mobile Messaging and SunCo SDK with Agent Workspace will have a degraded experience
After the maintenance window
- All Chat, Agent Workspace, Social Messaging, Web/Mobile Messaging and SunCo SDK functionality will be restored to normal.
What will happen to our Chat Widget/Mobile SDK/Web SDK during this maintenance?
- If you are currently using the Web Widget (Classic) and other channels outside of Chat are enabled (contact form, help center, Answer Bot, etc.), the widget will remain visible on your site throughout the maintenance window and will continue to serve those functions. Once the maintenance window is over and agents are logged into Zendesk Chat, the Web Widget (Classic) will show the Chat channel as available again.
- If you are using an older version of the legacy Chat Standalone Widget, the widget will not load and will be considered offline.
- Chat Mobile SDK and Web SDK will appear as offline and will fail to connect.
Why we're doing this:
- The core Chat servers will be restarted after several updates to improve the security and reliability of the Chat service and to maintain compliance.
- If any issues are identified while patching the servers, the process will halt and Chat services will be brought back online. We will then work to schedule a new maintenance window to continue as needed.
This article was initially published on Mar 25, 2022, 8:12:09 AM UTC.
Updates were added to the article body for better visibility.
Post-mortem published April 29, 2022.