SUMMARY
On December 16, 2021, from 02:04 UTC to 16:41 UTC, customers on Pod 23 had issues logging in to Zendesk Support using Microsoft SSO.
Timeline
14:33 UTC | 06:33 PT
We are currently investigating reports of issues with Microsoft SSO login for some customers. As a workaround, use the your_subdomain.zendesk.com/access/sso_bypass link to regain immediate access. We will share further details as soon as we have them. Thanks for your patience!
14:54 UTC | 06:54 PT
After further investigation, the issue with Microsoft SSO login seems to be affecting only customers on Pod 23 at the moment. We have rolled out an internal change and would like to ask you to test logging in via SSO again, please.
15:50 UTC | 07:50 PT
Unfortunately, the last change does not seem to have fixed the issue. The team continues to investigate possible causes, and we will provide another update within one hour or when we have more details.
16:11 UTC | 08:11 PT
The scope of this incident has changed: the issue with Microsoft SSO login now appears to be affecting customers across multiple Pods. Investigation continues. We appreciate your patience and understanding.
17:05 UTC | 09:05 PT
The issue impacting Microsoft SSO login is now fully resolved. Please let us know if you continue to experience issues.
POST-MORTEM
Root Cause Analysis
This incident was caused by a sequence of events:
- Microsoft Office365 experienced an outage for a period of time. When Zendesk fetched the Office365 keys during the outage, an empty response was returned.
- This empty response was cached as the authentication keys token
- The empty token then resulted in errors when decoding Microsoft SSO responses
- Customers then entered a redirect loop when Zendesk redirected them back to the start to try logging in again
Resolution
To fix this issue, we deleted the cached values to regenerate the cache with valid keys. Recovery was observed thereafter.
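The failure sequence and the fix described above can be illustrated with a minimal sketch. This is not Zendesk's actual code: the class NaiveKeyCache, the function attempt_login, the key ID "kid-1", and the redirect limit are all hypothetical names and values chosen for illustration. The sketch assumes only that keys are cached on first fetch without validation, and that a failed decode sends the user back to the start of the login flow.

```python
from typing import Callable, Dict, Optional

Keys = Dict[str, str]  # hypothetical shape: key ID -> public key


class NaiveKeyCache:
    """Hypothetical cache that stores whatever the fetcher returns."""

    def __init__(self, fetch_keys: Callable[[], Keys]) -> None:
        self._fetch_keys = fetch_keys
        self._cached: Optional[Keys] = None

    def get_keys(self) -> Keys:
        # No validation: an empty response is cached like any other value.
        if self._cached is None:
            self._cached = self._fetch_keys()
        return self._cached

    def clear(self) -> None:
        # Mirrors the resolution: drop the cached value so the next
        # lookup fetches fresh keys.
        self._cached = None


def attempt_login(cache: NaiveKeyCache, key_id: str, max_redirects: int = 3) -> str:
    """Return "ok" if the signing key is available, else "redirect_loop"."""
    for _ in range(max_redirects):
        if key_id in cache.get_keys():
            return "ok"
        # Decoding failed: the user is redirected to try logging in again.
    return "redirect_loop"


# Simulated provider: empty response during the outage, valid keys afterwards.
responses = iter([{}, {"kid-1": "public-key"}])
cache = NaiveKeyCache(lambda: next(responses))

during_incident = attempt_login(cache, "kid-1")  # empty value stays cached
cache.clear()                                    # the manual fix
after_clear = attempt_login(cache, "kid-1")      # fresh keys are fetched
```

In this sketch the provider has already recovered by the second fetch, but logins keep failing until the cache is cleared, which matches the behavior described in the root cause analysis.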
Remediation Items
- Build in logic to not cache empty values when receiving non-successful responses from the Microsoft service for fetching authentication keys [Scheduled]
- Improve logging mechanisms to capture errors from this scenario [Scheduled]
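The two remediation items above could look roughly like the following sketch. It is an assumption about one possible shape of the fix, not the implementation Zendesk scheduled: GuardedKeyCache, the logger name, and the (status code, keys) return shape of the fetcher are hypothetical.

```python
import logging
from typing import Callable, Dict, Optional, Tuple

logger = logging.getLogger("sso_key_cache")  # hypothetical logger name

Keys = Dict[str, str]            # hypothetical shape: key ID -> public key
FetchResult = Tuple[int, Keys]   # assumed: (HTTP status code, keys)


class GuardedKeyCache:
    """Hypothetical cache that refuses to store failed or empty fetches."""

    def __init__(self, fetch_keys: Callable[[], FetchResult]) -> None:
        self._fetch_keys = fetch_keys
        self._cached: Optional[Keys] = None

    def get_keys(self) -> Keys:
        if self._cached is not None:
            return self._cached
        status, keys = self._fetch_keys()
        if not (200 <= status < 300) or not keys:
            # Log the failure and leave the cache untouched so the
            # next call retries the fetch.
            logger.error(
                "Key fetch failed or returned no keys (status=%s); not caching",
                status,
            )
            return {}
        self._cached = keys
        return keys


# Simulated provider: an error, then an empty success, then valid keys.
responses = iter([(503, {}), (200, {}), (200, {"kid-1": "public-key"})])
cache = GuardedKeyCache(lambda: next(responses))

first = cache.get_keys()   # error response: nothing cached
second = cache.get_keys()  # empty success: nothing cached
third = cache.get_keys()   # valid keys: cached
fourth = cache.get_keys()  # served from the cache, no further fetch
```

With this guard, an outage on the provider side still causes failed logins while it lasts, but the cache recovers on its own once valid keys are returned, and each failed fetch leaves a log entry.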
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.