Dan Beirouty
Joined Apr 14, 2021 · Last activity Jan 07, 2025
Following: 0 · Followers: 6 · Total activity: 219 · Votes: 15 · Subscriptions: 161
ACTIVITY OVERVIEW
Latest activity by Dan Beirouty
Dan Beirouty commented,
Post-mortem published January 7, 2025.
View comment · Posted Jan 07, 2025 · Dan Beirouty
Dan Beirouty commented,
Post-mortem published December 19, 2024.
View comment · Posted Dec 19, 2024 · Dan Beirouty
Dan Beirouty created an article,
SUMMARY
On December 16, 2024, from 01:16 UTC to 04:44 UTC, some Tymeshift and Workforce Management customers experienced errors and access issues.
TIMELINE
December 16, 2024 05:25 AM UTC | December 15, 2024 09:25 PM PT
We are happy to report that the Tymeshift and Workforce Management access issue is now resolved. Thanks for your patience while we worked through this issue today.
December 16, 2024 04:51 AM UTC | December 15, 2024 08:51 PM PT
We have identified the potential cause of the issue impacting Tymeshift / Workforce Management and deployed a fix. We are currently monitoring our systems for recovery. If you have a ticket with our support team, please reply to it reporting any improvements you may be seeing.
December 16, 2024 03:44 AM UTC | December 15, 2024 07:44 PM PT
We continue to investigate the access errors affecting Tymeshift and Workforce Management across multiple pods. We will provide the next update when we have new information to share. Thanks for your patience while we work through this issue.
December 16, 2024 03:01 AM UTC | December 15, 2024 07:01 PM PT
We have received reports of errors and access issues in Tymeshift and Workforce Management. Our team is looking into this issue at the highest priority. More information to come soon.
POST-MORTEM
Root Cause Analysis
The root cause of the incident was identified as a failure to properly close or deallocate prepared statements in an internal service. In specific cases, which are still under investigation, prepared statements accumulated to the point where the database reached its limit, causing it to stop responding.
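The failure mode described above can be illustrated with a minimal sketch. The post-mortem does not name the database, the driver, or the affected service, so everything below is hypothetical: `ToyDatabase` is an in-memory stand-in with a cap on open prepared statements, and `prepared` is one possible way to guarantee deallocation, not the fix that was actually deployed.

```python
from contextlib import contextmanager


class PreparedStatementLimitError(RuntimeError):
    """Raised when the toy database cannot hold any more prepared statements."""


class ToyDatabase:
    """Hypothetical in-memory stand-in for a database with a statement cap."""

    def __init__(self, max_prepared):
        self.max_prepared = max_prepared
        self.open_statements = set()
        self._next_id = 0

    def prepare(self, sql):
        # Refuse new statements once the cap is reached, as a real server might.
        if len(self.open_statements) >= self.max_prepared:
            raise PreparedStatementLimitError("prepared statement limit reached")
        self._next_id += 1
        self.open_statements.add(self._next_id)
        return self._next_id

    def deallocate(self, statement_id):
        self.open_statements.discard(statement_id)


@contextmanager
def prepared(db, sql):
    """Prepare a statement and always deallocate it, even if the body raises."""
    statement_id = db.prepare(sql)
    try:
        yield statement_id
    finally:
        db.deallocate(statement_id)


def leaky_queries(db, n):
    """Prepare n statements and never release them (the failure mode)."""
    for _ in range(n):
        db.prepare("SELECT 1")


def safe_queries(db, n):
    """Prepare n statements, releasing each one as soon as it is done."""
    for _ in range(n):
        with prepared(db, "SELECT 1"):
            pass
```

With a cap of 3, `safe_queries(db, 10)` leaves no statements open, while `leaky_queries(db, 4)` raises `PreparedStatementLimitError` on the fourth call.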
Resolution
To resolve the incident, the team implemented a temporary workaround by scheduling daily redeployments of the affected service to prevent the issue from recurring until a permanent fix could be deployed. This approach allowed the system to regain functionality while a thorough investigation into the root cause was conducted.
Remediation Items
- Investigate Prepared Statements: Conduct a detailed investigation to determine why prepared statements were not being closed or deallocated properly and implement a fix.
- Implement Monitoring and Alerts: Develop and implement monitors and alerts to detect when the number of prepared statements approaches the limit.
- Review Error Monitor Thresholds: Review and adjust the thresholds for error monitoring to ensure timely detection of similar issues in the future.
- Prevent Recurrence: Schedule daily redeployments of the service until a permanent fix is implemented to prevent the issue from happening again.
- Increase Resource Allocation: Increase the CPU and memory allocation for the US1 Tymeapp TymeShift production instance to handle higher loads.
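As a rough illustration of the monitoring item above, the check below maps the current number of open prepared statements to an alert level. The function name, the thresholds (70% and 90% of the limit), and the way the count would be obtained are all assumptions; the post-mortem does not specify them.

```python
from enum import Enum


class AlertLevel(Enum):
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"


def prepared_statement_alert(open_count, limit, warn_ratio=0.7, critical_ratio=0.9):
    """Classify prepared-statement usage against a limit.

    The ratios are illustrative defaults, not values from the incident report.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    usage = open_count / limit
    if usage >= critical_ratio:
        return AlertLevel.CRITICAL
    if usage >= warn_ratio:
        return AlertLevel.WARNING
    return AlertLevel.OK
```

A scheduled job could call this with a count read from the database and page the on-call team when the result is `AlertLevel.CRITICAL`.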
Preventive Measures
To prevent similar incidents in the future, we will:
- Enhance code reviews to ensure proper management of prepared statements.
- Implement robust monitoring systems that can detect and alert the team to potential issues before they lead to service outages.
- Conduct regular audits of database performance and resource utilization.
FOR MORE INFORMATION
For current system status information about Zendesk and specific impacts to your account, visit our system status page. You can follow this article to be notified when our post-mortem report is published. If you have additional questions about this incident, contact Zendesk customer support.
Edited Dec 20, 2024 · Dan Beirouty
0 followers · 7 votes · 1 comment
Dan Beirouty created an article,
SUMMARY
From 23:39 UTC on December 11, 2024 to 06:30 UTC on December 12, 2024, customers using Zendesk AI features such as Advanced AI, Talk, AI Agents and other generative AI features experienced disruptions in functionality due to a service provider outage.
TIMELINE
December 12, 2024 04:05 AM UTC | December 11, 2024 08:05 PM PT
We are observing recovery of all AI features and continue to monitor our systems for full recovery. We look forward to providing a final update when systems are fully stable.
December 12, 2024 01:53 AM UTC | December 11, 2024 05:53 PM PT
Our team has been working with our service provider on an issue impacting Zendesk AI features. The impact may be visible through Advanced AI, Talk, AI Agents and other generative AI features. Initial attempts did not resolve the problem, and teams continue to work on it at the highest priority. We will pass on updates when they become available.
POST-MORTEM
Root Cause Analysis
The root cause of the incident was a new configuration for a telemetry service that unexpectedly generated a massive load on a service provider’s API across large clusters. This excessive load overwhelmed and disrupted DNS-based service discovery, leading to failed requests to our provider’s services.
Resolution
The incident was resolved once the service provider identified the issue and implemented corrective measures to alleviate the load on the API. Zendesk maintained communication with our service provider throughout the incident to ensure a coordinated response.
Remediation Items
- Support Level Agreement with LLM service teams: Work with internal customers to understand their performance and availability expectations, which will help in proposing fallback strategies and adjusting monitoring thresholds.
- Fallback Strategies for Generative AI Features: Develop fallback strategies for GenAI features, which will involve adding features to proxy systems and collaborating with feature owners to determine the best strategies for their respective cases.
- Premium Support from our service provider: Negotiate additional support from the service provider to ensure faster resolution and assistance during incidents.
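One possible shape for the fallback strategy mentioned above is sketched below. It is only an illustration: `ProviderUnavailable`, `generate_with_fallback`, and the retry count are hypothetical names and values, and the post-mortem does not describe how the proxy systems will actually implement fallbacks.

```python
class ProviderUnavailable(Exception):
    """Hypothetical error raised when the upstream AI provider cannot be reached."""


def generate_with_fallback(prompt, primary, fallback, max_attempts=2):
    """Try the primary provider a few times, then fall back.

    `primary` and `fallback` are callables that take a prompt and return text.
    Returns a (text, source) pair so callers can tell which path answered.
    """
    for _ in range(max_attempts):
        try:
            return primary(prompt), "primary"
        except ProviderUnavailable:
            # Retry until the attempt budget is spent, then use the fallback.
            continue
    return fallback(prompt), "fallback"
```

The fallback callable could be a second provider, a cached answer, or a static message telling the agent that the AI feature is temporarily unavailable; which one fits depends on the feature, as the remediation item notes.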
Preventive Measures
To prevent similar incidents in the future, the following actions will be taken:
- Enhance monitoring and alerting systems to better detect abnormal loads on the API.
- Establish clearer communication channels and support agreements with our service provider to ensure rapid response during incidents.
- Implement fallback strategies for critical AI features to maintain service availability even during provider outages.
Edited Dec 20, 2024 · Dan Beirouty
0 followers · 3 votes · 1 comment
Dan Beirouty commented,
Post-mortem published November 21, 2024.
View comment · Posted Nov 21, 2024 · Dan Beirouty
Dan Beirouty commented,
Post-mortem published November 18, 2024.
View comment · Posted Nov 18, 2024 · Dan Beirouty
Dan Beirouty commented,
Post-mortem published November 14, 2024.
View comment · Posted Nov 14, 2024 · Dan Beirouty
Dan Beirouty commented,
Post-mortem published November 1, 2024.
View comment · Posted Nov 01, 2024 · Dan Beirouty
Dan Beirouty created an article,
Zendesk will perform critical maintenance that will impact customers using Zendesk Twitter integrations on all Pods on Thursday, November 14, 2024, during the times listed below.
Affected products: Twitter integration and Sunshine Conversations Twitter DM channel
Date | Pod | Start Time | End Time
November 14, 2024 | All | 21:00 UTC | 05:00 UTC (Nov 15)
Expected behavior
The following features will be unavailable during the maintenance window:
- Twitter Direct Messages
- Twitter Posts and Comments
- Admins cannot add new or manage existing Twitter accounts.
- Triggers configured with a Twitter target action will not post messages to Twitter.
Comments, posts and direct messages from Twitter will not be received during the maintenance period, nor will admins be able to update or add Twitter accounts in Zendesk.
Once the maintenance window ends, Zendesk will fetch all posts and messages sent during the window and bring them into the accounts.
The lowest traffic window for customers has been chosen for this exercise and cannot be changed or specified. We appreciate your understanding.
Why we're doing this: The Zendesk Integrations team is making changes to improve security for the Twitter integration.
Edited Oct 31, 2024 · Dan Beirouty
0 followers · 3 votes · 0 comments
Dan Beirouty commented,
Post-mortem published October 25, 2024.
View comment · Posted Oct 25, 2024 · Dan Beirouty