Emanuele Sparacca
Joined Oct 27, 2021 · Last activity Feb 07, 2025
Following: 0
Followers: 0
Total activity: 7
Votes: 0
Subscriptions: 4
ACTIVITY OVERVIEW
Latest activity by Emanuele Sparacca
Emanuele Sparacca created an article,
SUMMARY
February 07, 2025 11:12 AM UTC | February 07, 2025 03:12 AM PT
We are pleased to inform you that the issue with the Explore dashboard has been resolved as of 10:25 UTC. Thank you for your patience and understanding!
February 07, 2025 10:54 AM UTC | February 07, 2025 02:54 AM PT
We have been experiencing delays on the Explore dashboard since yesterday at 20:00 UTC. Our engineering team has identified the issue and applied a fix. We are actively monitoring the situation to ensure a smooth experience. Thank you for your patience!
POST-MORTEM
TBD
FOR MORE INFORMATION
For current system status information about Zendesk and specific impacts to your account, visit our system status page. You can follow this article to be notified when our post-mortem report is published. If you have additional questions about this incident, contact Zendesk customer support.
Edited Feb 07, 2025 · Emanuele Sparacca
Emanuele Sparacca created an article,
SUMMARY
On January 16, 2025, from 09:40 UTC to 10:47 UTC, some Chat customers on Pod 19 experienced issues viewing recent chats, receiving chat export emails, and creating tickets from chats.
TIMELINE
January 16, 2025 11:26 AM UTC | January 16, 2025 03:26 AM PT
We are pleased to inform you that the issues affecting our Chat service for customers on Pod 19 have now been resolved. We sincerely appreciate your patience and understanding during this time.
January 16, 2025 11:00 AM UTC | January 16, 2025 03:00 AM PT
We have made significant progress in recovering functionality, including the ability to view recent chats, receive chat export emails, and create tickets. We will continue to monitor the situation closely and work diligently to enhance your experience. Thank you for your patience and understanding during this time.
January 16, 2025 10:39 AM UTC | January 16, 2025 02:39 AM PT
We are currently experiencing an issue with our chat services on Pod 19, which may prevent you from viewing recent chats, receiving chat export emails, and creating tickets. Our team is actively working to resolve these problems as quickly as possible. Thank you for your patience.
POST-MORTEM
Root Cause Analysis
This incident was caused by a chat service reaching its memory limits, which led to a continuous restart cycle. Each restart generated additional metadata in our in-memory database, causing memory bloat until the system eventually ran out of memory, impacting other services that shared the same database instance.
Resolution
To resolve the issue, the team removed unnecessary metadata and unacknowledged keys from the database to free up memory. Additionally, the instances were moved to larger instance types to accommodate the load, and the service was then deployed successfully.
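As an illustration of the cleanup step only, the sketch below models the in-memory database as a plain Python dict and removes entries whose owning service instance is no longer running. The key layout, the `instance_id` field, and the `prune_stale_metadata` helper are hypothetical; this report does not describe the actual database or key format.

```python
from typing import Dict, Set


def prune_stale_metadata(store: Dict[str, dict], live_instances: Set[str]) -> int:
    """Delete entries whose owning instance is no longer running.

    Returns the number of entries removed.
    """
    # Collect keys first so the dict is not mutated while iterating over it.
    stale_keys = [
        key for key, value in store.items()
        if value.get("instance_id") not in live_instances
    ]
    for key in stale_keys:
        del store[key]
    return len(stale_keys)


# Hypothetical data: three restarts each left one metadata entry behind,
# but only instance "a3" is still alive.
store = {
    "meta:a1": {"instance_id": "a1"},
    "meta:a2": {"instance_id": "a2"},
    "meta:a3": {"instance_id": "a3"},
}
removed = prune_stale_metadata(store, live_instances={"a3"})
print(removed, sorted(store))
```

In a restart loop like the one described above, each restart would add another entry, so a cleanup of this kind frees memory only if it runs faster than new entries accumulate.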
Remediation Items
- Add Alerts: Implemented alerts for Out of Memory (OOM) conditions in the chat service.
- Adjust Memory Limits: Lowered the threshold for memory limits to allow for earlier intervention before reaching critical levels.
- Runbook Improvements: Enhanced documentation and runbooks for handling the chat service and database key management.
- Database Clustering: Planned to separate the database instances for different services to avoid shared memory issues in the future.
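The first two remediation items can be pictured as a two-level check: a warning level set well below the hard limit so that someone can intervene early, and a critical level close to the limit. The following is a minimal sketch under assumed values; the 75% and 90% ratios and the `classify_memory` helper are illustrative and are not the thresholds used by the chat service.

```python
from enum import Enum


class MemoryState(Enum):
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"


def classify_memory(used_bytes: int, limit_bytes: int,
                    warn_ratio: float = 0.75,
                    critical_ratio: float = 0.90) -> MemoryState:
    """Classify memory usage against a hard limit.

    warn_ratio and critical_ratio are assumed values for illustration.
    """
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    ratio = used_bytes / limit_bytes
    if ratio >= critical_ratio:
        return MemoryState.CRITICAL
    if ratio >= warn_ratio:
        return MemoryState.WARNING
    return MemoryState.OK


print(classify_memory(800, 1000))  # MemoryState.WARNING
```

Lowering `warn_ratio` trades more frequent alerts for more time to react before the process reaches its limit.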
FOR MORE INFORMATION
For current system status information about Zendesk and specific impacts to your account, visit our system status page. You can follow this article to be notified when our post-mortem report is published. If you have additional questions about this incident, contact Zendesk customer support.
Edited Jan 29, 2025 · Emanuele Sparacca
Emanuele Sparacca created an article,
SUMMARY
From December 1, 2024 at 4:00 UTC to December 3, 2024 at 20:00 UTC, Sell customers in multiple pods experienced issues with features including data visibility in Smart Lists, lead conversion with deal creation, and outbound calls, with the latter experiencing intermittent failures. Once functionality was restored, a backlog of requests had to be processed, which took until December 18, 2024 at 16:22 UTC to complete.
TIMELINE
December 18, 2024 04:22 PM UTC | December 18, 2024 08:22 AM PT
Thank you for your patience while we reprocessed Sell data that was missed or affected during the window of impact. At this time all data should be correct. Please reach out if you continue to see any issues.
December 13, 2024 11:26 PM UTC | December 13, 2024 03:26 PM PT
Our engineering team has made significant progress to backfill and reprocess Sell data that was missed or affected during the window of impact; however, a small subset of requests requiring more manual involvement to backfill still remains. We are spending additional time and effort to ensure that all data reaches the appropriate location, and will continue our work next week to confirm full recovery. Thank you for your continued patience in the meantime.
December 09, 2024 10:16 PM UTC | December 06, 2024 02:16 PM PT
Our team continues to work to backfill the Sell data affected during the window of impact; however, given the volume and our level of care and diligence in ensuring the correct data is included accurately, this will take some additional time to complete. We will be sure to provide further updates as the backfill progresses.
December 06, 2024 02:06 PM UTC | December 06, 2024 06:06 AM PT
We would like to provide an update regarding the incident impacting our Sell customers on December 3, 2024. Our team continues to work through the data backlog that occurred during the incident. We will continue to provide updates as soon as possible.
December 04, 2024 10:27 AM UTC | December 04, 2024 02:27 AM PT
Our team is actively exploring the most effective approach to the backlog of actions resulting from yesterday's incident affecting Sell. We will share additional updates as soon as they become available.
December 03, 2024 11:44 PM UTC | December 03, 2024 03:44 PM PT
Our engineering team has stabilized Sell functionality, and new requests are being processed as expected at this time. We are working through our options to process requests that may have timed out during the window of impact and will provide further information when this investigation continues tomorrow.
December 03, 2024 09:47 PM UTC | December 03, 2024 01:47 PM PT
Our team continues to work to reduce the backlog and restore expected Sell functionality. We are working to increase capacity to speed up recovery, but some latency and delays are still expected. We will provide further updates when we have new information to share.
December 03, 2024 05:09 PM UTC | December 03, 2024 09:09 AM PT
We are beginning to see some improvement in the issues affecting Sell; however, there is a significant backlog we are working to address, and some latency may still be experienced. We will continue to monitor the situation to ensure full recovery.
December 03, 2024 03:35 PM UTC | December 03, 2024 07:35 AM PT
Our team continues to work on the issues currently impacting Sell. These can manifest as issues with data visibility in Smart Lists, lead conversion with deal creation, and intermittent outbound call failures. We will provide any further updates as they are available.
December 03, 2024 02:01 PM UTC | December 03, 2024 06:01 AM PT
We want to keep you informed regarding the ongoing issue affecting certain features, including data visibility in Smart Lists, lead conversion with deal creation, and intermittent outbound call failures. While we don’t have new developments to share at this time, please know that our team is working diligently to resolve the matter as quickly as possible.
December 03, 2024 12:14 PM UTC | December 03, 2024 04:14 AM PT
Our team is actively addressing the service degradation affecting specific features. Currently, data visibility in Smart Lists, lead conversion with deal creation, and outbound calls are impacted, with the latter experiencing intermittent failures. While most core services remain operational, some issues can often be resolved by reloading or retrying.
December 03, 2024 11:23 AM UTC | December 03, 2024 03:23 AM PT
Our team is actively addressing the service degradation impacting specific features, including data visibility in Smart Lists and lead conversion with deal creation. Most core services remain operational, and issues with some functionalities can often be resolved by reloading or retrying.
December 03, 2024 10:53 AM UTC | December 03, 2024 02:53 AM PT
We are currently investigating an issue where stale data may be appearing in our systems. Additionally, attempts to update data during this time may result in errors. Our team is working diligently to resolve these issues as a priority.
POST-MORTEM
Root Cause Analysis
This incident was caused by a sudden increase in request volume that led to high memory usage across Sell infrastructure. This resulted in alerts due to excessive load, and caused multiple queues to fill up to their maximum capacity. The system responsible for managing these request flows was restarting frequently and could not keep up with the demand, leading to a growing backlog and preventing new requests from processing.
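The dynamics described above, where requests arrive faster than they can be processed until a bounded queue is full, can be sketched with a small simulation. All numbers and the `simulate_backlog` helper are hypothetical and only illustrate the general mechanism, not the actual Sell infrastructure.

```python
from typing import List, Tuple


def simulate_backlog(arrivals_per_tick: int, processed_per_tick: int,
                     capacity: int, ticks: int) -> Tuple[List[int], int]:
    """Track queue depth over time for a bounded queue.

    Returns the queue depth after each tick and the total number of
    requests that could not be queued because the queue was full.
    """
    backlog = 0
    rejected = 0
    history = []
    for _ in range(ticks):
        backlog += arrivals_per_tick
        backlog -= min(backlog, processed_per_tick)
        if backlog > capacity:
            # Anything beyond capacity cannot be queued.
            rejected += backlog - capacity
            backlog = capacity
        history.append(backlog)
    return history, rejected


history, rejected = simulate_backlog(
    arrivals_per_tick=120, processed_per_tick=100, capacity=50, ticks=5
)
print(history, rejected)  # [20, 40, 50, 50, 50] 50
```

Once the queue is pinned at capacity, adding a little more capacity only delays saturation; the backlog clears only when processing outpaces arrivals.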
Resolution
To address the issue, we first attempted to scale up additional infrastructure, but this also quickly filled up to capacity. We then set up a new cluster with additional resources to effectively manage live traffic. This allowed us to stabilize operations and restore normal functionality while we worked on clearing the backlog of requests in the old infrastructure.
Remediation Items
- Remove Outdated Notification Queues: We decided to eliminate unnecessary notification queues that were not needed for customer communication. This reduces the number of requests processed by the relevant infrastructure.
- Enhance Message Processing Tools: Improvements were made to existing tools to increase efficiency in handling messages, again providing more capacity to process requests.
- Establish Additional Alerts: New monitoring alerts were created to keep track of system performance and prevent high memory usage.
- Set Connection Limits: We implemented limits on the number of connections to specific applications to prevent overload and ensure smoother traffic management.
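A connection limit of the kind named in the last item can be sketched with a bounded semaphore: each connection takes a slot, and attempts beyond the limit are refused instead of adding load. The `ConnectionLimiter` class and the limit of 2 are assumptions for illustration, not the mechanism Zendesk deployed.

```python
import threading


class ConnectionLimiter:
    """Caps the number of concurrently held connections (illustrative only)."""

    def __init__(self, max_connections: int) -> None:
        self._slots = threading.BoundedSemaphore(max_connections)

    def try_acquire(self) -> bool:
        # Non-blocking: returns False right away when no slot is free.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()


limiter = ConnectionLimiter(max_connections=2)
results = [limiter.try_acquire() for _ in range(3)]
print(results)  # [True, True, False]

limiter.release()  # one connection closes
after_release = limiter.try_acquire()
print(after_release)  # True
```

Refusing the third connection immediately, rather than queueing it, keeps an overloaded application from accumulating yet another backlog.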
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, contact Zendesk customer support.
Edited Jan 07, 2025 · Emanuele Sparacca