According to "Notifying external targets," an external target is automatically deactivated after 21 consecutive failed attempts. For high-volume targets, even a brief disruption can hit this threshold. (Imagine a target that is notified whenever a ticket is solved, for instance.) If something is down for even a minute, targets can wind up disabled.
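To put a rough number on "a minute," here is a back-of-the-envelope sketch. The threshold of 21 comes from the article; the notification rates are made-up volumes for illustration, and the function assumes failures arrive at a steady rate for the whole outage.

```python
# Rough estimate only: assumes every notification fails during the outage
# and that notifications arrive at a steady rate.
def seconds_until_disabled(notifications_per_minute: float, threshold: int = 21) -> float:
    """Approximate time for `threshold` consecutive failures to accumulate."""
    if notifications_per_minute <= 0:
        raise ValueError("rate must be positive")
    return threshold * 60.0 / notifications_per_minute


if __name__ == "__main__":
    # Hypothetical volumes, not measurements from a real account.
    for rate in (5, 30, 120):
        print(f"{rate:>4}/min: disabled after about {seconds_until_disabled(rate):.1f} s")
```

At anything above 21 notifications per minute, the target is disabled in under a minute of downtime.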
When a target is automatically disabled, an email is sent to all admins, and that's that. (This is an improvement over only notifying the account owner -- thanks.) For the high-volume targets most likely to be disabled by this process, the time it takes to receive this email and act on it can result in a lot of lost data.
Here's the message:
>The target '[some target]' has been temporarily disabled due to too many failures. You can re-enable the target from Settings > Extensions > Targets to continue sending messages to the target, but please check the possible reason of these failures by testing the target first.
Please consider these changes to make this situation easier to deal with. I've loosely ordered them by estimated complexity.
- Include a link directly to the Settings > Extensions > Targets page, instead of making me navigate there myself.
- Make it easier to integrate with incident response platforms by allowing me to specify an email address for these "target disabled" notifications.
- Include a log of the 21 failed attempts that triggered the deactivation.
- Keep a log of all attempts to notify the target, even beyond the initial 21, so they can be backfilled later.
- Attempt reactivating the target at increasing intervals. For long-running targets, the problem is usually temporary downtime, not any issue with the target's configuration.
- Allow me to increase the threshold from 21 attempts for particularly high-volume targets, or automatically scale this number based on how frequently the target is used.
- Retry failed attempts after the target is reactivated.
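To make the last few requests concrete, here is a minimal sketch of the behavior I have in mind: a configurable threshold, a log of every attempt (including ones made while the target is disabled), reactivation probes at increasing intervals, and replay of the backlog once the target responds again. This is not how Zendesk implements targets; the `Target` class, its fields, and the default delays are all hypothetical names and values I chose for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Target:
    """Hypothetical model of an external target (not Zendesk's implementation)."""
    send: Callable[[str], bool]       # returns True when delivery succeeds
    threshold: int = 21               # consecutive failures before disabling
    base_delay: float = 60.0          # assumed seconds before the first probe
    max_delay: float = 3600.0         # assumed cap on the probe interval
    active: bool = True
    failures: int = 0
    probes: int = 0
    backlog: List[str] = field(default_factory=list)

    def notify(self, payload: str) -> None:
        if not self.active:
            # Keep attempts made while disabled so they can be backfilled.
            self.backlog.append(payload)
            return
        if self.send(payload):
            self.failures = 0
            return
        # Failed delivery: remember the payload and count the failure.
        self.backlog.append(payload)
        self.failures += 1
        if self.failures >= self.threshold:
            self.active = False

    def next_probe_delay(self) -> float:
        """Exponential backoff: the delay doubles after each failed probe."""
        return min(self.base_delay * 2 ** self.probes, self.max_delay)

    def probe(self) -> bool:
        """Replay the backlog in order; reactivate only if all of it is delivered."""
        self.probes += 1
        while self.backlog:
            if not self.send(self.backlog[0]):
                return False
            self.backlog.pop(0)
        self.active = True
        self.failures = 0
        self.probes = 0
        return True
```

In this sketch, replay only happens inside `probe()`, so a payload that fails once while the target stays active sits in the backlog until the next probe. A real implementation would want to flush those separately and persist the backlog somewhere durable.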
These are all important improvements to me, but if I could have only one, it would be this: let me export failed calls, including ones attempted while the target is disabled, so I can backfill them later. Otherwise, give me greater control over the notifications, so I can fight the diffusion of responsibility that sets in with "all admins" notifications.