Configuring Throttling and Repetition
So what do all of these options look like in practice?
route:
  receiver: 'team-pager'
  group_by: ['job', 'alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
All of these settings are per-route, and inherited by default from their parent route.
By default all alerts are grouped together, with a group_wait of 30 seconds, a group_interval of 5 minutes and a repeat_interval of 4 hours.
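To illustrate the per-route inheritance, here is a minimal sketch of a child route that overrides only one setting. The `severity: ticket` matcher and the `team-ticket` receiver are hypothetical names chosen for the example, and it uses the older `match` syntax (newer Alertmanager releases prefer `matchers`).

```yaml
route:
  receiver: 'team-pager'
  group_by: ['job', 'alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # Hypothetical child route: matcher and receiver name are illustrative only.
    - match:
        severity: ticket
      receiver: 'team-ticket'
      # Only repeat_interval is overridden here; group_by, group_wait and
      # group_interval are inherited from the parent route above.
      repeat_interval: 24h
```

With a layout like this, ticket-severity alerts would still be grouped by job and alertname and wait 30 seconds before the first notification, but would only be re-sent once a day.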
There's one other setting I'd like to discuss at this point, as it tends to cause confusion: resolve_timeout in the global section. Users sometimes get the mistaken impression that it affects grouping logic. Put simply, if Prometheus is the only thing sending alerts to your Alertmanager, this setting has no effect. Internally, alerts sent to the Alertmanager have an end time, after which the alert is no longer considered firing. This handles the case where a Prometheus dies and stops sending alerts. Prometheus always sets this end time, but other Alertmanager clients may not. The resolve_timeout is the default used when the end time is missing.
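For reference, this is roughly where the setting lives. The fragment is a sketch rather than a recommendation: the 5m value shown is, as far as I know, the Alertmanager default, and it only matters for clients that send alerts without an end time.

```yaml
global:
  # Applies only to alerts that arrive without an end time.
  # Alerts sent by Prometheus always carry one, so they are unaffected.
  resolve_timeout: 5m
```

Note that it sits under global rather than under route, which is one more hint that it is unrelated to the grouping settings above.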