Fix Alertmanager inhibit rule for Gmail OAuth duplicate alerts #323

Open
opened 2026-05-02 14:50:42 +00:00 by forgejo_admin · 0 comments
Contributor

Type

Bug

Lineage

Standalone — discovered 2026-05-01 during alert-state audit. Related to forgejo_admin/pal-e-platform #290 (the PR that introduced the inhibit rule).

Repo

forgejo_admin/pal-e-platform

What Broke

The Alertmanager inhibit rule added in #290 fails to suppress the GmailOAuthTokenExpiringSoon warning when GmailOAuthTokenExpired critical is already firing. Both fire simultaneously, doubling the noise.

The rule:

inhibit_rules:
  - source_matchers: ["severity = critical"]
    target_matchers: ["severity = warning"]
    equal: ["alertname", "namespace"]

The rule lists alertname under equal, so a critical alert can only inhibit a warning that has the same alertname. The two OAuth alerts have different alertnames (GmailOAuthTokenExpired vs GmailOAuthTokenExpiringSoon), so the rule never pairs them.

Repro Steps

  1. Wait for both GmailOAuthTokenExpired and GmailOAuthTokenExpiringSoon to fire (token age >7d).
  2. kubectl exec -n monitoring alertmanager-... -c alertmanager -- wget -qO- http://localhost:9093/api/v2/alerts → both alerts present, neither inhibited.

Expected Behavior

When GmailOAuthTokenExpired is firing, GmailOAuthTokenExpiringSoon is suppressed (not delivered to the telegram receiver).
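
One possible way to get this behavior is a dedicated rule for the OAuth pair, added alongside the generic rule from #290 so the existing pairs keep working. This is a minimal sketch, not the chosen fix: it assumes both alerts carry the same namespace label value, and it is written in the YAML form shown under What Broke rather than in whatever form the inhibit_rules block takes in terraform/modules/monitoring/main.tf.

```yaml
inhibit_rules:
  # Existing generic rule from #290, left unchanged so pairs that
  # share an alertname (OOMKilled etc.) behave as before.
  - source_matchers: ["severity = critical"]
    target_matchers: ["severity = warning"]
    equal: ["alertname", "namespace"]

  # Sketch of a dedicated rule for the Gmail OAuth pair (an assumption,
  # not part of the current config). Source and target are selected by
  # alertname explicitly, so "equal" no longer needs to include it.
  - source_matchers: ["alertname = GmailOAuthTokenExpired"]
    target_matchers: ["alertname = GmailOAuthTokenExpiringSoon"]
    # Assumes both alerts carry the same namespace label value.
    equal: ["namespace"]
```

If both alerts also carry a label identifying the token (the acceptance criteria mention a secret label), adding it to equal would scope the inhibition per token. Alertmanager treats a missing label and an empty label as the same value, so a label listed in equal that is absent from both alerts does not block the inhibition.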

Environment

  • Cluster: pal-e, namespace monitoring
  • Helm release: kube-prometheus-stack
  • File: terraform/modules/monitoring/main.tf, inhibit_rules block

Acceptance Criteria

  • When the critical OAuth alert is firing, the warning OAuth alert is suppressed in Alertmanager
  • No regression to other inhibition pairs already working (OOMKilled etc.)
  • Solution choice documented in PR description (options: rename rules to share alertname with severity ramp; OR add secret-label inhibit rule; OR equal-on namespace only)

Related

  • pal-e-platform — project
  • forgejo_admin/pal-e-platform #290 — origin of the buggy rule
  • alert-report-2026-05-01 — alert snapshot
Reference
ldraney/pal-e-platform#323