Disable default kube-prometheus-stack alerting rules #407

Closed
opened 2026-06-04 12:38:25 +00:00 by ldraney · 0 comments
Owner

Type

Feature

Lineage

Related to ldraney/landscaping-assistant #95 — AlertManager cleanup tracked there, platform work here.

Repo

ldraney/pal-e-platform

User Story

As the platform operator
I want to disable the ~95 default kube-prometheus-stack alerting rules
So that AlertManager only fires for actionable custom alerts

Context

An audit on 2026-06-04 found 123 alert rules across 18 groups. Roughly 95 of them are kube-prometheus-stack defaults designed for multi-team Kubernetes operations (alertmanager internals, kubelet health, 26 node-exporter alerts, 23 Prometheus self-monitoring alerts, API server SLOs). They never fire for real issues and dilute attention from the ~28 custom rules that do.

Recording rules must be preserved — Grafana dashboards depend on them.

File Targets

Files to modify:

  • terraform/modules/monitoring/main.tf — set all alerting rule groups to false in defaultRules.rules, add Watchdog to custom platform-alerts

Files NOT to touch:

  • Custom PrometheusRule Terraform resources (blackbox_alerts, embedding_alerts, payment_pipeline_alerts, gmail_oauth_expiry_alert)

Acceptance Criteria

  • All default alerting rule groups disabled
  • Recording rules preserved (dashboards still work)
  • Watchdog heartbeat alert added to custom platform-alerts
  • Custom PrometheusRules unaffected
  • Alert count drops from ~123 to ~30
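
The Watchdog criterion could be met with a rule along these lines. This is a minimal sketch, not the contents of main.tf: the local name `watchdog_rule` is hypothetical, and the `severity = "none"` label and the annotation text are assumptions that may need to change to match how AlertManager routes alerts on this platform. The expression `vector(1)` always returns a sample, which is what makes the alert a heartbeat.

```hcl
# Sketch only: the local name is hypothetical and not taken from main.tf.
locals {
  # Heartbeat alert. vector(1) always returns one sample, so the alert fires
  # continuously for as long as rule evaluation and alert delivery work.
  watchdog_rule = {
    alert = "Watchdog"
    expr  = "vector(1)"
    labels = {
      # Assumption: "none" keeps the heartbeat out of paging routes.
      severity = "none"
    }
    annotations = {
      summary = "Always-firing heartbeat that proves the alerting pipeline works."
    }
  }
}
```

The object would be appended to the rules list of the custom platform-alerts group, wherever that group is defined in the module.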

Test Expectations

  • terraform plan shows rule changes only
  • After apply: kubectl get prometheusrules -n monitoring shows reduced count
  • Watchdog still fires in AlertManager
  • Grafana dashboards still load

Constraints

  • Keep defaultRules.create = true to preserve recording rules
  • Disable alerting via individual defaultRules.rules.* toggles
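
As a rough illustration of these two constraints, the Helm values might look like the sketch below. The local name `default_rules_values` is hypothetical, and the toggle keys are assumptions drawn from the chart's `defaultRules.rules` section; they vary between chart versions and should be checked against the values.yaml of the pinned release. Recording-rule toggles (for example `nodeExporterRecording`) are deliberately omitted so they keep their defaults. Note that the chart's own Watchdog alert lives in the `general` group, so turning that group off removes it, which is why a custom Watchdog is part of this issue.

```hcl
# Sketch only: local name is hypothetical; verify every key against the
# values.yaml of the chart version in use before applying.
locals {
  default_rules_values = {
    defaultRules = {
      # Keep rule creation on so recording rules are still rendered.
      create = true

      rules = {
        # Alerting groups switched off one by one.
        alertmanager         = false
        configReloaders      = false
        etcd                 = false
        general              = false # also removes the chart's Watchdog
        kubeApiserverSlos    = false
        kubeControllerManager = false
        kubeProxy            = false
        kubeSchedulerAlerting = false
        kubeStateMetrics     = false
        kubernetesApps       = false
        kubernetesResources  = false
        kubernetesStorage    = false
        kubernetesSystem     = false
        network              = false
        nodeExporterAlerting = false
        prometheus           = false
        prometheusOperator   = false
      }
    }
  }
}
```

If the release is configured through the `values` argument of `helm_release`, this map could be passed as `yamlencode(local.default_rules_values)`; if the module uses `set` blocks instead, each toggle would become its own entry.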

Checklist

  • PR opened
  • Tests pass
  • No unrelated changes

Related

  • ldraney/landscaping-assistant #95: parent tracking issue
  • ldraney/landscaping-assistant #17: follow-up on app-specific PrometheusRule alerts