Add landscaping-assistant alerts and dedicated Telegram channel #410

Merged
ldraney merged 3 commits from 409-landscaping-alerts-telegram into main 2026-06-05 04:02:56 +00:00
Owner

Summary

Route landscaping-assistant alerts to a dedicated Telegram group with 4 app-specific PrometheusRule alerts.

Changes

  • terraform/modules/monitoring/main.tf: Add telegram-landscaping receiver, sub-route for namespace=landscaping-assistant, PrometheusRule with 4 alerts (ErrorRateHigh, LatencyHigh, PumaSaturated, AppDown), and set_sensitive blocks for the new receiver
  • terraform/modules/monitoring/variables.tf: Add telegram_landscaping_chat_id variable
  • terraform/main.tf: Pass new variable to monitoring module
  • terraform/variables.tf: Declare telegram_landscaping_chat_id at root level

Test Plan

  • terraform plan shows new PrometheusRule + updated AlertManager config
  • After apply: 4 new alert rules visible in Prometheus UI
  • Test alert routes to dedicated Telegram group (not general channel)
  • Existing alerts continue routing to general Telegram group

Post-apply

Add telegram_landscaping_chat_id = "-1003862285795" to terraform/secrets.auto.tfvars before running terraform apply.

Review Checklist

  • No secrets in diff (chat_id passed via variable, set_sensitive)
  • Existing receivers/routes unchanged
  • Recording rules unaffected
  • PrometheusRule labels match kube-prometheus-stack selector

Related Notes

  • landscaping-assistant#95 (alert cleanup parent)
  • landscaping-assistant#17 (PrometheusRule alerts)

Closes #409

Add landscaping-assistant alerts and dedicated Telegram channel
Some checks failed
ci/woodpecker/push/terraform Pipeline was successful
ci/woodpecker/pr/terraform Pipeline failed
4722576bd5
Route landscaping-assistant namespace alerts to a dedicated Telegram
group for clean per-app signal. Adds 4 PrometheusRule alerts: error
rate, p95 latency, Puma thread saturation, and blackbox probe down.

Closes #409

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author
Owner

PR #410 Review

DOMAIN REVIEW

Stack: Terraform / Helm (kube-prometheus-stack) / Kubernetes (PrometheusRule CRD)

Receiver indexing: Verified against main branch. The current receivers array is [0]=default, [1]=telegram. The PR appends [2]=telegram-landscaping. The set_sensitive blocks correctly target receivers[2].telegram_configs[0].bot_token and receivers[2].telegram_configs[0].chat_id. This is position-dependent and brittle by nature (any future reordering breaks it), but it matches the existing pattern for receivers[1] -- no new risk introduced.

Variable threading: telegram_landscaping_chat_id is declared at root terraform/variables.tf (sensitive=true), passed through terraform/main.tf to the module, and declared in terraform/modules/monitoring/variables.tf (sensitive=true). Complete chain, no gaps.
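The three-step chain described above could look roughly like the sketch below. It is an illustration rather than the PR's actual diff: the module label `monitoring`, the `source` path, and the description text are assumptions, and only the variable name and `sensitive = true` are taken from the review.

```hcl
# terraform/variables.tf (root): declared so the value can be supplied
# from terraform/secrets.auto.tfvars.
variable "telegram_landscaping_chat_id" {
  description = "Telegram chat ID for landscaping-assistant alerts" # assumed wording
  type        = string
  sensitive   = true # redacts the value in plan/apply output
}

# terraform/main.tf: pass the root variable into the module.
module "monitoring" { # module label and source path are assumptions
  source = "./modules/monitoring"

  telegram_landscaping_chat_id = var.telegram_landscaping_chat_id
}

# terraform/modules/monitoring/variables.tf repeats the same variable block
# (again with sensitive = true) so the module can accept the value.
```

If any one of the three pieces is missing, `terraform plan` fails with an undeclared-variable or unsupported-argument error, so plan output is a quick check that the chain is complete.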

PrometheusRule labels: release = "kube-prometheus-stack" and app.kubernetes.io/part-of = "kube-prometheus-stack" match the existing payment-pipeline-alerts pattern. The rule will be picked up by the Prometheus operator.

Sub-route: matchers = ["namespace = landscaping-assistant"] with continue = false correctly short-circuits to the dedicated receiver. All other alerts fall through to the default telegram receiver.
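As a rough sketch of the routing tree this paragraph describes, written as an HCL object of the kind typically fed into the chart's Alertmanager values: the local name `alertmanager_route` and the root `receiver` value are assumptions for illustration, while the sub-route's receiver name, matcher, and `continue = false` come from the review.

```hcl
locals {
  # Hypothetical local; the real module may build this inline in Helm values.
  alertmanager_route = {
    receiver = "telegram" # assumed root receiver for all non-matching alerts

    routes = [
      {
        # Alerts labelled namespace=landscaping-assistant stop here.
        receiver = "telegram-landscaping"
        matchers = ["namespace = landscaping-assistant"]
        continue = false # do not evaluate sibling routes after a match
      },
    ]
  }
}
```

Because `continue` is false, a matching alert is delivered only to the dedicated group. Alerts that match no sub-route are handled by the root receiver.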

Bot token reuse: The landscaping receiver reuses var.telegram_bot_token (same bot, different chat_id). This is correct Telegram bot architecture.

PromQL expressions: All four are syntactically valid. LandscapingErrorRateHigh uses > 0 threshold (any 5xx is critical) -- intentional and appropriate for a low-traffic app. LandscapingLatencyHigh correctly excludes health check endpoints via controller!="rails/health". LandscapingAppDown depends on probe_success{service="landscaping-assistant"} which was added in PR #400.

Secrets: No plaintext secrets in the diff. The chat_id value appears in the PR body as guidance for secrets.auto.tfvars, but that file is gitignored and not committed.

BLOCKERS

None.

NITS

  1. tofu fmt alignment drift in terraform/main.tf: The diff shows the new line using wider padding than the surrounding block:
  telegram_bot_token             = var.telegram_bot_token
  telegram_chat_id               = var.telegram_chat_id
  telegram_landscaping_chat_id   = var.telegram_landscaping_chat_id
  tailscale_domain           = var.tailscale_domain

The first two lines were re-padded to align with the new longest key, but tailscale_domain and everything below were not. Run tofu fmt to normalize the entire block.

  2. LandscapingPumaSaturated annotation displays ratio, not percentage: The expression (busy / max) > 0.8 yields a value like 0.85, but the annotation says {{ $value | printf "%.0f" }}% of Puma threads are busy. Because %.0f rounds 0.85 to 1, this renders as "1% of Puma threads are busy" instead of "85%". Either scale the expression to a percentage ((busy / max) * 100 > 80) and keep the printf template, or keep the ratio and change the template to {{ $value | humanizePercentage }}.

  3. No test coverage note: This is IaC (Terraform + Prometheus rules), so the "new functionality must have tests" blocker does not apply in the traditional sense. The Test Plan section covers terraform plan validation and manual alert verification, which is the appropriate testing strategy for this domain.
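The two ways of fixing the Puma annotation can be sketched as HCL strings. The local names below are hypothetical and the surrounding PrometheusRule is omitted. Only the template functions (`printf`, `humanizePercentage`) and the annotation wording come from the review.

```hcl
locals {
  # Option A: keep the expression as a 0.0-1.0 ratio and let the template
  # convert it; humanizePercentage renders 0.85 as "85%".
  puma_summary_ratio = "{{ $value | humanizePercentage }} of Puma threads are busy"

  # Option B: scale the expression to 0-100 (for example "* 100 > 80") and
  # keep printf; the literal % after the template then reads correctly.
  puma_summary_scaled = "{{ $value | printf \"%.0f\" }}% of Puma threads are busy"
}
```

Option A leaves the alert threshold untouched, which keeps the expression consistent with any dashboards or recording rules that already treat the value as a ratio.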

SOP COMPLIANCE

  • Branch named after issue: 409-landscaping-alerts-telegram matches issue #409
  • PR body follows template: Summary, Changes, Test Plan, Related all present
  • Related references cross-repo issues: landscaping-assistant#95, landscaping-assistant#17
  • No secrets committed
  • No scope creep: all 4 files are monitoring-related changes for the stated purpose
  • Closes #409 links the parent issue

PROCESS OBSERVATIONS

  • This is the fourth landscaping-assistant observability PR in a sequence (after #400 blackbox, #402/#403 dashboard, now #409/#410 alerts). The incremental approach is good for change failure risk -- each PR is independently reversible.
  • The positional set_sensitive indexing pattern (receivers[N]) is a known fragility across the entire monitoring module. If a future PR reorders or removes a receiver, all downstream indices break silently. Consider documenting the index mapping as a comment near the set_sensitive blocks. Not a blocker for this PR since it follows the established pattern.
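One way to document the index mapping suggested above is a comment block directly above the `set_sensitive` entries. The sketch below assumes the Helm provider's block syntax, a resource named `kube_prometheus_stack`, the upstream chart repository, and an `alertmanager.config.` prefix on the value paths. None of these are confirmed by the PR, which only shows the `receivers[2].telegram_configs[0]` suffixes.

```hcl
resource "helm_release" "kube_prometheus_stack" { # resource name is an assumption
  name       = "kube-prometheus-stack"
  repository = "https://prometheus-community.github.io/helm-charts"
  chart      = "kube-prometheus-stack"

  # Receiver index map. Keep in sync with the receivers list in the values:
  #   receivers[0] = default
  #   receivers[1] = telegram
  #   receivers[2] = telegram-landscaping
  # Reordering or removing a receiver shifts every index below it.

  set_sensitive {
    name  = "alertmanager.config.receivers[2].telegram_configs[0].bot_token"
    value = var.telegram_bot_token # same bot as receivers[1]
  }

  set_sensitive {
    name  = "alertmanager.config.receivers[2].telegram_configs[0].chat_id"
    value = var.telegram_landscaping_chat_id
  }
}
```

The comment does not remove the fragility, but it puts the mapping next to the indices that depend on it, so a reviewer can spot a mismatch in the diff.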

VERDICT: APPROVED

Fix review nits: Puma annotation math and alignment
Some checks failed
ci/woodpecker/push/terraform Pipeline was successful
ci/woodpecker/pr/terraform Pipeline failed
b111d47100
- Use humanizePercentage for PumaSaturated annotation (value is 0.0-1.0)
- Align all monitoring module arguments in root main.tf

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Run tofu fmt on root main.tf
All checks were successful
ci/woodpecker/push/terraform Pipeline was successful
ci/woodpecker/pr/terraform Pipeline was successful
ci/woodpecker/pull_request_closed/terraform Pipeline was successful
0e0412039f
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ldraney deleted branch 409-landscaping-alerts-telegram 2026-06-05 04:02:56 +00:00