Add landscaping-assistant alerts and dedicated Telegram channel #410
Reference
ldraney/pal-e-platform!410
Summary
Route landscaping-assistant alerts to a dedicated Telegram group with 4 app-specific PrometheusRule alerts.
Changes
- `terraform/modules/monitoring/main.tf`: Add `telegram-landscaping` receiver, sub-route for `namespace=landscaping-assistant`, PrometheusRule with 4 alerts (ErrorRateHigh, LatencyHigh, PumaSaturated, AppDown), and `set_sensitive` blocks for the new receiver
- `terraform/modules/monitoring/variables.tf`: Add `telegram_landscaping_chat_id` variable
- `terraform/main.tf`: Pass new variable to monitoring module
- `terraform/variables.tf`: Declare `telegram_landscaping_chat_id` at root level
Test Plan
- `terraform plan` shows new PrometheusRule + updated AlertManager config
- Post-apply
Add `telegram_landscaping_chat_id = "-1003862285795"` to `terraform/secrets.auto.tfvars` before running `terraform apply`.
Review Checklist
Related Notes
Closes #409
PR #410 Review
DOMAIN REVIEW
Stack: Terraform / Helm (kube-prometheus-stack) / Kubernetes (PrometheusRule CRD)
- Receiver indexing: Verified against `main` branch. The current `receivers` array is `[0]=default, [1]=telegram`. The PR appends `[2]=telegram-landscaping`. The `set_sensitive` blocks correctly target `receivers[2].telegram_configs[0].bot_token` and `receivers[2].telegram_configs[0].chat_id`. This is position-dependent and brittle by nature (any future reordering breaks it), but it matches the existing pattern for `receivers[1]` -- no new risk introduced.
- Variable threading: `telegram_landscaping_chat_id` is declared at root `terraform/variables.tf` (`sensitive=true`), passed through `terraform/main.tf` to the module, and declared in `terraform/modules/monitoring/variables.tf` (`sensitive=true`). Complete chain, no gaps.
- PrometheusRule labels: `release = "kube-prometheus-stack"` and `app.kubernetes.io/part-of = "kube-prometheus-stack"` match the existing `payment-pipeline-alerts` pattern. The rule will be picked up by the Prometheus operator.
- Sub-route: `matchers = ["namespace = landscaping-assistant"]` with `continue = false` correctly short-circuits to the dedicated receiver. All other alerts fall through to the default `telegram` receiver.
- Bot token reuse: The landscaping receiver reuses `var.telegram_bot_token` (same bot, different chat_id). This is correct Telegram bot architecture.
- PromQL expressions: All four are syntactically valid. `LandscapingErrorRateHigh` uses `> 0` threshold (any 5xx is critical) -- intentional and appropriate for a low-traffic app. `LandscapingLatencyHigh` correctly excludes health check endpoints via `controller!="rails/health"`. `LandscapingAppDown` depends on `probe_success{service="landscaping-assistant"}` which was added in PR #400.
- Secrets: No plaintext secrets in the diff. The chat_id value appears in the PR body as guidance for `secrets.auto.tfvars`, but that file is gitignored and not committed.
BLOCKERS
None.
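For reference, the receiver order and sub-route verified in the domain review can be sketched as below. This is a minimal illustration, not the PR's actual code: the local name `alertmanager_config`, the root route's receiver, the literal receiver name `default`, and the empty `telegram_configs` placeholders are assumptions. Only the receiver order, the matcher, and `continue = false` come from the review.

```hcl
# Illustrative sketch only; not copied from terraform/modules/monitoring/main.tf.
locals {
  # Hypothetical local; the real module may build this inline in the Helm values.
  alertmanager_config = {
    route = {
      # Assumed: alerts that match no sub-route go to the general Telegram chat.
      receiver = "telegram"
      routes = [
        {
          # Alerts from the landscaping-assistant namespace stop at this route.
          receiver = "telegram-landscaping"
          matchers = ["namespace = landscaping-assistant"]
          continue = false
        },
      ]
    }

    # Order matters: the set_sensitive blocks address these entries by position.
    # bot_token and chat_id are left out here because they are injected later.
    receivers = [
      { name = "default" },                                      # receivers[0]
      { name = "telegram", telegram_configs = [{}] },            # receivers[1]
      { name = "telegram-landscaping", telegram_configs = [{}] } # receivers[2]
    ]
  }
}
```

Because `continue = false`, an alert that matches the sub-route is delivered only to `telegram-landscaping` and is not evaluated against any later sibling routes.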
NITS
- `tofu fmt` alignment drift in `terraform/main.tf`: The diff shows the new line using wider padding than the surrounding block. The first two lines were re-padded to align with the new longest key, but `tailscale_domain` and everything below were not. Run `tofu fmt` to normalize the entire block.
- `LandscapingPumaSaturated` annotation displays ratio, not percentage: The expression `(busy / max) > 0.8` yields a value like `0.85`, but the annotation says `{{ $value | printf "%.0f" }}% of Puma threads are busy`. For a ratio of 0.85, this renders as "1% of Puma threads are busy" instead of "85%". Either multiply by 100 in the expression (`* 100 > 80`) or change the template: `{{ $value | humanizePercentage }}` on the raw ratio, or `{{ $value | printf "%.0f%%" }}` after multiplying.
- No test coverage note: This is IaC (Terraform + Prometheus rules), so the "new functionality must have tests" blocker does not apply in the traditional sense. The Test Plan section covers `terraform plan` validation and manual alert verification, which is the appropriate testing strategy for this domain.
SOP COMPLIANCE
- `409-landscaping-alerts-telegram` matches issue #409
- landscaping-assistant#95, landscaping-assistant#17
- `Closes #409` links the parent issue
PROCESS OBSERVATIONS
- `set_sensitive` indexing pattern (`receivers[N]`) is a known fragility across the entire monitoring module. If a future PR reorders or removes a receiver, all downstream indices break silently. Consider documenting the index mapping as a comment near the `set_sensitive` blocks. Not a blocker for this PR since it follows the established pattern.
VERDICT: APPROVED
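The two follow-ups suggested in this review (an index map next to the `set_sensitive` blocks, and percentage rendering for `LandscapingPumaSaturated`) might look roughly like the sketch below. It assumes the Helm provider 2.x block syntax for `set_sensitive` and a `kubernetes_manifest` resource for the rule. The resource names, chart coordinates, `alertmanager.config.` path prefix, `monitoring` namespace, Puma metric names, `for` duration, severity, and annotation key are placeholders, not values taken from the PR.

```hcl
# Illustrative sketch only; every placeholder is marked as an assumption.
variable "telegram_bot_token" {
  type      = string
  sensitive = true
}

variable "telegram_landscaping_chat_id" {
  type      = string
  sensitive = true
}

resource "helm_release" "kube_prometheus_stack" {
  name       = "kube-prometheus-stack"
  repository = "https://prometheus-community.github.io/helm-charts"
  chart      = "kube-prometheus-stack"
  namespace  = "monitoring" # assumed namespace

  # Receiver index map. Update every set_sensitive path if this order changes:
  #   receivers[0] = default
  #   receivers[1] = telegram
  #   receivers[2] = telegram-landscaping
  set_sensitive {
    name  = "alertmanager.config.receivers[2].telegram_configs[0].bot_token"
    value = var.telegram_bot_token
  }

  set_sensitive {
    name  = "alertmanager.config.receivers[2].telegram_configs[0].chat_id"
    value = var.telegram_landscaping_chat_id
  }
}

resource "kubernetes_manifest" "landscaping_puma_alert" {
  manifest = {
    apiVersion = "monitoring.coreos.com/v1"
    kind       = "PrometheusRule"
    metadata = {
      name      = "landscaping-assistant-alerts" # assumed name
      namespace = "monitoring"                   # assumed namespace
      labels = {
        "release"                   = "kube-prometheus-stack"
        "app.kubernetes.io/part-of" = "kube-prometheus-stack"
      }
    }
    spec = {
      groups = [{
        name = "landscaping-assistant"
        rules = [{
          alert = "LandscapingPumaSaturated"
          # Metric names are hypothetical. Multiplying by 100 makes $value a
          # percentage (85 rather than 0.85), so "%.0f" prints "85".
          expr   = "(sum(puma_busy_threads) / sum(puma_max_threads)) * 100 > 80"
          "for"  = "5m"                     # assumed duration
          labels = { severity = "warning" } # assumed severity
          annotations = {
            summary = "{{ $value | printf \"%.0f\" }}% of Puma threads are busy"
          }
        }]
      }]
    }
  }
}
```

Keeping the ratio expression and switching the template to `{{ $value | humanizePercentage }}` would be an alternative that leaves the `> 0.8` threshold unchanged.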