Add alerting architecture doc #112
Summary
Documents the alerting setup added in pal-e-platform PR #410.
Changes
docs/alerting.md: New doc covering alert rules, Alertmanager routing, available metrics, a Mermaid architecture diagram, and how to add new alerts
docs/observability-roadmap.md: Added link to the new alerting doc
Context
Follows pal-e-platform PR #408 (disable default rules) and PR #410 (add landscaping-assistant alerts + dedicated Telegram channel). This documents the live setup for future reference.
Test Plan
Review Checklist
Related Notes
Closes #95
PR #112 Review
DOMAIN REVIEW
Tech stack: Documentation only -- Markdown with Mermaid diagrams. No application code changes.
Content quality:
graph LR layout is a good choice for a pipeline flow.
... continue: false, send_resolved: true).
... tofu apply -target command, which is helpful for future reference.
Cross-links: Bidirectional. alerting.md links to observability-roadmap.md and infrastructure-and-pipeline.md. The roadmap now links back to alerting.md. Both target files confirmed to exist on disk.
Consistency with observability-roadmap.md: The roadmap's "Alerting" row says "Alertmanager -> Telegram + Slack" but the new alerting doc only documents Telegram routing (no Slack receiver). This is not a blocker -- the roadmap describes the target architecture while the alerting doc describes what is live -- but a brief note in alerting.md acknowledging that Slack routing is planned but not yet configured would prevent future confusion.
BLOCKERS
None. This is a docs-only change. No code, no secrets, no credentials, no user input handling. The BLOCKER criteria (test coverage, input validation, secrets, DRY auth) do not apply to pure documentation.
NITS
Slack mention gap: The observability roadmap lists "Alertmanager -> Telegram + Slack" as COMPLETE, but alerting.md only documents Telegram routing. Consider either adding a note that Slack is planned/not-yet-configured, or updating the roadmap table to reflect the actual state.
Branch naming convention: Branch is docs/alerting-architecture. SOP convention is {issue-number}-{kebab-case-purpose} (e.g., 95-alerting-architecture). The docs/ prefix style is reasonable for documentation branches but does not match the documented convention.
Related section: PR body uses "Related Notes" instead of "Related" and references issue numbers (landscaping-assistant#95, landscaping-assistant#17) rather than a plan slug. Acceptable for a docs-only PR where there may not be a formal plan, but noted for SOP alignment.
PromQL expression truncation: The LandscapingLatencyHigh expression is shown as histogram_quantile(0.95, ...) > 1 with an ellipsis. For a reference doc, the full expression (or at least a note saying "see Terraform source for full query") would be more useful to someone debugging an alert at 2 AM.
SOP COMPLIANCE
... docs/alerting-architecture instead of 95-alerting-architecture
PROCESS OBSERVATIONS
... alerting.md and observability-roadmap.md keeps the docs navigable as the observability suite grows.
VERDICT: APPROVED
Clean documentation PR. The nits (branch naming, truncated PromQL, Slack mention gap) are non-blocking. Content is accurate, well-structured, and properly cross-linked.
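On the truncated PromQL nit: for readers of alerting.md who are unsure what a histogram_quantile(0.95, ...) > 1 condition checks, a rough Python sketch of the bucket interpolation may help. This is a simplified illustration of the linear interpolation Prometheus documents for histogram_quantile, not the alert's actual query. The elided argument is deliberately not reconstructed here, the bucket values are made-up sample data, and reading the threshold of 1 as seconds is an assumption.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Simplified sketch: assumes 0 < q <= 1, a final +Inf bucket, and a lower
    bound of 0 for the first bucket. Other edge cases are not handled.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]  # the +Inf bucket counts every observation
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical latency buckets: (upper bound, cumulative request count).
slow_buckets = [(0.1, 50), (0.5, 80), (1.0, 90), (2.5, 98), (math.inf, 100)]
p95 = histogram_quantile(0.95, slow_buckets)
print(p95, p95 > 1)  # 1.9375 True
```

With these sample buckets the estimated p95 lands in the 1.0 to 2.5 bucket, so a "> 1" condition would hold. The real alert's inputs, rate window, and labels are only available in the Terraform source.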