Add landscaping-assistant to blackbox uptime monitoring #21
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Type
Feature
Lineage
Standalone -- discovered during DORA metrics gap analysis. MTTR calculation relies on uptime probes to detect failures, but landscaping-assistant is not in the blackbox exporter target list.
Repo
ldraney/pal-e-platformUser Story
As a platform operator
I want landscaping-assistant included in blackbox uptime monitoring
So that deployment failures are detected and MTTR can be measured accurately
Context
The platform uses prometheus-blackbox-exporter for synthetic uptime monitoring of all services. Currently monitored apps include basketball-api, westside-app, platform-validation, and playme2k -- but landscaping-assistant is missing from the target list in
pal-e-platform/terraform/modules/monitoring/main.tf. Without uptime probes, the DORA MTTR metric has no failure signal for this app. The blackbox exporter firesEndpointDownalerts whenprobe_success == 0for 5 minutes, which feeds the change failure rate calculation.File Targets
Files the agent should modify or create:
terraform/modules/monitoring/main.tf-- add landscaping-assistant toserviceMonitor.targetsin the blackbox_exporter helm releaseFiles the agent should NOT touch:
terraform/dashboards/dora-dashboard.json-- dashboard is repo-agnosticAcceptance Criteria
probe_success{target=~".*landscaping.*"}metric appears in Prometheus after terraform applyTest Expectations
terraform planand verify only the blackbox exporter resource changesprobe_success{target=~".*landscaping.*"}in Prometheusterraform plan -target=helm_release.blackbox_exporterConstraints
http://landscaping-assistant.landscaping-assistant.svc.cluster.local:3000)tier = "app"label to match other application servicesChecklist
Related
project-landscaping-assistant-- project this affectsScope Addition: Post-Deploy Smoke Test
Per testing strategy audit (docs/testing-strategy.md), this issue should also cover a post-deploy smoke test — a one-time validation that runs after each ArgoCD sync to verify the deployment is healthy beyond just the k8s readiness probe.
Additional acceptance criteria:
GET /(root) returns 200 with queue contentGET /upreturns 200GET /uploadsreturns 200Implementation consideration: ArgoCD sync is async after Harbor push. Options:
This overlaps with but is distinct from uptime monitoring — uptime monitors catch ongoing failures, the smoke test catches deploy-specific regressions immediately.