Add Grafana golden signals dashboard for landscaping-assistant #402

Merged
ldraney merged 1 commit from 16-landscaping-golden-signals into main 2026-06-04 05:08:27 +00:00
Owner

Closes #401

Cross-repo work for landscaping-assistant#16

Summary

  • New dashboard JSON provisioned via ConfigMap with grafana_dashboard: "1" sidecar label
  • Uses yabeda-rails metrics (rails_requests_total, rails_request_duration_bucket, rails_db_runtime_bucket, rails_view_runtime_bucket) — not generic http_* metrics
  • Uses yabeda-puma-plugin metrics (puma_threads_running, puma_threads_total, puma_threads_backlog, puma_workers, puma_threads_pool_capacity)

Panels (12 total across 4 golden signal sections)

  • Traffic: total request rate + breakdown by controller
  • Latency: p50/p95/p99 request duration + DB/view runtime breakdown
  • Errors: 5xx errors/s + error percentage (dual-axis)
  • Saturation: Puma thread utilization (running/total/backlog/%), CPU vs limits, memory vs limits, workers + pool capacity

Changes

  • terraform/dashboards/landscaping-assistant-golden-signals.json — new dashboard definition
  • terraform/modules/monitoring/main.tf — new kubernetes_config_map_v1 resource

Test Plan

  • [ ] tofu plan shows only the new ConfigMap resource
  • [ ] Dashboard appears in Grafana under "landscaping-assistant" tag after apply
  • [ ] All 12 panels load (may show "No data" without active traffic)

Review Checklist

  • [x] Follows existing dashboard pattern (believers-elite, pal-e-app, etc.)
  • [x] Valid JSON (validated with python3 json.load)
  • [x] Uses ${DS_PROMETHEUS} datasource variable
  • [x] Namespace filter on all queries: namespace="landscaping-assistant"
  • [x] depends_on set to kube_prometheus_stack helm release
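
The JSON, datasource, and namespace items in this checklist can be checked mechanically. The sketch below is one hedged way to do it, not the validation that was actually run for this PR: check_dashboard is a hypothetical helper, and it assumes queries live under panels[].targets[].expr and that a datasource is stored either as a plain string or as an object with a uid field. Panels nested inside collapsed rows are not visited.

```python
import json

NAMESPACE_FILTER = 'namespace="landscaping-assistant"'
DATASOURCE_VAR = "${DS_PROMETHEUS}"


def datasource_ref(value):
    """Return the datasource reference whether stored as a string or as {"uid": ...}."""
    if isinstance(value, dict):
        return value.get("uid")
    return value


def check_dashboard(raw_json):
    """Return a list of problems; an empty list means every check passed."""
    dashboard = json.loads(raw_json)  # raises ValueError on invalid JSON
    problems = []
    for panel in dashboard.get("panels", []):
        if panel.get("type") == "row":
            continue  # section rows carry no queries of their own
        title = panel.get("title", "<untitled>")
        if datasource_ref(panel.get("datasource")) != DATASOURCE_VAR:
            problems.append(f"{title}: panel datasource is not {DATASOURCE_VAR}")
        for target in panel.get("targets", []):
            if NAMESPACE_FILTER not in target.get("expr", ""):
                problems.append(f"{title}: query lacks {NAMESPACE_FILTER}")
    return problems


# Hypothetical two-panel dashboard: the second panel omits the namespace filter.
sample = json.dumps({
    "uid": "landscaping-assistant-golden-signals",
    "panels": [
        {"title": "Request rate", "type": "timeseries",
         "datasource": {"type": "prometheus", "uid": "${DS_PROMETHEUS}"},
         "targets": [{"expr": 'sum(rate(rails_requests_total{namespace="landscaping-assistant"}[5m]))'}]},
        {"title": "Unfiltered", "type": "timeseries",
         "datasource": "${DS_PROMETHEUS}",
         "targets": [{"expr": "sum(rate(rails_requests_total[5m]))"}]},
    ],
})

for problem in check_dashboard(sample):
    print(problem)
```

Pointing the same function at the contents of terraform/dashboards/landscaping-assistant-golden-signals.json would flag any panel that drifts from the checklist in a later change.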

Acceptance Criteria

  • Dashboard auto-provisions via Grafana sidecar
  • Four golden signal sections visible
  • Puma thread metrics (unique to yabeda stack) displayed

Related Notes

  • landscaping-assistant#16 (parent feature ticket)
  • landscaping-assistant#43 (observability parent)
  • Pattern: pal-e-app-golden-signals.json
Add Grafana golden signals dashboard for landscaping-assistant (#401)
All checks were successful
ci/woodpecker/push/terraform Pipeline was successful
ci/woodpecker/pr/terraform Pipeline was successful
ci/woodpecker/pull_request_closed/terraform Pipeline was successful
59dc93f3d2
Provisions a dashboard via ConfigMap with grafana_dashboard sidecar label.
Panels: request rate by controller, latency percentiles (p50/p95/p99),
DB/view runtime breakdown, 5xx error rate, Puma thread utilization,
CPU/memory vs limits, and worker pool capacity. Uses yabeda-rails and
yabeda-puma-plugin metrics from the landscaping-assistant namespace.

Closes #401

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author
Owner

PR #402 Review

DOMAIN REVIEW

Tech stack: Terraform (kubernetes_config_map_v1 resource) + Grafana dashboard JSON (PromQL queries against yabeda-rails and yabeda-puma-plugin metrics).

Terraform:

  • ConfigMap resource structure exactly matches the established believers_elite_dashboard pattern: same metadata layout, grafana_dashboard = "1" sidecar label, namespace reference via kubernetes_namespace_v1.monitoring.metadata[0].name, file path using ${path.module}/../../dashboards/, and depends_on for the helm release.
  • Placement in main.tf is correct -- inserted between the believers_elite_dashboard block and the payment pipeline alerts section.
  • No state-breaking changes. tofu plan should show a single new kubernetes_config_map_v1 resource with no modifications or destroys.

Dashboard JSON:

  • Valid JSON structure. id: null (correct for sidecar provisioning), uid: "landscaping-assistant-golden-signals" (deterministic, avoids duplicates), schemaVersion: 39, version: 1.
  • ${DS_PROMETHEUS} datasource variable properly declared in templating.list and referenced on every panel and target. Consistent with the believers-elite pattern.
  • All 12 panels present across 4 golden signal sections (Traffic, Latency, Errors, Saturation) matching the PR description.
  • Every PromQL query includes namespace="landscaping-assistant" filter -- no cross-namespace data leakage.

PromQL correctness:

  • histogram_quantile calls correctly group by (le) for p50/p95/p99 latency percentiles.
  • Division-by-zero protection via clamp_min in both the error percentage query (clamp_min(..., 0.001)) and the Puma utilization percentage query (clamp_min(..., 1)). Good practice.
  • rate() windows consistent at [5m] across all panels.
  • status=~"5.." regex correctly matches 5xx status codes.
  • CPU/memory saturation panels use standard container_cpu_usage_seconds_total and container_memory_working_set_bytes with proper container exclusion filters (container!="", container!="POD").
  • Metric names (rails_requests_total, rails_request_duration_bucket, rails_db_runtime_bucket, rails_view_runtime_bucket, puma_threads_running, puma_threads_total, puma_threads_backlog, puma_workers, puma_threads_pool_capacity) are consistent with yabeda-rails and yabeda-puma-plugin conventions.
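
To make the clamp_min guards concrete, here is a small Python model of the arithmetic for a single sample. It is a sketch under assumptions: the review only quotes the clamp_min(..., 0.001) and clamp_min(..., 1) fragments, so the surrounding formulas (errors divided by total requests, running threads divided by total threads) are an assumed shape rather than the dashboard's actual queries, and the function names are hypothetical.

```python
def clamp_min(value, minimum):
    """Mirror PromQL's clamp_min for one sample: raise value to at least minimum."""
    return max(value, minimum)


def error_percentage(error_rate, total_rate):
    """5xx share of traffic in percent, with the denominator floored at 0.001 req/s."""
    return 100.0 * error_rate / clamp_min(total_rate, 0.001)


def puma_utilization(threads_running, threads_total):
    """Busy share of Puma threads in percent, with the denominator floored at 1."""
    return 100.0 * threads_running / clamp_min(threads_total, 1)


# An idle service: a bare 0/0 would be NaN in PromQL and an exception in Python.
print(error_percentage(0.0, 0.0))
# Normal traffic: the floor is far below the real denominator and has no effect.
print(error_percentage(0.5, 10.0))
print(puma_utilization(3, 5))
# No threads reported yet.
print(puma_utilization(0, 0))
```

The trade-off of this style of guard is that a denominator below the floor is silently replaced by it, so the percentage is understated when real traffic is under 0.001 req/s. For a dashboard panel that is usually an acceptable price for never plotting NaN.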

BLOCKERS

None.

  • No secrets or credentials committed.
  • No unvalidated user input (dashboard JSON is static infrastructure config).
  • No DRY violations in auth/security paths.
  • Test coverage is appropriate for the domain: tofu plan + visual verification is the correct validation strategy for a Grafana dashboard JSON + Terraform ConfigMap. There is no unit-testable logic here.

NITS

  1. Branch naming: Branch is 16-landscaping-golden-signals, referencing landscaping-assistant#16 (the cross-repo parent). SOP convention is {issue-number}-{kebab-case-purpose} where the issue number should be the local repo issue -- in this case 401-landscaping-golden-signals. This is a minor process nit, not a blocker, since the PR body clearly documents both the local close (Closes #401) and the cross-repo reference.

  2. Legend calc consistency: The believers-elite dashboard uses ["mean", "lastNotNull", "sum"] in legend calcs, while this dashboard uses ["mean", "lastNotNull"] (no "sum"). The omission of sum is arguably correct for rate-based golden signal panels (summing rates over time is misleading), so this is actually an improvement over the existing pattern rather than a deviation. Just noting the difference.

  3. PR body Related section: The body references landscaping-assistant#16 and landscaping-assistant#43 inline, but does not have a formal ## Related section with a plan slug. Minor template deviation.
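
The point in the second nit, that summing rates over time is misleading, can be illustrated with a few lines of Python. This is a hedged sketch rather than Grafana's implementation: legend_calcs is a hypothetical helper that applies the three reducers named above to one plotted series, and it assumes the series contains no null points.

```python
def legend_calcs(samples):
    """Apply the mean, lastNotNull, and sum reducers to one series without nulls."""
    return {
        "mean": sum(samples) / len(samples),
        "lastNotNull": samples[-1],
        "sum": sum(samples),
    }


# A steady 2 req/s for one hour, plotted at two hypothetical resolutions.
coarse = [2.0] * 4   # one point every 15 minutes
fine = [2.0] * 60    # one point every minute

print(legend_calcs(coarse))
print(legend_calcs(fine))
```

Both series describe the same traffic, yet the sum changes from 8 to 120 with the number of plotted points, and neither figure is the 7200 requests actually served in that hour. The mean and last value stay at 2 req/s, which is why they are the safer legend calcs for rate panels.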

SOP COMPLIANCE

  • [ ] Branch named after local issue -- uses 16 (cross-repo) instead of 401 (local). Non-blocking.
  • [x] PR body follows template -- Summary, Changes, Test Plan, Review Checklist, Acceptance Criteria all present.
  • [ ] Related references plan slug -- cross-repo references present but no formal ## Related section with plan slug.
  • [x] No secrets committed
  • [x] No scope creep -- exactly 2 files, both directly related to the dashboard provisioning task
  • [x] Commit messages are descriptive (PR title is clear)

PROCESS OBSERVATIONS

  • Low change failure risk: additive-only change (0 deletions), creates a single new ConfigMap resource. No existing infrastructure modified.
  • Clean pattern replication from the believers-elite dashboard PR (#386), adapted correctly for the yabeda-rails metric naming convention instead of generic http_* metrics.
  • Cross-repo coordination is well-documented in the PR body, linking both landscaping-assistant#16 and landscaping-assistant#43.

VERDICT: APPROVED

ldraney deleted branch 16-landscaping-golden-signals 2026-06-04 05:08:27 +00:00