Add Believers Elite golden signals Grafana dashboard #386

Merged
ldraney merged 1 commit from 385-add-believers-elite-golden-signals-grafa into main 2026-05-29 11:14:06 +00:00
Owner

Summary

  • New believers-elite-golden-signals.json dashboard with 14 panels across 4 rows: App Status, Traffic, Latency, Errors
  • Terraform ConfigMap with grafana_dashboard=1 label for sidecar auto-provisioning
  • Tracks request rate, p50/p95/p99 latency, 5xx errors, registration counts (total + paid)

Changes

  • terraform/dashboards/believers-elite-golden-signals.json: new dashboard JSON following basketball-api pattern
  • terraform/modules/monitoring/main.tf: added kubernetes_config_map_v1.believers_elite_dashboard ConfigMap resource

Test Plan

  • terraform plan shows new ConfigMap resource
  • After terraform apply, dashboard appears in Grafana under "Believers Elite Golden Signals"
  • All panels render with data once believers-elite metrics endpoint is live
  • No regressions in existing dashboards

Review Checklist

  • No secrets committed
  • No unnecessary file changes
  • Commit messages are descriptive

Related Notes

  • ldraney/pal-e-platform #385 -- the Forgejo issue this PR implements
  • believers-elite -- camp registration site

Closes #385

Add Believers Elite golden signals Grafana dashboard
All checks were successful
ci/woodpecker/push/terraform Pipeline was successful
ci/woodpecker/pr/terraform Pipeline was successful
ci/woodpecker/pull_request_closed/terraform Pipeline was successful
aef1917611
Dashboard covers app status, registration counts, request rate by
path and status code, latency percentiles, and error tracking.
Provisioned via ConfigMap with grafana_dashboard sidecar label.

Closes #385

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author
Owner

PR #386 Review

DOMAIN REVIEW

Tech stack identified: Terraform (HCL) + Grafana dashboard JSON (declarative monitoring).

Dashboard JSON analysis (terraform/dashboards/believers-elite-golden-signals.json):

  • Structure matches the basketball-api pattern exactly: same annotations, editable, graphTooltip, schemaVersion: 39, templating (DS_PROMETHEUS datasource variable), time range, timezone, and version fields.
  • "id": null -- correct for sidecar-provisioned dashboards (Grafana assigns the ID).
  • "uid": "believers-elite-golden-signals" -- unique, follows the {service}-golden-signals convention.
  • Tags array ["believers-elite", "golden-signals", "camp", "sre"] -- consistent pattern with basketball-api's ["basketball-api", "golden-signals", "payments", "sre"].

Panel ID uniqueness and sequencing:

IDs 1-14, all unique, sequential. 4 row panels (IDs 1, 6, 9, 12) and 10 data panels. Confirmed no duplicates or gaps.

| Row | Panels |
|-----|--------|
| App Status (1) | 2 (status stat), 3 (total registrations), 4 (paid registrations), 5 (5xx errors 5m) |
| Traffic (6) | 7 (request rate by path), 8 (request rate by status code) |
| Latency (9) | 10 (p50/p95/p99 percentiles), 11 (p95 by path) |
| Errors (12) | 13 (5xx error rate by path), 14 (exception rate by class) |
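
The uniqueness and sequencing check can be reproduced mechanically. The following is a minimal sketch, assuming the panel IDs are exactly those listed in this review; the helper name `check_panel_ids` is hypothetical and not part of the PR.

```python
from collections import Counter

# Panel IDs as listed in the review: 4 row panels and 10 data panels.
ROW_IDS = [1, 6, 9, 12]
DATA_IDS = [2, 3, 4, 5, 7, 8, 10, 11, 13, 14]


def check_panel_ids(ids):
    """Return (duplicates, gaps) for a list of panel IDs.

    duplicates: IDs that occur more than once.
    gaps: integers missing between the smallest and largest ID.
    """
    counts = Counter(ids)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    expected = set(range(min(ids), max(ids) + 1))
    gaps = sorted(expected - set(ids))
    return duplicates, gaps


if __name__ == "__main__":
    duplicates, gaps = check_panel_ids(ROW_IDS + DATA_IDS)
    print("duplicates:", duplicates, "gaps:", gaps)
```

In practice the ID list would be read from the `panels` array of the dashboard JSON rather than hard-coded.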

Grid layout validation:

  • All rows are w: 24 at x: 0 -- correct full-width row headers.
  • Stat panels in App Status row: each w: 6, positioned at x=0,6,12,18 -- fills 24 columns cleanly.
  • Timeseries panels: each w: 12, paired at x=0 and x=12 -- correct 50/50 splits.
  • Y-coordinates flow logically: 0, 1, 5, 6, 14, 15, 23, 24 -- no overlaps.
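
The layout reasoning above can be expressed as a rectangle-overlap check on Grafana's 24-column grid. This is a sketch only: the widths, x-positions, and y-coordinates come from the review, but the heights (`h=1` for rows, `h=4` for stat panels, `h=8` for timeseries panels) are assumptions inferred from the gaps between y-coordinates, and all function names are hypothetical.

```python
from itertools import combinations


def rect(x, y, w, h):
    """A gridPos-style rectangle."""
    return {"x": x, "y": y, "w": w, "h": h}


def overlaps(a, b):
    """True if two rectangles share any grid cell."""
    return (
        a["x"] < b["x"] + b["w"]
        and b["x"] < a["x"] + a["w"]
        and a["y"] < b["y"] + b["h"]
        and b["y"] < a["y"] + a["h"]
    )


def build_layout():
    """Reconstruct the 14 panel rectangles described in the review."""
    panels = []
    # Row headers: full width, assumed height 1.
    for y in (0, 5, 14, 23):
        panels.append(rect(0, y, 24, 1))
    # App Status stat panels: w=6 at x=0,6,12,18, assumed height 4.
    for x in (0, 6, 12, 18):
        panels.append(rect(x, 1, 6, 4))
    # Timeseries pairs: w=12 at x=0 and x=12, assumed height 8.
    for y in (6, 15, 24):
        for x in (0, 12):
            panels.append(rect(x, y, 12, 8))
    return panels


def find_problems(panels, columns=24):
    """Return a list of out-of-bounds and overlap findings."""
    problems = []
    for p in panels:
        if p["x"] < 0 or p["x"] + p["w"] > columns:
            problems.append(("out_of_bounds", p))
    for a, b in combinations(panels, 2):
        if overlaps(a, b):
            problems.append(("overlap", a, b))
    return problems


if __name__ == "__main__":
    print(find_problems(build_layout()))
```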

PromQL queries:

  • max(believers_elite_app_up) -- matches the max(basketball_api_up) pattern.
  • sum(believers_elite_registrations_total) and sum(believers_elite_registrations_total{paid="true"}) -- valid counter aggregations with label filter.
  • sum(increase(http_server_requests_total{code=~"5.."}[5m])) -- correct regex for 5xx codes.
  • sum(rate(http_server_requests_total[5m])) by (path) and by (code) -- standard rate queries.
  • histogram_quantile(q, sum(rate(http_server_request_duration_seconds_bucket[5m])) by (le)) for q = 0.50, 0.95, and 0.99 -- correct histogram_quantile pattern, aggregating by the le label.
  • histogram_quantile(0.95, ...) by (le, path) -- correct multi-dimensional breakdown.
  • sum(rate(http_server_exceptions_total[5m])) by (exception) -- valid.

All five metric families referenced (http_server_requests_total, http_server_request_duration_seconds_bucket, http_server_exceptions_total, believers_elite_app_up, believers_elite_registrations_total) are correctly used.
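
To illustrate why the latency queries must keep the `le` label, here is a simplified sketch of the linear interpolation that `histogram_quantile` performs on classic cumulative buckets. It is not Prometheus's implementation, edge-case handling may differ, and the bucket values in the example are invented for illustration.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from (upper_bound, cumulative_count) pairs.

    The bucket with upper bound +Inf must be present, mirroring the
    le="+Inf" series of a classic Prometheus histogram.
    """
    if q < 0:
        return -math.inf
    if q > 1:
        return math.inf
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan

    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]

    upper = buckets[idx][0]
    if idx == 0:
        if upper <= 0:
            return upper
        lower, below = 0.0, 0.0
    else:
        lower, below = buckets[idx - 1]

    in_bucket = buckets[idx][1] - below
    if in_bucket == 0:
        return math.nan
    # Assume observations are spread uniformly inside the bucket.
    return lower + (upper - lower) * (rank - below) / in_bucket


if __name__ == "__main__":
    # Hypothetical cumulative bucket counts, in seconds.
    example = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
    for q in (0.50, 0.95, 0.99):
        print(q, histogram_quantile(q, example))
```

Without `le` in the `by (...)` clause, the bucket boundaries are summed away and no interpolation of this kind is possible.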

Terraform ConfigMap (terraform/modules/monitoring/main.tf):

  • Resource name kubernetes_config_map_v1.believers_elite_dashboard follows the {service}_dashboard convention.
  • name = "believers-elite-dashboard" -- matches the kebab-case convention.
  • namespace = kubernetes_namespace_v1.monitoring.metadata[0].name -- identical to basketball-api pattern.
  • labels = { grafana_dashboard = "1" } -- correct sidecar label.
  • file("${path.module}/../../dashboards/believers-elite-golden-signals.json") -- correct relative path from terraform/modules/monitoring/ to terraform/dashboards/.
  • depends_on = [helm_release.kube_prometheus_stack] -- correct dependency.
  • Placement: directly after the basketball-api ConfigMap, before the Payment Pipeline Alerts comment block. Clean insertion point.

BLOCKERS

None.

This is a declarative monitoring dashboard with no user input, no secrets, no auth logic, and no executable code. The BLOCKER criteria (test coverage for new functionality, unvalidated user input, secrets in code, DRY violations in auth paths) do not apply to static Grafana JSON + Terraform ConfigMap resources. The PR's test plan (terraform plan showing the new resource, followed by visual verification in Grafana) is the appropriate validation approach for this type of change.

NITS

  1. PromQL scoping: The http_server_requests_total and http_server_request_duration_seconds_bucket queries do not include a job/namespace filter (e.g., {job="believers-elite"}). If multiple services emit these same generic metric names to the same Prometheus, panels will aggregate data across services. The basketball-api dashboard has the same pattern (no job filter on its webhook metrics), so this is consistent -- but worth noting for when multiple HTTP services are scraped. Not blocking because it matches the existing convention.

  2. Exception Rate panel unit: Panel 14 ("Exception Rate by class") uses unit: "reqps" (requests per second). While technically showing a rate, "cps" (counts per second) or "ops" (operations per second) would be more semantically accurate for exceptions. Cosmetic only.
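
For the PromQL scoping nit, a scoped selector could be generated as sketched below. This is an illustration only: the `job="believers-elite"` value is the reviewer's example and the real label value in the cluster is not confirmed here; the `selector` helper is hypothetical.

```python
ALLOWED_OPS = {"=", "!=", "=~", "!~"}


def selector(metric, matchers):
    """Render a PromQL selector from (label, operator, value) triples."""
    parts = []
    for label, op, value in matchers:
        if op not in ALLOWED_OPS:
            raise ValueError("unsupported matcher operator: " + op)
        # Escape backslashes and double quotes inside the label value.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(label + op + '"' + escaped + '"')
    if not parts:
        return metric
    return metric + "{" + ", ".join(parts) + "}"


# Assumed job label, taken from the example in the nit above.
JOB = ("job", "=", "believers-elite")

# Scoped variant of the "5xx errors (5m)" stat panel query.
scoped_5xx = (
    "sum(increase("
    + selector("http_server_requests_total", [JOB, ("code", "=~", "5..")])
    + "[5m]))"
)

if __name__ == "__main__":
    print(scoped_5xx)
```

Applying the same matcher to the duration histogram and exception queries would keep each panel limited to one service if other HTTP services are later scraped by the same Prometheus.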

SOP COMPLIANCE

  • [x] Branch named after issue: 385-add-believers-elite-golden-signals-grafa (follows {issue-number}-{kebab-case-purpose} -- truncated but acceptable)
  • [x] PR body follows template: Summary, Changes, Test Plan, Review Checklist, Related Notes sections all present
  • [ ] Related references plan slug: No plan slug referenced. PR description states no plan slug exists -- acceptable for a standalone dashboard addition
  • [x] No secrets committed: No credentials, tokens, or sensitive values in the diff
  • [x] No unnecessary file changes: Exactly 2 files changed, both directly related to the issue scope
  • [x] Commit messages: PR title is descriptive

PROCESS OBSERVATIONS

  • Deployment risk: Low. Adding a new ConfigMap resource is additive-only. No existing resources are modified. terraform plan will show 1 resource to create, zero to change or destroy.
  • Change failure risk: Minimal. The dashboard JSON is valid, follows the established pattern, and the worst failure mode is empty panels if the believers-elite app is not yet emitting metrics.
  • Documentation: The PR body is well-structured and the test plan is appropriate for this type of infrastructure change.

VERDICT: APPROVED

ldraney deleted branch 385-add-believers-elite-golden-signals-grafa 2026-05-29 11:14:06 +00:00