Westside-unified Grafana dashboard (basketball-api + 5 westside namespaces) #328

Open
opened 2026-05-02 14:52:06 +00:00 by forgejo_admin · 0 comments

Type

Feature

Lineage

Standalone — discovered 2026-05-01 during alert-state audit. Depends on the new probes ticket (westside-contracts/email/ai-assistant blackbox probes).

Repo

forgejo_admin/pal-e-platform

User Story

As Lucas, Marcus, or oncall, I want one Grafana dashboard that shows the entire westside platform's health at a glance — basketball-api golden signals plus uptime for westside-app / westside-contracts / westside-email / westside-ai-assistant / playme2k — so that "is westside up?" is answered in one click instead of three.

Context

Today there is no unified westside view. The basketball-api-golden-signals dashboard (shipping in #290) is service-specific. The uptime-dashboard shows generic blackbox probes mixed across all platform services. There's no per-product overview. As we add more westside-adjacent services, this gap widens.

Pattern reference: the existing pal-e-app-golden-signals and incoming basketball-api-golden-signals dashboards. ConfigMap-based provisioning via grafana_dashboard: "1" label.

File Targets

Files to create or modify:

  • terraform/dashboards/westside-platform-overview.json — new dashboard (modeled after basketball-api-golden-signals)
  • terraform/modules/monitoring/main.tf — existing file; add a new kubernetes_config_map_v1.westside_platform_dashboard resource near the existing dashboard ConfigMaps

Files NOT to touch:

  • existing dashboard JSON files
  • existing dashboard ConfigMaps
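
A minimal sketch of the ConfigMap named in the file targets above, assuming the module lives at terraform/modules/monitoring (so the dashboard JSON sits two directories up). The ConfigMap name and the monitoring namespace are placeholders, not taken from the repo; copy both from the neighbouring dashboard ConfigMaps.

```hcl
# Sketch only: metadata.name and namespace are assumptions; mirror the
# existing dashboard ConfigMaps in main.tf for the real values.
resource "kubernetes_config_map_v1" "westside_platform_dashboard" {
  metadata {
    name      = "westside-platform-dashboard" # hypothetical name
    namespace = "monitoring"                  # assumption

    labels = {
      # Required so the Grafana sidecar picks the dashboard up.
      grafana_dashboard = "1"
    }
  }

  data = {
    # The key becomes the file name the sidecar hands to Grafana.
    "westside-platform-overview.json" = file("${path.module}/../../dashboards/westside-platform-overview.json")
  }
}
```

Loading the JSON with file() rather than inlining it keeps the dashboard editable as a standalone file, which is how the file targets describe it.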

Acceptance Criteria

  • Dashboard westside-platform-overview provisioned in Grafana
  • Top row: 6 stat panels (one per westside service: basketball-api / westside-app / westside-contracts / westside-email / westside-ai-assistant / playme2k) showing UP/DOWN
  • Per-service rows with: probe_success time-series, request rate (where metrics exist), error rate, latency p95
  • A single dashboard variable, service, filters the panels by service
  • Dashboard uses existing Prometheus datasource UID ${DS_PROMETHEUS}
  • Renders without errors at first load (no "Datasource not found" panels)

Test Expectations

  • tofu validate passes
  • After deploy: navigate to https://grafana.tail5b443a.ts.net/d/westside-platform-overview and verify all panels render
  • After deploy: change service variable, panels re-filter
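
The /d/westside-platform-overview URL above only resolves if the dashboard JSON sets its "uid" to westside-platform-overview. One optional way to catch a mismatch before deploy is a check block, sketched here on the assumption that the OpenTofu version in use supports check blocks and that the JSON path matches the file targets. The local and check names are hypothetical.

```hcl
locals {
  # jsondecode fails the run if the file is not valid JSON.
  westside_dashboard = jsondecode(file("${path.module}/../../dashboards/westside-platform-overview.json"))
}

check "westside_dashboard_uid" {
  assert {
    # try() keeps the check from erroring when "uid" is missing entirely.
    condition     = try(local.westside_dashboard.uid, "") == "westside-platform-overview"
    error_message = "Dashboard uid must be westside-platform-overview to match the /d/westside-platform-overview URL."
  }
}
```

A failed check is reported as a warning and does not block the run, so it complements the manual post-deploy verification and does not replace it.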

Constraints

  • Match the existing dashboard JSON shape (annotations, refresh, datasource refs)
  • ConfigMap label grafana_dashboard: "1" is required for sidecar provisioning
  • Don't introduce new metric exporters in this ticket — only consume existing metrics + probes

Checklist

  • PR opened
  • tofu validate + fmt clean
  • No unrelated changes

Related

  • pal-e-platform — project
  • alert-report-2026-05-01 — alert snapshot
  • Depends on the westside-probes ticket (probes for contracts/email/ai-assistant must exist first)