Deploy DORA exporter + Grafana dashboard (Phase 2+3) #10

Merged
forgejo_admin merged 2 commits from feat/dora-deploy-dashboard into main 2026-03-02 08:29:58 +00:00

Summary

  • Deploys the DORA metrics exporter to the monitoring namespace with Prometheus scraping via ServiceMonitor
  • Creates a Grafana dashboard (auto-discovered via sidecar) showing all four DORA metrics per repo
  • Adds woodpecker_api_token to Makefile secret rendering pipeline

Closes #9

Changes

terraform/main.tf

  • kubernetes_secret_v1.dora_exporter -- env vars for Woodpecker + Forgejo API access using internal cluster URLs (*.svc.cluster.local)
  • kubernetes_deployment_v1.dora_exporter -- single replica, 50m CPU / 32Mi-128Mi memory, liveness+readiness probes on /health:8000, envFrom secret injection
  • kubernetes_service_v1.dora_exporter -- ClusterIP on port 8000
  • kubernetes_manifest.dora_exporter_service_monitor -- ServiceMonitor CRD, 60s scrape interval on /metrics
  • kubernetes_config_map_v1.dora_dashboard -- Grafana dashboard ConfigMap with grafana_dashboard = "1" label for sidecar auto-discovery
  • Minor formatting fix from tofu fmt in existing MinIO IAM policy block

terraform/dashboards/dora-dashboard.json (new)

Grafana dashboard JSON with:

  • Overview row: 4 single-stat panels (deploys/day, lead time p50, CFR%, MTTR)
  • Deployment Frequency: time-series panel, rate(dora_deployments_total{status="success"}[1d]) * 86400 per repo
  • Lead Time for Changes: time-series panel, dora_pr_lead_time_seconds p50/p95 per repo (displayed in hours)
  • Change Failure Rate: gauge panel, dora_deployments_total{status="failure"} / total per repo (percentage)
  • MTTR: stat panel, derived from dora_deployment_last_success_timestamp
  • Template variables: DS_PROMETHEUS (datasource) and repo (multi-select from label_values)

terraform/variables.tf

  • woodpecker_api_token (string, sensitive) -- Woodpecker CI personal API token
  • dora_exporter_image (string, default harbor.tail5b443a.ts.net/pal-e-dora-exporter/dora-exporter:latest)

Makefile

  • Added woodpecker_api_token to TF_SECRET_VARS list

Validation

$ tofu fmt -recursive
main.tf  (minor existing alignment fix only)

$ tofu validate
Success! The configuration is valid.

Test Plan

  • Operator: generate Woodpecker API token (Woodpecker UI > User Settings > API Tokens)
  • Operator: add woodpecker_api_token to Salt pillar (salt/pillar/secrets/platform.sls)
  • Run make tofu-secrets -- verify woodpecker_api_token appears in secrets.auto.tfvars
  • Run make tofu-plan -- review plan output for 5 new resources
  • Run make tofu-apply -- deploy
  • Verify: kubectl get pods -n monitoring -l app=dora-exporter shows Running
  • Verify: Prometheus targets page shows dora-exporter as UP
  • Verify: promql: dora_deployments_total returns data
  • Verify: Grafana "DORA Metrics" dashboard loads with real data

Review Checklist

  • tofu fmt applied
  • tofu validate passes
  • Dashboard JSON is valid
  • Uses internal cluster URLs (not Tailscale funnels)
  • Reuses existing forgejo_admin_username / forgejo_admin_password vars
  • Secret values are marked sensitive = true
  • depends_on chains follow existing patterns
  • Makefile TF_SECRET_VARS updated
  • No changes outside terraform/ and Makefile

Pre-Apply Operator Steps

NOTE: Operator must generate woodpecker_api_token and add to Salt pillar before tofu apply.

  1. Generate Woodpecker API token (Woodpecker UI > User Settings > API Tokens)
  2. Add to Salt pillar: salt/pillar/secrets/platform.sls (GPG-encrypted)
  3. make tofu-secrets to render updated tfvars
  4. make tofu-plan to review
  5. make tofu-apply
  • plan-2026-03-01-dora-metrics-dashboard -- parent plan (Phase 2 + Phase 3)
  • issue-pal-e-platform-dora-deploy-dashboard -- pal-e-docs issue
  • dora-framework -- the axiom this makes measurable
  • issue-pal-e-dora-exporter-service -- Phase 1 (resolved, PR #1 on pal-e-dora-exporter)
## Summary - Deploys the DORA metrics exporter to the monitoring namespace with Prometheus scraping via ServiceMonitor - Creates a Grafana dashboard (auto-discovered via sidecar) showing all four DORA metrics per repo - Adds `woodpecker_api_token` to Makefile secret rendering pipeline Closes #9 ## Changes ### `terraform/main.tf` - **`kubernetes_secret_v1.dora_exporter`** -- env vars for Woodpecker + Forgejo API access using internal cluster URLs (`*.svc.cluster.local`) - **`kubernetes_deployment_v1.dora_exporter`** -- single replica, 50m CPU / 32Mi-128Mi memory, liveness+readiness probes on `/health:8000`, `envFrom` secret injection - **`kubernetes_service_v1.dora_exporter`** -- ClusterIP on port 8000 - **`kubernetes_manifest.dora_exporter_service_monitor`** -- ServiceMonitor CRD, 60s scrape interval on `/metrics` - **`kubernetes_config_map_v1.dora_dashboard`** -- Grafana dashboard ConfigMap with `grafana_dashboard = "1"` label for sidecar auto-discovery - Minor formatting fix from `tofu fmt` in existing MinIO IAM policy block ### `terraform/dashboards/dora-dashboard.json` (new) Grafana dashboard JSON with: - **Overview row**: 4 single-stat panels (deploys/day, lead time p50, CFR%, MTTR) - **Deployment Frequency**: time-series panel, `rate(dora_deployments_total{status="success"}[1d]) * 86400` per repo - **Lead Time for Changes**: time-series panel, `dora_pr_lead_time_seconds` p50/p95 per repo (displayed in hours) - **Change Failure Rate**: gauge panel, `dora_deployments_total{status="failure"} / total` per repo (percentage) - **MTTR**: stat panel, derived from `dora_deployment_last_success_timestamp` - Template variables: `DS_PROMETHEUS` (datasource) and `repo` (multi-select from `label_values`) ### `terraform/variables.tf` - `woodpecker_api_token` (string, sensitive) -- Woodpecker CI personal API token - `dora_exporter_image` (string, default `harbor.tail5b443a.ts.net/pal-e-dora-exporter/dora-exporter:latest`) ### `Makefile` - Added `woodpecker_api_token` to `TF_SECRET_VARS` list ## Validation ``` $ tofu fmt -recursive main.tf (minor existing alignment fix only) $ tofu validate Success! The configuration is valid. ``` ## Test Plan - [ ] Operator: generate Woodpecker API token (Woodpecker UI > User Settings > API Tokens) - [ ] Operator: add `woodpecker_api_token` to Salt pillar (`salt/pillar/secrets/platform.sls`) - [ ] Run `make tofu-secrets` -- verify `woodpecker_api_token` appears in `secrets.auto.tfvars` - [ ] Run `make tofu-plan` -- review plan output for 5 new resources - [ ] Run `make tofu-apply` -- deploy - [ ] Verify: `kubectl get pods -n monitoring -l app=dora-exporter` shows Running - [ ] Verify: Prometheus targets page shows `dora-exporter` as UP - [ ] Verify: `promql: dora_deployments_total` returns data - [ ] Verify: Grafana "DORA Metrics" dashboard loads with real data ## Review Checklist - [x] `tofu fmt` applied - [x] `tofu validate` passes - [x] Dashboard JSON is valid - [x] Uses internal cluster URLs (not Tailscale funnels) - [x] Reuses existing `forgejo_admin_username` / `forgejo_admin_password` vars - [x] Secret values are marked `sensitive = true` - [x] `depends_on` chains follow existing patterns - [x] Makefile `TF_SECRET_VARS` updated - [x] No changes outside `terraform/` and `Makefile` ## Pre-Apply Operator Steps > **NOTE:** Operator must generate `woodpecker_api_token` and add to Salt pillar before `tofu apply`. 1. Generate Woodpecker API token (Woodpecker UI > User Settings > API Tokens) 2. Add to Salt pillar: `salt/pillar/secrets/platform.sls` (GPG-encrypted) 3. `make tofu-secrets` to render updated tfvars 4. `make tofu-plan` to review 5. `make tofu-apply` ## Related Notes - `plan-2026-03-01-dora-metrics-dashboard` -- parent plan (Phase 2 + Phase 3) - `issue-pal-e-platform-dora-deploy-dashboard` -- pal-e-docs issue - `dora-framework` -- the axiom this makes measurable - `issue-pal-e-dora-exporter-service` -- Phase 1 (resolved, PR #1 on pal-e-dora-exporter)
Add Terraform resources to deploy the DORA metrics exporter to the
monitoring namespace and create a Grafana dashboard for all four DORA
metrics (deployment frequency, lead time, change failure rate, MTTR).

Resources added:
- kubernetes_secret_v1: Woodpecker + Forgejo credentials (internal cluster URLs)
- kubernetes_deployment_v1: exporter pod (50m/32Mi, port 8000, health probes)
- kubernetes_service_v1: ClusterIP service on port 8000
- kubernetes_manifest: ServiceMonitor CRD (60s scrape interval)
- kubernetes_config_map_v1: Grafana dashboard with sidecar auto-discovery

Also adds woodpecker_api_token variable and updates Makefile TF_SECRET_VARS.

Closes #9

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove invalid status="failure" label selector from
dora_deployment_last_success_timestamp metric (it does not have a
status label). Use max() across all repos to show worst-case time
since last successful deployment.

Found during self-review.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author
Owner

Review-fix round 1:

Fixed the MTTR overview panel PromQL query in terraform/dashboards/dora-dashboard.json.

Issue: The query used dora_deployment_last_success_timestamp{status="failure"} but this metric does not carry a status label -- it tracks the timestamp of the last successful deployment per repo. The status="failure" selector would return no data.

Fix: Changed to max((time() - dora_deployment_last_success_timestamp) / 3600) which computes the worst-case hours since last success across all repos, giving a meaningful aggregate MTTR overview value.

Commit: 38a693a

**Review-fix round 1:** Fixed the MTTR overview panel PromQL query in `terraform/dashboards/dora-dashboard.json`. **Issue:** The query used `dora_deployment_last_success_timestamp{status="failure"}` but this metric does not carry a `status` label -- it tracks the timestamp of the last *successful* deployment per repo. The `status="failure"` selector would return no data. **Fix:** Changed to `max((time() - dora_deployment_last_success_timestamp) / 3600)` which computes the worst-case hours since last success across all repos, giving a meaningful aggregate MTTR overview value. Commit: `38a693a`
forgejo_admin deleted branch feat/dora-deploy-dashboard 2026-03-02 08:29:58 +00:00
Sign in to join this conversation.
No description provided.