Platform hardening: Woodpecker TLS fix, Trivy, dashboard, QA nits #55

Closed
opened 2026-03-14 18:57:29 +00:00 by forgejo_admin · 0 comments

Lineage

plan-pal-e-platform → Phase 4 (Dashboard) + Phase 10 (Vuln Scanning) + QA nits from PRs #35/#39
todo-woodpecker-tls-clone-fix (Priority 1)

Repo

forgejo_admin/pal-e-platform

User Story

As a platform operator
I want reliable CI clones, vulnerability scanning, golden signals dashboards, and cleaned-up tech debt
So that the platform is hardened for team onboarding

Context

Four platform hardening items batched into one PR since they all touch terraform/main.tf:

P1 - Woodpecker TLS Clone Fix: Every Woodpecker pipeline clone uses the external Tailscale funnel URL (https://forgejo.tail5b443a.ts.net), causing intermittent TLS EOF failures. Fix: use internal service URL http://forgejo-http.forgejo.svc.cluster.local:80. Also remove WOODPECKER_FORGEJO_SKIP_VERIFY since plain HTTP doesn't need TLS verification.
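
A minimal sketch of the P1 change, assuming the Woodpecker server environment is expressed as an HCL map that feeds the Helm values. The local name `woodpecker_server_env` is hypothetical, and how the map is wired into the release in terraform/main.tf is not shown in this issue.

```hcl
locals {
  # Hypothetical local; the real setting lives in the Woodpecker Helm release
  # around line 575 of terraform/main.tf.
  woodpecker_server_env = {
    # In-cluster service URL instead of the external Tailscale funnel URL.
    WOODPECKER_FORGEJO_URL = "http://forgejo-http.forgejo.svc.cluster.local:80"

    # WOODPECKER_FORGEJO_SKIP_VERIFY is deliberately absent: plain HTTP
    # involves no TLS verification to skip.
  }
}
```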

P2 - Golden Signals Dashboard: Create a Grafana dashboard for pal-e-docs showing request rate, latency percentiles (p50/p95/p99), 5xx error rate, and CPU/memory vs limits. Follow existing pattern from DORA dashboard ConfigMap.
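
The ConfigMap for P2 might look roughly like the sketch below. It assumes the `kubernetes_config_map_v1` resource type and a `monitoring` namespace; neither is confirmed by this issue, so match whatever the existing `dora_dashboard` resource uses. The resource name `pal_e_docs_dashboard` is also illustrative.

```hcl
resource "kubernetes_config_map_v1" "pal_e_docs_dashboard" {
  metadata {
    name      = "pal-e-docs-dashboard"
    namespace = "monitoring" # assumption: use the namespace of dora_dashboard

    labels = {
      # Label called for in the acceptance criteria so the dashboard is loaded.
      grafana_dashboard = "1"
    }
  }

  data = {
    # Loads the new dashboard file from terraform/dashboards/.
    "pal-e-docs-golden-signals.json" = file("${path.module}/dashboards/pal-e-docs-golden-signals.json")
  }
}
```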

P3 - Harbor Trivy: Enable Trivy vulnerability scanning in Harbor. Currently trivy = { enabled = false } at line 696. Add resource limits for the scanner.
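
As a sketch of the P3 change, the `trivy` block in the Harbor Helm values could take the shape below, using the resource figures suggested under Constraints. The local name `harbor_trivy_values` is hypothetical; in terraform/main.tf the `trivy` key sits inside the existing Harbor values near line 696.

```hcl
locals {
  # Hypothetical local; only the trivy key is shown.
  harbor_trivy_values = {
    trivy = {
      enabled = true # previously false

      resources = {
        requests = { cpu = "100m", memory = "256Mi" }
        # Memory limit only; no CPU limit is suggested in this issue.
        limits = { memory = "1Gi" }
      }
    }
  }
}
```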

P4 - QA Nits: The #alerts Slack channel is hardcoded, but the Slack receiver is already conditional (only active when slack_webhook_url is set), so it is fine as-is. The main nits: add force_destroy = true to the MinIO buckets to prevent tofu destroy failures, and make credential key naming consistent between the CNPG and TF backup MinIO secrets.
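
For the bucket nit, a sketch under the assumption that the buckets are managed with the `minio_s3_bucket` resource from the aminueza/minio provider (the issue does not name the provider). The resource name `assets` is illustrative, and the same attribute would be added to the postgres-wal and tf-state-backups buckets.

```hcl
resource "minio_s3_bucket" "assets" {
  bucket = "assets"

  # Allows tofu destroy to remove the bucket even when it still holds objects.
  force_destroy = true
}
```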

File Targets

Files to modify:

  • terraform/main.tf — Woodpecker URL (line 575), Trivy toggle (line 696-698), MinIO bucket force_destroy, MinIO secret key naming, new dashboard ConfigMap
  • terraform/dashboards/pal-e-docs-golden-signals.json — new file, Grafana dashboard JSON

Files NOT to touch:

  • terraform/providers.tf — Helm provider caching nit from PR #38 is not worth the risk of provider behavior changes
  • .woodpecker.yaml — Trivy scan step is optional/deferred

Acceptance Criteria

  • WOODPECKER_FORGEJO_URL uses http://forgejo-http.forgejo.svc.cluster.local:80
  • WOODPECKER_FORGEJO_SKIP_VERIFY is removed
  • trivy = { enabled = true } with resource limits in Harbor Helm values
  • All 3 MinIO buckets (assets, postgres-wal, tf-state-backups) have force_destroy = true
  • MinIO secret key names are consistent (both use AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY pattern, or both use ACCESS_KEY_ID / ACCESS_SECRET_KEY — pick one and align)
  • New ConfigMap pal-e-docs-dashboard with grafana_dashboard = "1" label loads pal-e-docs-golden-signals.json
  • Dashboard JSON has panels: request rate, p50/p95/p99 latency, 5xx error rate, CPU usage, memory usage
  • tofu validate passes
  • tofu fmt -check passes

Test Expectations

  • tofu validate — all resources valid
  • tofu fmt -check — no formatting drift
  • tofu plan — shows expected changes (Woodpecker Helm update, Harbor Helm update, new ConfigMap, MinIO bucket updates)
  • Run command: cd terraform && tofu validate && tofu fmt -check

Constraints

  • Follow existing dashboard ConfigMap pattern from dora_dashboard resource (lines 1116-1130)
  • Dashboard JSON should use ${DS_PROMETHEUS} datasource variable (same as DORA dashboard)
  • Dashboard metrics assume prometheus-fastapi-instrumentator is deployed on pal-e-docs (separate PR)
  • Suggested Trivy resources: requests: { cpu: "100m", memory: "256Mi" }, limits: { memory: "1Gi" } (Trivy needs memory for its vulnerability database)

Checklist

  • PR opened
  • tofu validate passes
  • tofu fmt -check passes
  • No unrelated changes

Related

  • plan-pal-e-platform — parent plan
  • todo-woodpecker-tls-clone-fix — resolves this TODO
  • forgejo_admin/pal-e-docs issue for prometheus instrumentation (prerequisite for dashboard data)
forgejo_admin 2026-03-14 19:05:36 +00:00