Platform hardening: Woodpecker TLS fix, Trivy, dashboard, MinIO #56
Reference
forgejo_admin/pal-e-platform!56
Summary
Batch of 4 platform hardening changes to `terraform/main.tf`: fix Woodpecker CI clone TLS failures by switching to the in-cluster Forgejo URL, enable Harbor Trivy vulnerability scanning, add a pal-e-docs Golden Signals Grafana dashboard, and add `force_destroy` on all MinIO buckets.

Changes
- `terraform/main.tf` -- `WOODPECKER_FORGEJO_URL` changed from the external Tailscale funnel URL (`https://forgejo.tail5b443a.ts.net`) to the internal service URL (`http://forgejo-http.forgejo.svc.cluster.local:80`); `WOODPECKER_FORGEJO_SKIP_VERIFY` removed (plain HTTP needs no TLS verification)
- `terraform/main.tf` -- Harbor `trivy.enabled` flipped to `true` with resource limits (requests: 100m CPU / 256Mi, limits: 1Gi memory)
- `terraform/main.tf` -- New `kubernetes_config_map_v1.pal_e_docs_dashboard` ConfigMap with `grafana_dashboard = "1"` label, following the existing `dora_dashboard` pattern
- `terraform/dashboards/pal-e-docs-golden-signals.json` -- New Grafana dashboard JSON with 6 panels: request rate, p50/p95/p99 latency, 5xx error rate, CPU usage vs limits, memory usage vs limits. Uses the `${DS_PROMETHEUS}` datasource variable.
- `terraform/main.tf` -- `force_destroy = true` added to all 3 MinIO buckets (`assets`, `postgres-wal`, `tf-state-backups`)
- Secret key names left as-is: the secret used for WAL archival uses `ACCESS_KEY_ID` / `ACCESS_SECRET_KEY` (required by the BarmanObjectStoreConfiguration CRD), while the tf-backup secret uses `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` (standard env vars consumed by the mc CLI). Renaming either would break its consumer.

Test Plan
- `tofu fmt -check` -- no formatting drift
- `tofu validate` -- all resources valid
- `tofu plan` -- should show: Woodpecker Helm update, Harbor Helm update, new pal-e-docs-dashboard ConfigMap, 3 MinIO bucket updates (`force_destroy`)

Review Checklist
- `tofu fmt` passes
- `tofu validate` passes
- `tofu plan` output reviewed before merge
- Dashboard assumes `prometheus-fastapi-instrumentator` is deployed on pal-e-docs (separate prerequisite)

Related
- `plan-pal-e-platform`
- `todo-woodpecker-tls-clone-fix`

Closes #55
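The dashboard's reliance on the `${DS_PROMETHEUS}` variable could be checked before the ConfigMap is applied. The following is a minimal sketch, assuming the common Grafana dashboard JSON layout in which each panel (and optionally each target) carries a `datasource` that is either a plain string or an object with a `uid`. The function names and the sample dashboard are hypothetical and not part of this PR.

```python
DS_VAR = "${DS_PROMETHEUS}"


def _uses_variable(datasource):
    """Return True if a Grafana datasource reference points at DS_VAR."""
    if isinstance(datasource, str):
        return datasource == DS_VAR
    if isinstance(datasource, dict):
        return datasource.get("uid") == DS_VAR
    return False


def panels_with_hardcoded_datasource(dashboard):
    """Return titles of panels whose explicit datasource is not DS_VAR."""
    offenders = []
    for panel in dashboard.get("panels", []):
        refs = [panel.get("datasource")]
        refs += [t.get("datasource") for t in panel.get("targets", [])]
        # A missing reference (None) inherits a default, so only flag explicit ones.
        if any(r is not None and not _uses_variable(r) for r in refs):
            offenders.append(panel.get("title", "<untitled>"))
    return offenders


if __name__ == "__main__":
    # Hypothetical two-panel dashboard; the second panel pins a concrete uid.
    sample = {
        "panels": [
            {"title": "Request Rate", "datasource": DS_VAR, "targets": [{}]},
            {
                "title": "CPU Usage",
                "datasource": {"type": "prometheus", "uid": "abc123"},
                "targets": [],
            },
        ]
    }
    print(panels_with_hardcoded_datasource(sample))
```

In practice the dictionary would come from `json.load` on `terraform/dashboards/pal-e-docs-golden-signals.json`; an empty result means every explicit datasource reference uses the variable.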
Self-review: all 4 changes verified.
- [...] `${DS_PROMETHEUS}` datasource variable.
- [...] `tofu fmt`.
- [...] `ACCESS_SECRET_KEY` key name. Changing would break WAL archival. Left as-is with rationale documented.
- `tofu validate` and `tofu fmt -check` both pass. Ready for `tofu plan` review before merge.

Tofu Plan Output
PR #56 Review
BLOCKERS
1. Error percentage panel has unit mismatch (dashboard JSON)
In `terraform/dashboards/pal-e-docs-golden-signals.json`, the "5xx Error Rate" panel (id 6) has two targets on the same panel:
- `refId: A` -- absolute 5xx errors/s (correctly uses the `reqps` unit)
- `refId: B` -- 5xx error percentage (multiplies by 100 to get percent)

Both share the panel's `"unit": "reqps"` setting, so the percentage series will display as "X req/s" when it is actually "X %". This will confuse operators reading the dashboard.

Fix options:
- [...] `"unit": "percent"`)
- [...] `"percentunit"` (0-1 scale, remove the `* 100`) or `"percent"` (0-100 scale, keep the `* 100`)

2. `tofu plan` output not included
Per `CLAUDE.md` PR conventions: "Include `tofu plan` output for any Terraform changes." The test plan shows this as unchecked. The plan output should be reviewed before merge to confirm the expected resource changes (Woodpecker Helm update, Harbor Helm update, new ConfigMap, 3 bucket updates).

NITS
1. Truncated branch name
Branch `55-platform-hardening-woodpecker-tls-fix-tr` appears truncated (ends in `-tr`). Not blocking, but worth noting for traceability.

2. Dashboard prerequisite dependency not enforced
The PR body notes that the dashboard "assumes `prometheus-fastapi-instrumentator` is deployed on pal-e-docs." If the instrumentator is not yet scraped by Prometheus (e.g., no ServiceMonitor/PodMonitor targeting pal-e-docs), the dashboard will show "No data" for all panels. Consider adding a comment in the ConfigMap or dashboard description noting this prerequisite. This is not blocking since the dashboard simply shows empty panels if metrics are absent.

3. `force_destroy = true` on `postgres-wal` and `tf-state-backups` buckets
`force_destroy = true` is appropriate for dev/single-operator environments, but these two buckets contain critical data:
- `postgres-wal` -- WAL archives for database point-in-time recovery
- `tf-state-backups` -- Terraform state backup CronJob output

A future `tofu destroy` or bucket recreation will silently delete all objects. This is acceptable given the current single-node context, but the risk is worth documenting. Not blocking.

SOP COMPLIANCE
- [...] `55-platform-hardening-...` references issue #55)
- [...] `plan-pal-e-platform`)
- `Closes #55` present
- [...] `dora_dashboard` pattern exactly
- [...] `FORGEJO_URL` in CI runner config)
- `tofu plan` output not included (required by CLAUDE.md PR conventions)

VERDICT: NOT APPROVED
Two blockers:
1. The unit mismatch on the "5xx Error Rate" panel (blocker 1 above) must be fixed.
2. `tofu plan` output must be included or reviewed before merge per repo conventions.
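Blocker 1 comes down to giving the percentage series its own unit instead of inheriting the panel default. The following is a minimal sketch of a per-series field override, assuming Grafana's `fieldConfig.overrides` schema with a `byFrameRefID` matcher and a panel that keeps its default unit under `fieldConfig.defaults.unit`. The helper name and the sample panel are hypothetical, and the actual dashboard JSON in this PR may be shaped differently.

```python
import copy


def with_percent_override(panel, ref_id="B"):
    """Return a copy of `panel` in which one query's series uses the percent unit."""
    patched = copy.deepcopy(panel)
    field_config = patched.setdefault("fieldConfig", {})
    overrides = field_config.setdefault("overrides", [])
    overrides.append({
        # Match every field produced by the query with this refId.
        "matcher": {"id": "byFrameRefID", "options": ref_id},
        "properties": [
            # "percent" expects 0-100 values, so the query keeps its `* 100`.
            {"id": "unit", "value": "percent"},
            # Optional: a separate axis so req/s and % do not share ticks.
            {"id": "custom.axisPlacement", "value": "right"},
        ],
    })
    return patched


if __name__ == "__main__":
    # Hypothetical stand-in for the "5xx Error Rate" panel.
    panel = {
        "id": 6,
        "title": "5xx Error Rate",
        "fieldConfig": {"defaults": {"unit": "reqps"}, "overrides": []},
        "targets": [{"refId": "A"}, {"refId": "B"}],
    }
    patched = with_percent_override(panel)
    print(patched["fieldConfig"]["overrides"])
```

Series A keeps the panel default (`reqps`), while series B is rendered as a percentage. Switching to `"percentunit"` instead would require dropping the `* 100` from the query, as the fix options note.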
QA Review-Fix Loop
Blocker 1: Dashboard error panel unit mismatch -- FIXED
Added a Grafana field override on the "5xx error %" series to use the `percent` unit with a right Y-axis. Commit `da164cb`.

Blocker 2: `tofu plan` output
- `kubernetes_config_map_v1.pal_e_docs_dashboard`
- `helm_release.harbor`: `trivy.enabled = true` + resource limits
- `helm_release.woodpecker`: `WOODPECKER_FORGEJO_URL` → internal svc, `SKIP_VERIFY` removed
- `helm_release.kube_prometheus_stack`
- `minio_s3_bucket.assets`: `force_destroy = true`
- `minio_s3_bucket.postgres_wal`: `force_destroy = true`
- `minio_s3_bucket.tf_state_backups`: `force_destroy = true`

No resources destroyed. All changes are expected and match the issue scope.
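The "no resources destroyed" claim can also be checked mechanically from a saved plan. The following is a minimal sketch, assuming the plan was exported with `tofu show -json` and follows the JSON plan representation, where each entry in `resource_changes` carries `change.actions`. The function name is hypothetical, and the sample actions are illustrative rather than copied from the real plan.

```python
def destroyed_addresses(plan):
    """Return addresses of resources the plan would delete or replace."""
    return [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        # A replacement is reported as a delete/create pair, so it is caught too.
        if "delete" in rc.get("change", {}).get("actions", [])
    ]


if __name__ == "__main__":
    # Hypothetical excerpt using resource addresses named in this PR.
    sample_plan = {
        "resource_changes": [
            {
                "address": "kubernetes_config_map_v1.pal_e_docs_dashboard",
                "change": {"actions": ["create"]},
            },
            {
                "address": "helm_release.harbor",
                "change": {"actions": ["update"]},
            },
            {
                "address": "minio_s3_bucket.assets",
                "change": {"actions": ["update"]},
            },
        ]
    }
    print(destroyed_addresses(sample_plan))  # prints []
```

A typical way to produce the input would be `tofu plan -out=plan.bin` followed by `tofu show -json plan.bin > plan.json`, then loading the file with `json.load`. A non-empty result is a signal to stop and read the plan before merging.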