Platform hardening: Woodpecker TLS fix, Trivy, dashboard, QA nits #55
Labels
No labels
No milestone
No project
No assignees
1 participant
Due date
No due date set.
Dependencies
No dependencies set.
Reference
forgejo_admin/pal-e-platform#55
Lineage
- plan-pal-e-platform → Phase 4 (Dashboard) + Phase 10 (Vuln Scanning) + QA nits from PRs #35/#39
- todo-woodpecker-tls-clone-fix (Priority 1)
Repo
forgejo_admin/pal-e-platform
User Story
As a platform operator
I want reliable CI clones, vulnerability scanning, golden signals dashboards, and cleaned-up tech debt
So that the platform is hardened for team onboarding
Context
Four platform hardening items are batched into one PR because they all touch `terraform/main.tf`:
P1 - Woodpecker TLS Clone Fix: Every Woodpecker pipeline clone uses the external Tailscale funnel URL (https://forgejo.tail5b443a.ts.net), causing intermittent TLS EOF failures. Fix: use the internal service URL `http://forgejo-http.forgejo.svc.cluster.local:80`. Also remove `WOODPECKER_FORGEJO_SKIP_VERIFY`, since plain HTTP doesn't need TLS verification.
P2 - Golden Signals Dashboard: Create a Grafana dashboard for pal-e-docs showing request rate, latency percentiles (p50/p95/p99), 5xx error rate, and CPU/memory usage vs. limits. Follow the existing pattern from the DORA dashboard ConfigMap.
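For P2, a minimal sketch of the new ConfigMap. It assumes the DORA dashboard is wired the same way: a ConfigMap carrying the `grafana_dashboard = "1"` label, which Grafana's dashboard sidecar watches. The resource name and the `monitoring` namespace are assumptions. Copy whatever the existing `dora_dashboard` resource uses, including whether it uses `kubernetes_config_map` or `kubernetes_config_map_v1`.

```hcl
# Sketch only: resource name and namespace are hypothetical; mirror the
# existing dora_dashboard resource in terraform/main.tf.
resource "kubernetes_config_map_v1" "pal_e_docs_dashboard" {
  metadata {
    name      = "pal-e-docs-dashboard"
    namespace = "monitoring" # assumed: the namespace Grafana's sidecar watches
    labels = {
      # Label the Grafana dashboard sidecar uses to discover dashboards.
      grafana_dashboard = "1"
    }
  }

  data = {
    # file() reads the JSON verbatim at plan time.
    "pal-e-docs-golden-signals.json" = file("${path.module}/dashboards/pal-e-docs-golden-signals.json")
  }
}
```

Using `file()` rather than `templatefile()` means the `${DS_PROMETHEUS}` placeholder inside the dashboard JSON is passed through untouched for Grafana to resolve, instead of being treated as a template variable.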
P3 - Harbor Trivy: Enable Trivy vulnerability scanning in Harbor. It is currently disabled (`trivy = { enabled = false }` at line 696). Add resource limits for the scanner.
P4 - QA Nits: The `#alerts` Slack channel is hardcoded, but the Slack receiver is already conditional (only active when `slack_webhook_url` is set), so that is fine as-is. The main nits: add `force_destroy = true` on the MinIO buckets to prevent `tofu destroy` failures, and fix the credential key naming inconsistency between the CNPG and TF backup MinIO secrets.
File Targets
Files to modify:
- `terraform/main.tf`: Woodpecker URL (line 575), Trivy toggle (lines 696-698), MinIO bucket `force_destroy`, MinIO secret key naming, new dashboard ConfigMap
- `terraform/dashboards/pal-e-docs-golden-signals.json`: new file, Grafana dashboard JSON
Files NOT to touch:
- `terraform/providers.tf`: the Helm provider caching nit from PR #38 is not worth the risk of provider behavior changes.
- `woodpecker.yaml`: the Trivy scan step is optional/deferred.
Acceptance Criteria
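A sketch of what the Woodpecker and MinIO criteria in this section could look like in `terraform/main.tf`. It assumes the aminueza/minio and hashicorp/kubernetes providers and assumes the Woodpecker env map is merged into the chart's `server.env` values. The resource names, namespace, and `var.*` references are hypothetical; only the attribute names and values come from this issue.

```hcl
# Sketch only: names, namespace, and var.* references are hypothetical.

locals {
  # Forgejo-related Woodpecker server env (assumed to be merged into the
  # Woodpecker chart's server.env values).
  woodpecker_forgejo_env = {
    # In-cluster Service DNS name: clones stay inside the cluster instead of
    # going through the Tailscale funnel.
    WOODPECKER_FORGEJO_URL = "http://forgejo-http.forgejo.svc.cluster.local:80"
    # WOODPECKER_FORGEJO_SKIP_VERIFY is deliberately absent: plain HTTP has no TLS.
  }
}

# Same change on the postgres-wal and tf-state-backups buckets.
resource "minio_s3_bucket" "assets" {
  bucket        = "assets"
  force_destroy = true # lets `tofu destroy` delete the bucket even if it is not empty
}

# Both MinIO credential secrets (CNPG and TF backup) use one key-naming
# convention; the AWS_* pattern is shown, but either works if applied to both.
resource "kubernetes_secret_v1" "cnpg_minio_creds" {
  metadata {
    name      = "cnpg-minio-creds"
    namespace = "postgres"
  }
  data = {
    AWS_ACCESS_KEY_ID     = var.minio_access_key
    AWS_SECRET_ACCESS_KEY = var.minio_secret_key
  }
}
```

If the existing buckets are declared as separate resources, adding `force_destroy` in place avoids changing resource addresses. Whichever key names are chosen, any consumer that references the secret keys by name (for example CNPG's S3 credential references) has to be updated in the same change.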
- `WOODPECKER_FORGEJO_URL` uses `http://forgejo-http.forgejo.svc.cluster.local:80`
- `WOODPECKER_FORGEJO_SKIP_VERIFY` is removed
- `trivy = { enabled = true }` with resource limits in Harbor Helm values
- MinIO buckets (`assets`, `postgres-wal`, `tf-state-backups`) have `force_destroy = true`
- MinIO secret key naming is consistent between the CNPG and TF backup secrets (both use the `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` pattern, or both use `ACCESS_KEY_ID`/`ACCESS_SECRET_KEY`; pick one and align)
- ConfigMap `pal-e-docs-dashboard` with the `grafana_dashboard = "1"` label loads `pal-e-docs-golden-signals.json`
- `tofu validate` passes
- `tofu fmt` passes
Test Expectations
- `tofu validate`: all resources valid
- `tofu fmt -check`: no formatting drift
- `tofu plan`: shows the expected changes (Woodpecker Helm update, Harbor Helm update, new ConfigMap, MinIO bucket updates)
- Run: `cd terraform && tofu validate && tofu fmt -check`
Constraints
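The Trivy resource constraint listed in this section corresponds to a Harbor values fragment along these lines. `trivy.enabled` and `trivy.resources` are keys in the upstream Harbor chart's values. The `locals` wrapper and the `harbor_trivy_values` name are hypothetical, used only to make the fragment valid HCL. In `main.tf` the `trivy` entry would replace the existing `trivy = { enabled = false }` inside the Harbor values map.

```hcl
# Sketch only: the locals wrapper is hypothetical; the trivy map is what
# replaces `trivy = { enabled = false }` in the Harbor Helm values.
locals {
  harbor_trivy_values = {
    # Requests and a memory-only limit, per the constraint in this issue
    # (Trivy needs memory for its vulnerability DB).
    trivy = {
      enabled = true
      resources = {
        requests = { cpu = "100m", memory = "256Mi" }
        limits   = { memory = "1Gi" }
      }
    }
  }
}
```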
- Follow the `dora_dashboard` resource pattern (lines 1116-1130)
- Use the `${DS_PROMETHEUS}` datasource variable (same as the DORA dashboard)
- `prometheus-fastapi-instrumentator` is deployed on pal-e-docs (separate PR)
- Trivy resources: `requests: { cpu: "100m", memory: "256Mi" }, limits: { memory: "1Gi" }` (Trivy needs memory for its vulnerability DB)
Checklist
- `tofu validate` passes
- `tofu fmt` passes
Related
- plan-pal-e-platform: parent plan
- todo-woodpecker-tls-clone-fix: resolves this TODO