feat: add ScheduledBackup CR for Woodpecker CNPG cluster #87
Reference: forgejo_admin/pal-e-platform#87
Lineage
plan-pal-e-platform → Phase 17a → 17a-8 (Woodpecker CNPG backup)

Repo
forgejo_admin/pal-e-platform

User Story
As a platform operator
I want daily CNPG backups for the Woodpecker database to MinIO
So that DB migrations use restore-from-backup instead of fresh creation, preserving the JWT signing key and all pipeline history
Context
The pal-e-docs CNPG cluster has a daily `ScheduledBackup` to MinIO (3 completed, running for 13+ days). The Woodpecker CNPG cluster has ZERO backups. Because of this gap, any DB rebuild creates a fresh database. That generates a new JWT signing key, which invalidates all API tokens across 6 consumers. This has caused 4 manual token rotations.

The fix: same pattern as pal-e-docs. A `ScheduledBackup` CR in the woodpecker namespace, targeting the `woodpecker-db` CNPG cluster, storing to MinIO via `barmanObjectStore`.

File Targets
Files the agent should modify:
- `terraform/main.tf`: Add a `kubernetes_manifest` resource for the Woodpecker `ScheduledBackup` CR. Pattern to follow: search for `pal-e-postgres-daily` in the same file; that's the existing pal-e-docs backup. Create an identical resource targeting the `woodpecker-db` cluster in the `woodpecker` namespace, scheduled at `0 0 3 * * *` (03:00 daily, offset from pal-e-docs at 02:00).

The CNPG `ScheduledBackup` CR needs:
- `spec.cluster.name: woodpecker-db`
- `spec.method: barmanObjectStore`
- `spec.schedule: "0 0 3 * * *"`
- `spec.backupOwnerReference: cluster`
- namespace: `woodpecker`

The S3 credentials secret `cnpg-s3-creds` already exists in the woodpecker namespace (confirmed: `kubectl get secret cnpg-s3-creds -n woodpecker`). The Woodpecker CNPG cluster CR already has `barmanObjectStore` configured (it was set up during the CNPG migration).

Files the agent should NOT touch:
Acceptance Criteria
- `tofu validate` passes
- `tofu fmt` produces no changes
- `tofu plan -lock=false` shows 1 new `kubernetes_manifest` resource (ScheduledBackup)
- `woodpecker-db` cluster in `woodpecker` namespace

Test Expectations
- `tofu validate`: must pass
- `tofu plan -lock=false`: 1 to add, 0 to change, 0 to destroy
- `kubectl get scheduledbackup -n woodpecker` shows the new resource

Constraints
- `tofu fmt` before committing
- `tofu validate` before committing
- `tofu apply`
- `tofu plan -lock=false` output in PR description

Checklist
- `tofu validate` passes

Related
- `phase-platform-17a-woodpecker-secrets`: parent phase
- `sop-postgres-restore`: existing CNPG restore SOP
- `pal-e-platform`: project

PR #88 Review
DOMAIN REVIEW
Tech stack: Terraform (OpenTofu) / Kubernetes / CNPG
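For orientation, the reviewed resource looks roughly like the sketch below. It is assembled from two sources: the fields listed in the issue, and the references checked in this review (namespace reference, cluster name, schedule, method, owner reference, `depends_on`). It is not the PR's literal diff. The Terraform resource label `woodpecker_postgres_daily_backup` and the CR name `woodpecker-db-daily` are hypothetical.

```hcl
# Sketch only: the resource label and metadata.name are hypothetical;
# the spec fields mirror the ones listed in issue #87.
resource "kubernetes_manifest" "woodpecker_postgres_daily_backup" {
  manifest = {
    apiVersion = "postgresql.cnpg.io/v1"
    kind       = "ScheduledBackup"
    metadata = {
      name      = "woodpecker-db-daily" # hypothetical CR name
      namespace = kubernetes_namespace_v1.woodpecker.metadata[0].name
    }
    spec = {
      # 6-field cron with a leading seconds field: 03:00:00 every day
      schedule = "0 0 3 * * *"
      cluster = {
        name = "woodpecker-db"
      }
      # Reuses the barmanObjectStore config already on the Cluster spec
      method = "barmanObjectStore"
      # Backup objects created by this schedule are owned by the Cluster CR
      backupOwnerReference = "cluster"
    }
  }

  # Apply only after the CNPG Cluster CR exists
  depends_on = [kubernetes_manifest.woodpecker_postgres]
}
```

The `ScheduledBackup` carries no destination of its own. With `method = "barmanObjectStore"`, the object store path comes from the Cluster's existing `barmanObjectStore` block. After apply, `kubectl get scheduledbackup -n woodpecker` should list the resource. After the first scheduled run, `kubectl get backups.postgresql.cnpg.io -n woodpecker` should show a Backup object.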
The PR adds a single `kubernetes_manifest` resource defining a CNPG `ScheduledBackup` CR for the `woodpecker-db` cluster. Verified against the codebase:

- Cluster name match -- The ScheduledBackup targets `woodpecker-db` (line 1550), which matches the CNPG Cluster resource at line 1463. Correct.
- Namespace reference -- Uses `kubernetes_namespace_v1.woodpecker.metadata[0].name`, consistent with all other woodpecker resources in the file (lines 704, 787, 1430, 1444, 1464).
- Schedule format -- `0 0 3 * * *` is a valid 6-field CNPG cron expression (seconds, minutes, hours, day of month, month, day of week). Evaluates to 03:00 UTC daily. CNPG uses a Go cron library that supports the leading seconds field. Correct.
- Backup method -- `barmanObjectStore` matches the backup configuration already defined in the Cluster spec (line 1498), which points to `s3://postgres-wal/woodpecker/` on MinIO. The ScheduledBackup will use the cluster's existing barman config. Correct.
- Owner reference -- `backupOwnerReference = "cluster"` means the Backup objects it creates are garbage-collected when the Cluster CR is deleted. This removes only the Kubernetes objects, not the backup data stored in MinIO. This is the standard CNPG pattern.
- Dependency chain -- `depends_on = [kubernetes_manifest.woodpecker_postgres]` ensures the Cluster exists before the ScheduledBackup is applied. Correct.
- Verify job consistency -- The existing `cnpg-backup-verify` CronJob (line 2105) already checks the `woodpecker` prefix in its MinIO scan loop (line 2150). This PR completes the circuit by ensuring base backups actually get created for that prefix.
- tofu fmt / validate -- PR body confirms both passed. Diff shows consistent formatting (proper indentation, aligned `=` signs).

Note: This is the first `ScheduledBackup` CR in the entire codebase. My task description mentioned "same pattern as existing pal-e-docs backup", but no pal-e-docs ScheduledBackup exists in `pal-e-platform`, `pal-e-docs`, or `pal-e-services`. The pal-e-docs CNPG Cluster has WAL archiving configured in its Cluster spec, but no explicit ScheduledBackup for base backups.

This is not a blocker for this PR, but it means pal-e-docs may also need a ScheduledBackup CR. WAL archiving alone does not create base backups. Without periodic base backups, a PITR recovery must replay every WAL segment since the initial base backup, and that set grows without bound.

BLOCKERS
None.
NITS
- Schedule collision with verify job -- The new backup runs at 03:00 UTC (6-field: `0 0 3 * * *`), and the `cnpg-backup-verify` CronJob also runs at 03:00 UTC (5-field: `0 3 * * *`). The verify job checks for WAL files within the last 25 hours (continuous archiving), not for base backup completion, so this is not a functional issue. However, offsetting the verify job to 04:00 (`0 4 * * *`) would be cleaner. It would then verify after both the backup and any WAL archival triggered by the backup have settled. Low priority.
- Missing pal-e-docs ScheduledBackup -- As noted above, the pal-e-docs CNPG cluster (`/home/ldraney/pal-e-platform/terraform/main.tf`) has WAL archiving but no `ScheduledBackup` CR. This should be tracked as a follow-up issue, since base backups are essential for bounded PITR recovery. Not in scope for this PR.

SOP COMPLIANCE
- `87-feat-add-scheduledbackup-cr-for-woodpeck` references #87
- `plan-pal-e-platform`
- `Closes #87` in Related section
- `tofu fmt` and `tofu validate` confirmed passing in Test Plan
- `tofu plan -lock=false` output included (1 to add, 1 cosmetic change)

PROCESS OBSERVATIONS
- Verification steps included (`kubectl get scheduledbackups`, checking for completed backup objects after 03:00 UTC). Good operational hygiene.
- The `woodpecker_db_credentials` change (write-only attribute drift) is noted and explained in the PR body. Transparency appreciated.

VERDICT: APPROVED