feat: add ScheduledBackup CR for Woodpecker CNPG cluster #87

Closed
opened 2026-03-16 01:26:45 +00:00 by forgejo_admin · 1 comment

Lineage

plan-pal-e-platform → Phase 17a → 17a-8 (Woodpecker CNPG backup)

Repo

forgejo_admin/pal-e-platform

User Story

As a platform operator
I want daily CNPG backups for the Woodpecker database to MinIO
So that DB migrations use restore-from-backup instead of fresh creation, preserving the JWT signing key and all pipeline history

Context

pal-e-docs CNPG cluster has daily ScheduledBackup to MinIO (3 completed, running for 13+ days). Woodpecker CNPG cluster has ZERO backups. This gap means any DB rebuild creates a fresh database, generating a new JWT signing key that invalidates all API tokens across 6 consumers. This has caused 4 manual token rotations.

The fix: same pattern as pal-e-docs. A ScheduledBackup CR in the woodpecker namespace, targeting the woodpecker-db CNPG cluster, storing to MinIO via barmanObjectStore.

File Targets

Files the agent should modify:

  • terraform/main.tf — Add a kubernetes_manifest resource for the Woodpecker ScheduledBackup CR. Pattern to follow: search for pal-e-postgres-daily in the same file — that's the existing pal-e-docs backup. Create an identical resource targeting woodpecker-db cluster in woodpecker namespace, scheduled at 0 0 3 * * * (03:00 daily, offset from pal-e-docs at 02:00).

The CNPG ScheduledBackup CR needs:

  • spec.cluster.name: woodpecker-db
  • spec.method: barmanObjectStore
  • spec.schedule: "0 0 3 * * *"
  • spec.backupOwnerReference: cluster
  • Namespace: woodpecker

The S3 credentials secret cnpg-s3-creds already exists in the woodpecker namespace (confirmed: kubectl get secret cnpg-s3-creds -n woodpecker). The Woodpecker CNPG cluster CR already has barmanObjectStore configured (it was set up during the CNPG migration).

Files the agent should NOT touch:

  • The existing pal-e-docs ScheduledBackup
  • Any other terraform resources
  • The Woodpecker helm release

Acceptance Criteria

  • tofu validate passes
  • tofu fmt produces no changes
  • tofu plan -lock=false shows 1 new kubernetes_manifest resource (ScheduledBackup)
  • The ScheduledBackup targets woodpecker-db cluster in woodpecker namespace
  • Schedule is daily at 03:00 UTC

Test Expectations

  • tofu validate — must pass
  • tofu plan -lock=false — 1 to add, 0 to change, 0 to destroy
  • After apply: kubectl get scheduledbackup -n woodpecker shows the new resource

Constraints

  • Must run tofu fmt before committing
  • Must run tofu validate before committing
  • Do NOT run tofu apply
  • Include tofu plan -lock=false output in PR description
  • Follow the EXACT pattern of the existing pal-e-docs ScheduledBackup resource

Checklist

  • PR opened
  • tofu validate passes
  • No unrelated changes

Related

  • phase-platform-17a-woodpecker-secrets — parent phase
  • sop-postgres-restore — existing CNPG restore SOP
  • pal-e-platform — project

PR #88 Review

DOMAIN REVIEW

Tech stack: Terraform (OpenTofu) / Kubernetes / CNPG

The PR adds a single kubernetes_manifest resource defining a CNPG ScheduledBackup CR for the woodpecker-db cluster. Verified against the codebase:

  1. Cluster name match -- The ScheduledBackup targets woodpecker-db (line 1550), which matches the CNPG Cluster resource at line 1463. Correct.

  2. Namespace reference -- Uses kubernetes_namespace_v1.woodpecker.metadata[0].name, consistent with all other woodpecker resources in the file (lines 704, 787, 1430, 1444, 1464).

  3. Schedule format -- 0 0 3 * * * is a valid 6-field CNPG cron (seconds, minutes, hours, day, month, weekday). Evaluates to 03:00 UTC daily. CNPG uses the Go cron library which supports the seconds field. Correct.

  4. Backup method -- barmanObjectStore matches the backup configuration already defined in the Cluster spec (line 1498), which points to s3://postgres-wal/woodpecker/ on MinIO. The ScheduledBackup will use the cluster's existing barman config. Correct.

  5. Owner reference -- backupOwnerReference = "cluster" means Backup objects are garbage-collected when the Cluster CR is deleted. This is the standard CNPG pattern.

  6. Dependency chain -- depends_on = [kubernetes_manifest.woodpecker_postgres] ensures the Cluster exists before the ScheduledBackup is applied. Correct.

  7. Verify job consistency -- The existing cnpg-backup-verify CronJob (line 2105) already checks the woodpecker prefix in its MinIO scan loop (line 2150). This PR completes the circuit by ensuring base backups actually get created for that prefix.

  8. tofu fmt / validate -- PR body confirms both passed. Diff shows consistent formatting (proper indentation, aligned = signs).
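For readers without the diff at hand, the resource described in points 1 to 6 can be sketched roughly as follows. This is a reconstruction from the review notes, not the PR's actual code: the Terraform resource label `woodpecker_postgres_daily_backup` and the `metadata.name` value are assumptions, while the cluster name, schedule, method, owner reference, namespace reference, and `depends_on` target are taken from the points above.

```hcl
# Sketch only. The resource label and metadata.name are hypothetical;
# the remaining values come from the review points above.
resource "kubernetes_manifest" "woodpecker_postgres_daily_backup" {
  manifest = {
    apiVersion = "postgresql.cnpg.io/v1"
    kind       = "ScheduledBackup"

    metadata = {
      name      = "woodpecker-postgres-daily" # assumed name
      namespace = kubernetes_namespace_v1.woodpecker.metadata[0].name
    }

    spec = {
      # Six-field cron, seconds first: 03:00:00 every day.
      schedule             = "0 0 3 * * *"
      backupOwnerReference = "cluster"
      method               = "barmanObjectStore"

      cluster = {
        name = "woodpecker-db"
      }
    }
  }

  # The Cluster CR must exist before the ScheduledBackup is applied.
  depends_on = [kubernetes_manifest.woodpecker_postgres]
}
```

The ScheduledBackup carries no object-store settings of its own; as point 4 notes, it relies on the barman configuration already present in the Cluster spec.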

Note: This is the first ScheduledBackup CR in the entire codebase. My task description mentioned "same pattern as existing pal-e-docs backup", but no pal-e-docs ScheduledBackup exists in pal-e-platform, pal-e-docs, or pal-e-services. The pal-e-docs CNPG Cluster has WAL archiving configured in its Cluster spec, but no explicit ScheduledBackup for base backups. This is not a blocker for this PR -- it means pal-e-docs may also need a ScheduledBackup CR. WAL archiving alone does not create base backups, and without periodic base backups PITR has to replay every WAL segment archived since the initial base backup, so the amount of WAL to retain and replay grows without bound.

BLOCKERS

None.

NITS

  1. Schedule collision with verify job -- The new backup runs at 03:00 UTC (6-field: 0 0 3 * * *) and the cnpg-backup-verify CronJob also runs at 03:00 UTC (5-field: 0 3 * * *). The verify job checks for WAL files within the last 25 hours (continuous archiving), not base backup completion, so this is not a functional issue. However, offsetting the verify job to 04:00 would be cleaner -- it would verify after both the backup and any WAL archival triggered by the backup have settled. Low priority.

  2. Missing pal-e-docs ScheduledBackup -- As noted above, the pal-e-docs CNPG cluster (/home/ldraney/pal-e-platform/terraform/main.tf) has WAL archiving but no ScheduledBackup CR. This should be tracked as a follow-up issue -- base backups are essential for bounded PITR recovery. Not in scope for this PR.
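The two schedules in nit 1 use different cron dialects, which makes the collision easy to miss when reading the file. A minimal sketch of the suggested offset, assuming hypothetical local names (neither appears in the PR, and how the verify CronJob actually receives its schedule is not shown here):

```hcl
# Sketch only: both local names are hypothetical.
locals {
  # CNPG ScheduledBackup: six fields (seconds, minutes, hours, day, month, weekday).
  woodpecker_backup_schedule = "0 0 3 * * *" # 03:00 UTC daily

  # Kubernetes CronJob: five fields (minutes, hours, day, month, weekday).
  # Moved from "0 3 * * *" to 04:00 so it runs after the backup window.
  cnpg_backup_verify_schedule = "0 4 * * *" # 04:00 UTC daily
}
```

Keeping both values side by side with their field layout in comments makes it harder to paste a five-field expression into the six-field slot by mistake.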

SOP COMPLIANCE

  • Branch named after issue (87-feat-add-scheduledbackup-cr-for-woodpeck references #87)
  • PR body follows template (Summary, Changes, tofu plan Output, Test Plan, Review Checklist, Related)
  • Related references plan slug (plan-pal-e-platform)
  • Closes #87 in Related section
  • No secrets committed
  • No unnecessary file changes (1 file, 28 additions, 0 deletions -- all on-topic)
  • Commit messages are descriptive
  • tofu fmt and tofu validate confirmed passing in Test Plan
  • tofu plan -lock=false output included (1 to add, 1 cosmetic change)

PROCESS OBSERVATIONS

  • Clean, minimal PR. Single resource addition with proper dependency chain and consistent naming patterns.
  • Test plan includes post-apply verification steps (kubectl get scheduledbackups, checking for completed backup objects after 03:00 UTC). Good operational hygiene.
  • The cosmetic "1 to change" on woodpecker_db_credentials (write-only attribute drift) is noted and explained in the PR body. Transparency appreciated.

VERDICT: APPROVED

forgejo_admin 2026-03-16 01:33:03 +00:00