feat: add CNPG backup verification CronJob #61
Reference
forgejo_admin/pal-e-platform!61
Summary
Changes
- `terraform/main.tf`: Added the `kubernetes_cron_job_v1.cnpg_backup_verify` resource (lines 1796-1920). The CronJob runs in the `postgres` namespace and uses alpine plus the MinIO client to check the `postgres-wal` bucket for both the `pal-e-postgres` and `woodpecker` prefixes. It references the existing `cnpg_s3_creds` secret and follows the same pattern as the existing `tf_state_backup` CronJob.

Test Plan
- `tofu fmt -check` passes
- `tofu validate`: requires the provider cache (not available in the worktree), but the HCL syntax is valid per fmt
- `tofu plan` to verify resource creation
- `kubectl create job --from=cronjob/cnpg-backup-verify cnpg-backup-verify-test -n postgres`

Review Checklist
Related
- `plan-pal-e-platform`: Phase 13 (Backup Verification)

Self-Review: LGTM
Files changed: 1 (`terraform/main.tf`, +126 lines)

Review findings:
- Follows the `tf_state_backup` CronJob pattern exactly: alpine:3.20 image, mc client install, same resource limits (50m/64Mi request, 128Mi limit), `backoff_limit = 2`, `concurrency_policy = "Forbid"`.
- Runs in `kubernetes_namespace_v1.postgres`, the same namespace where the `cnpg_s3_creds` secret lives.
- References `kubernetes_secret_v1.cnpg_s3_creds` with the correct key names (`ACCESS_KEY_ID`, `ACCESS_SECRET_KEY`), matching the secret definition at lines 1224-1227.
- `$${MAX_AGE_HOURS}` correctly uses the `$${` escape so that Terraform emits a literal `${MAX_AGE_HOURS}` in the rendered shell script. Other `$` references are shell variables resolved at runtime and are correctly left unescaped.
- Scheduled at `0 3 * * *` (03:00 UTC), one hour after tf-state-backup (`0 2 * * *`). No overlap.
- `minio_s3_bucket.postgres_wal` and `minio_iam_user_policy_attachment.cnpg` both exist in the file.
- `tofu fmt -check` passes. `tofu validate` cannot run without the provider cache in the worktree (expected).

No issues found. Ready for merge.
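To illustrate the escaping finding from the shell's side: once Terraform has rendered the escaped form, the container shell sees an ordinary braced expansion. The sketch below shows why the braces matter when a suffix directly follows the variable name. The `h` suffix, the default of 25, and both function names are assumptions made for illustration; they are not taken from the PR's script.

```shell
#!/bin/sh
# Assumption: the default of 25 hours is illustrative only.
MAX_AGE_HOURS="${MAX_AGE_HOURS:-25}"

# Braced form: expands MAX_AGE_HOURS, then appends a literal "h".
window_braced() {
  echo "${MAX_AGE_HOURS}h"
}

# Unbraced form: the shell looks up a different variable, MAX_AGE_HOURSh,
# which is unset here, so the result is an empty string.
window_unbraced() {
  echo "$MAX_AGE_HOURSh"
}
```

Calling `window_braced` yields the value with its unit suffix, while `window_unbraced` silently yields nothing, which is the kind of bug the braced form avoids.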
Tofu Plan Output
Review — Pass
Files changed: 1 (`terraform/main.tf`, +126 lines)

Findings
Clean implementation. No issues found.
- CronJob pattern: matches `tf_state_backup` exactly (alpine 3.20, mc client download, MinIO alias, `Forbid` concurrency, `backoff_limit = 2`, same resource sizing).
- S3 credentials: uses `secret_key_ref` to reference the existing `cnpg_s3_creds` in the `postgres` namespace. No new IAM resources needed.
- Shell variable escaping: correctly uses bare `$VARIABLE` for shell vars (Terraform passes these through unchanged) and `$${MAX_AGE_HOURS}` for braced shell vars (Terraform's escape for a literal `${`). Valid HCL heredoc syntax.
- WAL path structure: checks `postgres-wal/{prefix}/wals/` with `mc find --newer-than`. CNPG uses this path structure for WAL archiving. The 25h window accounts for archive_timeout (5min) plus clock drift.
- Error handling: counts errors across all prefixes before failing. Good pattern: it reports all failures, not just the first.
- Merge compatibility: adds lines at EOF. No conflict with PR #59 (which inserts mid-file).
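The error-handling pattern described in the findings can be sketched roughly as follows. This is not the PR's script: the function names, the `minio` alias, and the `MAX_AGE_HOURS` default are assumptions, and only the bucket path and the use of `mc find --newer-than` come from the review.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if at least one WAL object newer than the
# window exists under the prefix. Assumes an mc alias named "minio" was
# configured earlier in the job (the alias name is an assumption).
check_prefix() {
  prefix="$1"
  found=$(mc find "minio/postgres-wal/${prefix}/wals/" \
    --newer-than "${MAX_AGE_HOURS:-25}h" 2>/dev/null | head -n 1)
  [ -n "$found" ]
}

# Checks every prefix, counts failures, and only fails after the loop,
# so one run reports all stale prefixes rather than just the first.
verify_all() {
  errors=0
  for prefix in "$@"; do
    if check_prefix "$prefix"; then
      echo "OK: $prefix"
    else
      echo "FAIL: $prefix"
      errors=$((errors + 1))
    fi
  done
  echo "errors=$errors"
  [ "$errors" -eq 0 ]
}
```

A job script would end with something like `verify_all pal-e-postgres woodpecker || exit 1`, so the CronJob fails only after every prefix has been reported.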
Note: The `woodpecker` prefix check will fail until the Woodpecker Postgres migration (PR #59) is merged and the first backup runs. Consider gating the prefix list or accepting initial failures as expected.
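One possible way to gate the prefix list, as the note suggests, is to split it into required and optional prefixes so that a missing `woodpecker` backup only produces a warning. Everything named here (`REQUIRED_PREFIXES`, `OPTIONAL_PREFIXES`, `FRESH_PREFIXES`, `has_recent_wal`, `verify_gated`) is hypothetical and not part of the PR. `has_recent_wal` is a stub that would be replaced by the real freshness check.

```shell
#!/bin/sh
# Hypothetical configuration: space-separated prefix lists.
REQUIRED_PREFIXES="${REQUIRED_PREFIXES:-pal-e-postgres}"
OPTIONAL_PREFIXES="${OPTIONAL_PREFIXES:-woodpecker}"

# Stub for illustration: treats a prefix as fresh if it is listed in
# FRESH_PREFIXES. Replace with the real bucket check in an actual job.
has_recent_wal() {
  case " ${FRESH_PREFIXES:-} " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Required prefixes count toward failure; optional ones only warn.
verify_gated() {
  errors=0
  for prefix in $REQUIRED_PREFIXES; do
    if ! has_recent_wal "$prefix"; then
      echo "FAIL: $prefix"
      errors=$((errors + 1))
    fi
  done
  for prefix in $OPTIONAL_PREFIXES; do
    if ! has_recent_wal "$prefix"; then
      echo "WARN: $prefix (optional, not counted)"
    fi
  done
  [ "$errors" -eq 0 ]
}
```

Once the Woodpecker migration has produced its first backup, moving `woodpecker` into the required list would restore strict checking.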