fix: backup verify CronJob fails on new CNPG clusters without WAL archives #92
Reference
forgejo_admin/pal-e-platform#92
Lineage
todo-cnpg-backup-verify-failure (no plan ancestry)
Repo
forgejo_admin/pal-e-platform
User Story
As a platform operator
I want the backup verification CronJob to handle new CNPG clusters that haven't archived WAL segments yet
So that the KubeJobFailed alert only fires on real backup problems, not on expected new-cluster behavior
Context
The cnpg-backup-verify CronJob checks for recent WAL files in MinIO under both the pal-e-postgres and woodpecker prefixes. The woodpecker CNPG cluster is 2 days old and has not archived any WAL segments yet: CNPG only archives a WAL segment once it fills up (16MB by default), so low-traffic databases can take days to produce their first WAL archive.

The verify script treats an empty WAL directory as a failure, which fires a KubeJobFailed warning alert even though base backups are completing successfully (8/8 for pal-e-postgres, 1/1 for woodpecker).
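A minimal POSIX sh sketch of the ordering the fix calls for: check whether the WAL directory has any entries before running the freshness check, and report SKIP instead of failing when it is empty. It assumes the cluster's wals/ listing arrives on stdin one entry per line (adapt this to whatever MinIO client the real script uses) and that the age of the newest WAL, in seconds, is computed elsewhere. The function name check_wal_prefix, the OK/FAIL wording, the placeholder segment names, and the 86400-second limit are hypothetical, not taken from the actual CronJob script.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual cnpg-backup-verify script.
# check_wal_prefix NAME NEWEST_AGE_SECONDS MAX_AGE_SECONDS
#   stdin: listing of backup/postgres-wal/<NAME>/wals/, one entry per line
#          (assumed format; adapt to the MinIO client the job really uses)
#   exit:  0 for OK or SKIP, 1 for a stale WAL archive
check_wal_prefix() {
  name=$1
  newest_age=$2
  max_age=$3

  # Existence check first: count non-empty lines in the listing.
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps set -e happy.
  entries=$(grep -c . || true)
  if [ "$entries" -eq 0 ]; then
    echo "SKIP: No WAL directory yet ($name)"
    return 0
  fi

  # Freshness check only runs once at least one WAL entry exists.
  if [ "$newest_age" -gt "$max_age" ]; then
    echo "FAIL: newest WAL for $name is ${newest_age}s old (limit ${max_age}s)"
    return 1
  fi

  echo "OK: $name has $entries WAL entries"
  return 0
}

# Example: woodpecker's empty wals/ prefix is skipped, not failed.
printf '' | check_wal_prefix woodpecker 0 86400
# Example: three placeholder segment directories, newest one 600s old.
printf '0000000100000000/\n0000000100000001/\n0000000100000002/\n' |
  check_wal_prefix pal-e-postgres 600 86400
```

When pasting this into the Terraform heredoc, ${newest_age} and ${max_age} would need to become $${newest_age} and $${max_age} per the Constraints; a bare $name is not touched by Terraform interpolation. One design caveat: because the SKIP branch never fails, a cluster whose archiving is broken from day one would also be skipped, so keeping a base-backup check or a cluster-age cutoff alongside it may be worthwhile.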
Verified via MinIO:
- backup/postgres-wal/pal-e-postgres/wals/ has 3 WAL segment directories.
- backup/postgres-wal/woodpecker/wals/ is empty.
File Targets
Files to modify:
- terraform/main.tf: kubernetes_cron_job_v1.cnpg_backup_verify script block (~line 2295). Add a WAL directory existence check before the freshness check.
Files NOT to touch:
Acceptance Criteria
- SKIP: No WAL directory yet and continues without error
- tofu plan shows only the CronJob resource changing
Test Expectations
- kubectl delete job cnpg-backup-verify-29560860 -n postgres (already done)
- tofu plan -lock=false to verify only CronJob changes
Constraints
- Use $${VAR} syntax for shell variables inside the Terraform heredoc: $${ escapes Terraform interpolation so the shell receives ${VAR} (the same $$ escape Woodpecker uses)
Checklist
- tofu plan shows only CronJob change
Related
- pal-e-platform: project
- todo-cnpg-backup-verify-failure: pal-e-docs TODO tracking this
- deployment-lessons: lessons learned doc

Reading issue for QA review context.