Add TF state backup CronJob to MinIO #39
Reference
forgejo_admin/pal-e-platform!39
Summary
Adds automated daily backup of Terraform state secrets from the `tofu-state` namespace to a MinIO bucket (`tf-state-backups`). This is the safety net that makes CI-driven `tofu apply` safe to deploy -- backup first, automate second.

Changes
`terraform/main.tf` -- Added all resources under a `# --- TF State Backup ---` section:
- `data.kubernetes_namespace_v1.tofu_state` -- references existing `tofu-state` namespace
- `minio_s3_bucket.tf_state_backups` -- creates `tf-state-backups` bucket
- `minio_iam_user.tf_backup` + `minio_iam_policy.tf_backup` + policy attachment -- MinIO IAM with scoped S3 permissions (GetObject, PutObject, DeleteObject, ListBucket)
- `kubernetes_secret_v1.tf_backup_s3_creds` -- MinIO credentials in `tofu-state` namespace
- `kubernetes_service_account_v1.tf_backup` -- dedicated ServiceAccount for the CronJob
- `kubernetes_role_v1.tf_backup` + `kubernetes_role_binding_v1.tf_backup` -- RBAC scoped to only `tfstate-default-pal-e-platform` and `tfstate-default-pal-e-services` secrets (get verb only)
- `kubernetes_cron_job_v1.tf_state_backup` -- runs daily at 02:00 UTC, reads each state secret, base64-decodes the `tfstate` key, uploads to MinIO as `{secret-name}-{date}.json`, prunes backups older than 30 days

Test Plan
- `tofu fmt` applied, no changes needed
- `tofu validate` passes
- `tofu plan` shows expected new resources (9 total: bucket, IAM user, IAM policy, policy attachment, k8s secret, service account, role, role binding, cron job)
- `kubectl create job --from=cronjob/tf-state-backup test-backup -n tofu-state`
- `mc ls minio/tf-state-backups/` to verify backup files exist

Review Checklist
- `minio_iam_user.cnpg`, `minio_s3_bucket.postgres_wal`)

Related
- `plan-pal-e-platform` -- Phase 6.1 (State Backup CronJob)

PR #39 Review
BLOCKERS
None.
NITS
Runtime
- `mc` download is an external dependency. The CronJob downloads the MinIO client binary from `dl.min.io` on every execution. If that CDN is unreachable (network issues, air-gapped future, rate limits), the backup fails at the curl step. Consider building a small custom image with `kubectl` + `mc` pre-baked, or at minimum caching the binary to a PVC. Not blocking, because `set -euo pipefail` will cause the job to fail visibly (non-zero exit) and `failedJobsHistoryLimit = 3` provides an audit trail.
- MinIO credential key naming diverges from CNPG pattern. The existing `cnpg_s3_creds` secret uses `ACCESS_KEY_ID`/`ACCESS_SECRET_KEY`, while this new `tf_backup_s3_creds` uses `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY`. This is correct for the `mc` client (which reads `AWS_*` env vars), so it is intentional and functional -- but worth noting the inconsistency for future reference. The CNPG secret keys are consumed by the CNPG operator, which expects its own naming. No action needed.
- `bitnami/kubectl:1.31` minor version pin. Good practice, but it will need updating when the cluster moves past 1.31.x. Consider a tracking issue or a comment in the code.
- No `force_destroy` on the MinIO bucket. Consistent with the existing `minio_s3_bucket.assets` and `minio_s3_bucket.postgres_wal` patterns. Just noting that if this bucket ever needs to be removed via Terraform, it will require manual emptying first (same lesson learned from the Litestream cleanup in PR #31).

SOP COMPLIANCE
- `36-tf-state-backup-cronjob` references #36)
- `plan-pal-e-platform` -- Phase 6.1)
- `Closes #36` in PR body
- `terraform/main.tf`, 198 additions)

RBAC verification:
- `resource_names = ["tfstate-default-pal-e-platform", "tfstate-default-pal-e-services"]` -- NOT blanket secret access
- `["get"]` only
- `tofu-state` namespace

Script logic verification:
- `kubectl get secret`
- `.data.tfstate` and base64-decodes
- `set -euo pipefail` for fail-fast behavior

depends_on chain verification:
- `helm_release.minio`

VERDICT: APPROVED
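
For reference, the script logic verified above (read each state secret, base64-decode the `tfstate` key, upload as `{secret-name}-{date}.json`, prune after 30 days) could look roughly like the sketch below. This is not the script from the PR, which is not reproduced in this review. The function names, the `YYYY-MM-DD` date format, the temporary file handling, and the use of `mc rm --older-than` for pruning are all assumptions; the `minio` alias is taken from the test plan's `mc ls` command and is assumed to be configured before `backup_all` runs.

```shell
#!/usr/bin/env bash
# Sketch only: NOT the script from the PR. Helper names are hypothetical.
set -euo pipefail

# Secret names come from the RBAC resource_names list quoted in the review.
SECRETS=(tfstate-default-pal-e-platform tfstate-default-pal-e-services)
NAMESPACE="tofu-state"
BUCKET="tf-state-backups"
MC_ALIAS="minio"   # alias used in the test plan; assumed to be set up already
RETENTION="30d"    # "older than 30 days" from the PR summary

# Build the object name in the {secret-name}-{date}.json form.
backup_object_name() {
  local secret="$1" day="$2"
  printf '%s-%s.json' "$secret" "$day"
}

# Decode the base64 value stored under .data.tfstate (reads stdin).
decode_tfstate() {
  base64 -d
}

# Back up every state secret, then prune old objects.
# Any failing command aborts the run because of `set -euo pipefail`.
backup_all() {
  local day tmp secret
  day="$(date -u +%Y-%m-%d)"   # date format is an assumption
  tmp="$(mktemp -d)"
  for secret in "${SECRETS[@]}"; do
    kubectl get secret "$secret" -n "$NAMESPACE" \
      -o jsonpath='{.data.tfstate}' | decode_tfstate > "$tmp/state.json"
    mc cp "$tmp/state.json" \
      "$MC_ALIAS/$BUCKET/$(backup_object_name "$secret" "$day")"
  done
  # Pruning via mc's own age filter is an assumption about the approach.
  mc rm --recursive --force --older-than "$RETENTION" "$MC_ALIAS/$BUCKET/"
  rm -rf "$tmp"
}
```

The block only defines functions, so sourcing it has no side effects; calling `backup_all` inside the CronJob container would perform the actual backup.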