Add TF state backup CronJob to MinIO #39

Merged
forgejo_admin merged 2 commits from 36-tf-state-backup-cronjob into main 2026-03-14 14:05:55 +00:00

Summary

Adds automated daily backup of Terraform state secrets from the tofu-state namespace to a MinIO bucket (tf-state-backups). This is the safety net that makes CI-driven tofu apply safe to deploy -- backup first, automate second.

Changes

  • terraform/main.tf -- Added all resources under a # --- TF State Backup --- section:
    • data.kubernetes_namespace_v1.tofu_state -- references existing tofu-state namespace
    • minio_s3_bucket.tf_state_backups -- creates tf-state-backups bucket
    • minio_iam_user.tf_backup + minio_iam_policy.tf_backup + policy attachment -- MinIO IAM with scoped S3 permissions (GetObject, PutObject, DeleteObject, ListBucket)
    • kubernetes_secret_v1.tf_backup_s3_creds -- MinIO credentials in tofu-state namespace
    • kubernetes_service_account_v1.tf_backup -- dedicated ServiceAccount for the CronJob
    • kubernetes_role_v1.tf_backup + kubernetes_role_binding_v1.tf_backup -- RBAC scoped to only tfstate-default-pal-e-platform and tfstate-default-pal-e-services secrets (get verb only)
    • kubernetes_cron_job_v1.tf_state_backup -- runs daily at 02:00 UTC, reads each state secret, base64-decodes the tfstate key, uploads to MinIO as {secret-name}-{date}.json, prunes backups older than 30 days

Test Plan

  • [x] tofu fmt applied, no changes needed
  • [x] tofu validate passes
  • [ ] tofu plan shows expected new resources (9 total: bucket, IAM user, IAM policy, policy attachment, k8s secret, service account, role, role binding, cron job)
  • [ ] Post-apply: kubectl create job --from=cronjob/tf-state-backup test-backup -n tofu-state
  • [ ] Post-apply: mc ls minio/tf-state-backups/ to verify backup files exist

Review Checklist

  • [x] Passed automated review-fix loop
  • [x] No secrets committed
  • [x] No unnecessary file changes
  • [x] Commit messages are descriptive
  • [x] Follows existing MinIO IAM/bucket patterns (minio_iam_user.cnpg, minio_s3_bucket.postgres_wal)
  • [x] RBAC is scoped to only the two state secrets, not blanket access

Related

  • Closes #36
  • plan-pal-e-platform -- Phase 6.1 (State Backup CronJob)
Add daily automated backup of Terraform state secrets from the
tofu-state namespace to a MinIO bucket. Includes MinIO bucket,
IAM user with scoped policy, k8s Secret for credentials,
ServiceAccount with RBAC limited to the two state secrets,
and a CronJob running at 02:00 UTC with 30-day retention pruning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Download mc binary to /tmp instead of /usr/local/bin (container may
  run as non-root)
- Replace grep -oP date parsing with idiomatic mc find --older-than
  for 30-day backup pruning

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
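
The pruning change described in the second commit can be sketched roughly as follows. This is an illustrative sketch, not the script from terraform/main.tf: the function name prune_old_backups, the alias name backup, and the *.json name filter are assumptions. The --name, --older-than, and --exec flags are standard mc find options.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Delete backup objects older than a given age (default 30d).
# $1: mc target such as "backup/tf-state-backups" (the alias "backup" is assumed)
# $2: optional age in mc duration syntax, e.g. 30d
prune_old_backups() {
  local target="$1" age="${2:-30d}"
  # mc find substitutes each matching object path for {} and runs the command.
  mc find "$target" --name '*.json' --older-than "$age" --exec 'mc rm {}'
}
```

Calling prune_old_backups backup/tf-state-backups after the uploads keeps retention logic in one place and avoids parsing dates out of object names.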

PR #39 Review

BLOCKERS

None.

NITS

  1. Runtime mc download is an external dependency. The CronJob downloads the MinIO client binary from dl.min.io on every execution. If that CDN is unreachable (network issues, air-gapped future, rate limits), the backup fails at the curl step and no state is uploaded. Consider building a small custom image with kubectl + mc pre-baked, or at minimum caching the binary to a PVC. Not blocking because set -euo pipefail causes the job to fail visibly (non-zero exit) and failedJobsHistoryLimit = 3 provides an audit trail.

  2. MinIO credential key naming diverges from CNPG pattern. The existing cnpg_s3_creds secret uses ACCESS_KEY_ID / ACCESS_SECRET_KEY, while this new tf_backup_s3_creds uses AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY. This is correct for the mc client (which reads AWS_* env vars), so it is intentional and functional -- but worth noting the inconsistency for future reference. The CNPG secret keys are consumed by the CNPG operator which expects its own naming. No action needed.

  3. bitnami/kubectl:1.31 minor version pin. Good practice, but will need updating when the cluster moves past 1.31.x. Consider a tracking issue or a comment in the code.

  4. No force_destroy on the MinIO bucket. Consistent with the existing minio_s3_bucket.assets and minio_s3_bucket.postgres_wal patterns. Just noting that if this bucket ever needs to be removed via Terraform, it will require manual emptying first (same lesson learned from the Litestream cleanup in PR #31).

SOP COMPLIANCE

  • Branch named after issue (36-tf-state-backup-cronjob references #36)
  • PR body follows template (Summary, Changes, Test Plan, Related all present)
  • Related references plan slug (plan-pal-e-platform -- Phase 6.1)
  • Closes #36 in PR body
  • No secrets committed (credentials flow through Terraform variables and Kubernetes secrets)
  • No unnecessary file changes (single file: terraform/main.tf, 198 additions)
  • Commit messages are descriptive
  • Follows existing MinIO IAM/bucket patterns from CNPG section (lines 1157-1207)

RBAC verification:

  • Role scoped to resource_names = ["tfstate-default-pal-e-platform", "tfstate-default-pal-e-services"] -- NOT blanket secret access
  • Verbs restricted to ["get"] only
  • CronJob uses dedicated ServiceAccount, not default SA
  • RoleBinding correctly binds the Role to the ServiceAccount in the tofu-state namespace
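
After apply, the RBAC scoping checked above could be confirmed from a shell with kubectl auth can-i and impersonation. This is a sketch under assumptions: the ServiceAccount name tf-backup is hypothetical (the PR names only the Terraform resource), the helper names are illustrative, and the caller needs permission to impersonate ServiceAccounts.

```shell
#!/usr/bin/env bash
set -euo pipefail

NAMESPACE="${NAMESPACE:-tofu-state}"
# Hypothetical ServiceAccount name; substitute the real metadata.name.
SERVICE_ACCOUNT="${SERVICE_ACCOUNT:-tf-backup}"

# Build the impersonation subject for a ServiceAccount.
sa_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Ask the API server whether the ServiceAccount may perform a verb.
# kubectl prints "yes" or "no" and exits non-zero on "no", so the exit
# status is masked to keep set -e from aborting the checks.
can_i() {
  local verb="$1" resource="$2"
  kubectl auth can-i "$verb" "$resource" -n "$NAMESPACE" \
    --as "$(sa_subject "$NAMESPACE" "$SERVICE_ACCOUNT")" || true
}

# Print one line per check; only the first two should report "yes".
check_rbac() {
  echo "get platform state: $(can_i get secret/tfstate-default-pal-e-platform)"
  echo "get services state: $(can_i get secret/tfstate-default-pal-e-services)"
  echo "list secrets:       $(can_i list secrets)"
  echo "delete platform:    $(can_i delete secret/tfstate-default-pal-e-platform)"
}
```

Running check_rbac against the cluster should report yes for the two get checks and no for list and delete if the Role is scoped as described.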

Script logic verification:

  • Reads each state secret via kubectl get secret
  • Extracts .data.tfstate and base64-decodes
  • Uploads to MinIO with date-stamped filenames
  • Prunes backups older than 30 days
  • Uses set -euo pipefail for fail-fast behavior
  • Cleans up temp files after upload
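
The steps checked above can be sketched as a short shell script. This is a minimal sketch, not the script embedded in terraform/main.tf: the function names, the mc alias name backup, and the YYYY-MM-DD date format are assumptions, while the namespace, bucket, and secret names come from the PR description. It also assumes the mc alias has already been configured with the MinIO credentials.

```shell
#!/usr/bin/env bash
set -euo pipefail

NAMESPACE="${NAMESPACE:-tofu-state}"
BUCKET="${BUCKET:-tf-state-backups}"
MC_ALIAS="${MC_ALIAS:-backup}"   # assumed alias name
SECRETS="tfstate-default-pal-e-platform tfstate-default-pal-e-services"

# Build the object name {secret-name}-{date}.json.
backup_object_name() {
  local secret="$1" day="$2"
  printf '%s-%s.json\n' "$secret" "$day"
}

# Decode base64 text from stdin to stdout (GNU coreutils base64 assumed).
decode_state() {
  base64 -d
}

# Back up one secret: read it, decode the tfstate key, upload, clean up.
backup_secret() {
  local secret="$1" day tmp
  day="$(date -u +%Y-%m-%d)"   # assumed date format
  tmp="$(mktemp)"
  kubectl get secret "$secret" -n "$NAMESPACE" -o jsonpath='{.data.tfstate}' \
    | decode_state > "$tmp"
  mc cp "$tmp" "$MC_ALIAS/$BUCKET/$(backup_object_name "$secret" "$day")"
  rm -f "$tmp"
}

# Back up every state secret listed in SECRETS.
backup_all() {
  local secret
  for secret in $SECRETS; do
    backup_secret "$secret"
  done
}
```

Calling backup_all runs the whole sequence. Because of set -euo pipefail, a failed kubectl or mc call stops the script with a non-zero exit, which is what lets the Job report failure.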

depends_on chain verification:

  • CronJob depends on bucket + IAM policy attachment + role binding
  • Bucket, IAM user, IAM policy all depend on helm_release.minio
  • Consistent with CNPG dependency pattern

VERDICT: APPROVED

forgejo_admin deleted branch 36-tf-state-backup-cronjob 2026-03-14 14:05:55 +00:00