Add TF state backup CronJob to MinIO #36

Closed
opened 2026-03-14 13:58:56 +00:00 by forgejo_admin · 0 comments

Lineage

plan-pal-e-platform → Phase 6 (CI Pipeline Hardening) → Phase 6.1 (State Backup CronJob)

Repo

forgejo_admin/pal-e-platform

User Story

As the platform operator
I want Terraform state secrets automatically backed up daily to MinIO
So that I can recover from state corruption or loss in minutes instead of hours (lower MTTR)

Context

Both Terraform repos (pal-e-platform, pal-e-services) use a Kubernetes backend for state, stored as secrets in the tofu-state namespace. There are currently no backups of these secrets. If state is corrupted during a bad apply, recovery means manual state reconstruction.

This is the safety net that makes CI-driven tofu apply (Phase 6.4) safe to deploy. Backup first, automate second.

Key technical facts:

  • State backend: Kubernetes secrets in tofu-state namespace
  • Secret names: tfstate-default-pal-e-platform and tfstate-default-pal-e-services
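  • State is stored in the tfstate key of each secret. The Kubernetes backend gzip-compresses the state JSON before writing it, and Secret data is base64-encoded, so recovering plain JSON takes a base64 decode followed by a gunzip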
  • MinIO is deployed in the minio namespace, S3 API at http://minio.minio.svc.cluster.local:9000
  • Existing MinIO IAM pattern: see minio_iam_user.cnpg and minio_iam_policy.cnpg_wal in terraform/main.tf
  • Existing MinIO bucket pattern: see minio_s3_bucket.postgres_wal in terraform/main.tf
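The secret names above follow from the backend configuration. A minimal sketch, assuming pal-e-platform's backend block looks roughly like this; only secret_suffix and namespace are implied by the facts above, and auth settings are deliberately left out.

```hcl
terraform {
  backend "kubernetes" {
    # The backend names its Secret tfstate-<workspace>-<secret_suffix>, so the
    # "default" workspace produces tfstate-default-pal-e-platform.
    secret_suffix = "pal-e-platform"
    namespace     = "tofu-state"
    # ...plus whatever kubeconfig / in-cluster auth settings the repo already uses.
  }
}
```

pal-e-services would presumably differ only in secret_suffix = "pal-e-services".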

File Targets

Files to modify:

  • terraform/main.tf — add: MinIO bucket (tf-state-backups), MinIO IAM user + policy, k8s Secret for MinIO creds in tofu-state namespace, CronJob resource, ServiceAccount + RBAC for reading state secrets
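A sketch of the MinIO side of these main.tf additions, assuming the same MinIO provider (aminueza/minio) behind minio_s3_bucket.postgres_wal and the cnpg IAM resources. The resource labels, the policy name, the minio_iam_user_policy_attachment resource, the Secret name tf-backup-minio, and its AWS_* key names are all hypothetical; the real block should mirror the existing cnpg pattern.

```hcl
# --- TF State Backup: MinIO ---
resource "minio_s3_bucket" "tf_state_backups" {
  bucket = "tf-state-backups"
  acl    = "private"
}

resource "minio_iam_user" "tf_backup" {
  name = "tf-backup"
}

resource "minio_iam_policy" "tf_backup" {
  name = "tf-state-backups-rw" # hypothetical policy name
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        # ListBucket is authorized against the bucket ARN itself.
        Effect   = "Allow"
        Action   = ["s3:ListBucket"]
        Resource = ["arn:aws:s3:::${minio_s3_bucket.tf_state_backups.bucket}"]
      },
      {
        # Object-level actions are authorized against keys inside the bucket.
        Effect   = "Allow"
        Action   = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]
        Resource = ["arn:aws:s3:::${minio_s3_bucket.tf_state_backups.bucket}/*"]
      },
    ]
  })
}

resource "minio_iam_user_policy_attachment" "tf_backup" {
  user_name   = minio_iam_user.tf_backup.name
  policy_name = minio_iam_policy.tf_backup.name
}

# Credentials for the backup CronJob, stored next to the state Secrets.
resource "kubernetes_secret_v1" "tf_backup_minio" {
  metadata {
    name      = "tf-backup-minio" # hypothetical name
    namespace = "tofu-state"
  }

  # Plain values here; the kubernetes provider handles the base64 encoding.
  data = {
    # In MinIO the IAM user name doubles as the access key.
    AWS_ACCESS_KEY_ID     = minio_iam_user.tf_backup.name
    AWS_SECRET_ACCESS_KEY = minio_iam_user.tf_backup.secret
  }

  depends_on = [minio_iam_user_policy_attachment.tf_backup]
}
```

The policy references the bucket resource, so OpenTofu orders bucket, then policy, then attachment, then Secret without extra wiring.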

Files NOT to touch:

  • terraform/variables.tf — no new variables needed (MinIO creds come from IAM user resource outputs)
  • salt/ — host-level config, not relevant
  • .woodpecker.yaml — separate issue

Acceptance Criteria

  • MinIO bucket tf-state-backups created via minio_s3_bucket resource
  • MinIO IAM user tf-backup with a policy allowing s3:GetObject, s3:PutObject, and s3:DeleteObject on objects in tf-state-backups, plus s3:ListBucket on the bucket itself
  • Kubernetes Secret in tofu-state namespace with MinIO access credentials
  • ServiceAccount in tofu-state namespace, bound to a Role that grants get only on the two state secrets (tfstate-default-pal-e-platform, tfstate-default-pal-e-services) via resourceNames
  • CronJob in tofu-state namespace that:
    • Runs daily at 02:00 UTC (0 2 * * *)
    • For each state secret: reads it, base64-decodes and gunzips the tfstate key, and uploads the resulting JSON to s3://tf-state-backups/{secret-name}-{date}.json
    • Deletes backups older than 30 days from the bucket
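    • Uses lightweight image(s) (e.g., bitnami/minio-client or alpine with curl); whatever is chosen must be able to read the secrets through the Kubernetes API as well as upload to MinIO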
  • CronJob uses the ServiceAccount with RBAC (not the default SA)
  • tofu validate passes
  • tofu fmt applied
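One way the CronJob criteria could be met, sketched with kubernetes_cron_job_v1. Since mc alone cannot read Kubernetes Secrets, this hypothetically splits the work: an init container (bitnami/kubectl) reads, base64-decodes, and gunzips each state secret into a shared emptyDir, and the main container (bitnami/minio-client) uploads dated copies and prunes objects older than 30 days with mc rm --older-than. Image names and tags, the ServiceAccount name tf-state-backup, the credentials Secret tf-backup-minio, and its AWS_* key names are assumptions. It also assumes the backend stores state gzip-compressed; if gzip -dc rejects the data, drop that step.

```hcl
# --- TF State Backup: CronJob ---
resource "kubernetes_cron_job_v1" "tf_state_backup" {
  metadata {
    name      = "tf-state-backup"
    namespace = "tofu-state"
  }

  spec {
    # Evaluated in the controller-manager's time zone; assumes the cluster runs in UTC.
    schedule                      = "0 2 * * *"
    concurrency_policy            = "Forbid"
    successful_jobs_history_limit = 3
    failed_jobs_history_limit     = 3

    job_template {
      metadata {}
      spec {
        backoff_limit = 2
        template {
          metadata {}
          spec {
            # Assumed ServiceAccount name; must match the RBAC resources.
            service_account_name = "tf-state-backup"
            restart_policy       = "OnFailure"

            # Step 1: fetch each state Secret via the API, decode, gunzip to /backup.
            init_container {
              name    = "export-state"
              image   = "bitnami/kubectl:latest" # assumption: pin a real tag
              command = ["/bin/bash", "-c"]
              args = [<<-EOT
                set -euo pipefail
                for s in tfstate-default-pal-e-platform tfstate-default-pal-e-services; do
                  kubectl get secret "$s" -n tofu-state -o jsonpath='{.data.tfstate}' \
                    | base64 -d | gzip -dc > "/backup/$s.json"
                done
              EOT
              ]
              volume_mount {
                name       = "backup"
                mount_path = "/backup"
              }
            }

            # Step 2: upload dated copies, then prune objects older than 30 days.
            container {
              name    = "upload"
              image   = "bitnami/minio-client:latest" # assumption: pin a real tag
              command = ["/bin/bash", "-c"]
              args = [<<-EOT
                set -euo pipefail
                mc alias set backup http://minio.minio.svc.cluster.local:9000 \
                  "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY"
                d="$(date -u +%F)"
                for f in /backup/*.json; do
                  mc cp "$f" "backup/tf-state-backups/$(basename "$f" .json)-$d.json"
                done
                mc rm --recursive --force --older-than 30d backup/tf-state-backups
              EOT
              ]
              env {
                name  = "HOME" # mc writes its alias config under $HOME
                value = "/tmp"
              }
              env_from {
                secret_ref {
                  name = "tf-backup-minio" # hypothetical credentials Secret
                }
              }
              volume_mount {
                name       = "backup"
                mount_path = "/backup"
              }
            }

            volume {
              name = "backup"
              empty_dir {}
            }
          }
        }
      }
    }
  }
}
```

With these hypothetical names, the manual trigger from the test steps would be kubectl create job --from=cronjob/tf-state-backup test-backup -n tofu-state. In the real main.tf, service_account_name and the Secret name would be resource references plus a depends_on on the RoleBinding, to satisfy the depends_on constraint.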

Test Expectations

  • tofu plan shows expected new resources (bucket, IAM, Secret, ServiceAccount, Role, RoleBinding, CronJob)
  • Post-apply: manually trigger CronJob with kubectl create job --from=cronjob/<name> test-backup -n tofu-state
  • Post-apply: verify backup files exist in MinIO bucket via mc ls

Constraints

  • Follow existing Terraform patterns in main.tf for MinIO resources (see minio_s3_bucket.postgres_wal, minio_iam_user.cnpg, minio_iam_policy.cnpg_wal)
  • CronJob should use kubernetes_cron_job_v1 resource (not manifest)
  • RBAC should be scoped to only the two state secrets — not blanket secret read access
  • All resources should have clear depends_on chains
  • Group all state backup resources together with a clear comment header (e.g., # --- TF State Backup ---)
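A sketch of the scoped RBAC. The CronJob only ever fetches each secret by name, so get on the two named Secrets is all it needs. Resource labels and object names are hypothetical.

```hcl
# --- TF State Backup: RBAC ---
resource "kubernetes_service_account_v1" "tf_state_backup" {
  metadata {
    name      = "tf-state-backup"
    namespace = "tofu-state"
  }
}

resource "kubernetes_role_v1" "tf_state_backup" {
  metadata {
    name      = "tf-state-backup-reader"
    namespace = "tofu-state"
  }

  # Only "get", and only on the two named state Secrets.
  rule {
    api_groups = [""]
    resources  = ["secrets"]
    verbs      = ["get"]
    resource_names = [
      "tfstate-default-pal-e-platform",
      "tfstate-default-pal-e-services",
    ]
  }
}

resource "kubernetes_role_binding_v1" "tf_state_backup" {
  metadata {
    name      = "tf-state-backup-reader"
    namespace = "tofu-state"
  }

  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "Role"
    name      = kubernetes_role_v1.tf_state_backup.metadata[0].name
  }

  subject {
    kind      = "ServiceAccount"
    name      = kubernetes_service_account_v1.tf_state_backup.metadata[0].name
    namespace = "tofu-state"
  }

  # The references above already imply these; listing them keeps the chain explicit.
  depends_on = [
    kubernetes_role_v1.tf_state_backup,
    kubernetes_service_account_v1.tf_state_backup,
  ]
}
```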

Checklist

  • PR opened with Closes #36 in body
  • tofu plan output included in PR description
  • tofu fmt and tofu validate pass
  • No unrelated changes

Related

  • project-pal-e-platform: project
  • phase-pal-e-platform-ci-6-1-state-backup: phase note in pal-e-docs
forgejo_admin 2026-03-14 14:05:29 +00:00
Reference
forgejo_admin/pal-e-platform#36