feat: add CNPG backup verification CronJob #61

Merged
forgejo_admin merged 1 commit from 60-add-cnpg-backup-verification-cronjob into main 2026-03-14 19:59:38 +00:00

Summary

  • Adds a daily CronJob to verify CNPG Postgres WAL backups in MinIO are present and recent
  • Provides automated disaster recovery confidence for pal-e-postgres and woodpecker backup prefixes
  • Runs at 03:00 UTC (1 hour after tf-state-backup), reuses existing cnpg S3 credentials

Changes

  • terraform/main.tf: Added kubernetes_cron_job_v1.cnpg_backup_verify resource (lines 1796-1920). CronJob runs in the postgres namespace using alpine + MinIO client to check the postgres-wal bucket for both pal-e-postgres and woodpecker prefixes. References existing cnpg_s3_creds secret. Follows the same pattern as the existing tf_state_backup CronJob.

Test Plan

  • [x] tofu fmt -check passes
  • [ ] tofu validate: requires provider cache (not available in worktree), but HCL syntax is valid per fmt
  • [ ] Run tofu plan to verify resource creation
  • [ ] After apply, trigger manual job run: kubectl create job --from=cronjob/cnpg-backup-verify cnpg-backup-verify-test -n postgres
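
The manual run in the last step is easier to judge if it also waits for the job and prints its logs. A minimal sketch, assuming kubectl access to the cluster; the helper name run_verify_once, the 300s timeout, and the cleanup step are illustrative and not part of this PR:

```shell
# Hypothetical helper: only the CronJob name and namespace come from the PR.
run_verify_once() {
  job_name="${1:-cnpg-backup-verify-test}"
  ns="postgres"

  # Create a one-off Job from the CronJob's template.
  kubectl create job --from=cronjob/cnpg-backup-verify "$job_name" -n "$ns" || return 1

  # A failed Job never reaches "complete", so the timeout bounds the wait.
  if kubectl wait --for=condition=complete "job/$job_name" -n "$ns" --timeout=300s; then
    status=0
  else
    status=1
  fi

  # Show the script output either way, then remove the test Job.
  kubectl logs "job/$job_name" -n "$ns"
  kubectl delete job "$job_name" -n "$ns"
  return "$status"
}
```

Calling run_verify_once with no arguments uses the job name from the Test Plan; a non-zero return means the job did not complete within the timeout.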

Review Checklist

  • [x] Passed automated review-fix loop
  • [x] No secrets committed
  • [x] No unnecessary file changes
  • [x] Commit messages are descriptive

Related

  • Closes #60
  • plan-pal-e-platform: Phase 13 (Backup Verification)
feat: add CNPG backup verification CronJob
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
ci/woodpecker/pr/woodpecker Pipeline was successful
ci/woodpecker/pull_request_closed/woodpecker Pipeline was successful
3fcb9064ff
Adds a daily CronJob (03:00 UTC) to the postgres namespace that verifies
CNPG Postgres WAL backups in MinIO are present and recent (within 25h).
Uses the existing cnpg S3 credentials and MinIO client to check both
pal-e-postgres and woodpecker backup prefixes in the postgres-wal bucket.

Closes #60

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author
Owner

Self-Review: LGTM

Files changed: 1 (terraform/main.tf, +126 lines)

Review findings:

  1. Pattern compliance -- Follows the tf_state_backup CronJob pattern exactly: alpine:3.20 image, mc client install, same resource limits (50m/64Mi request, 128Mi limit), backoff_limit = 2, concurrency_policy = "Forbid".
  2. Namespace -- Correctly deploys to kubernetes_namespace_v1.postgres, same namespace where cnpg_s3_creds secret lives.
  3. Credentials -- References kubernetes_secret_v1.cnpg_s3_creds with correct key names (ACCESS_KEY_ID, ACCESS_SECRET_KEY), matching the secret definition at lines 1224-1227.
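  4. Terraform escaping -- $${MAX_AGE_HOURS} uses the $${ escape so that Terraform emits a literal ${MAX_AGE_HOURS} in the rendered shell script instead of treating it as an interpolation. Bare $VAR references and $(...) substitutions are not Terraform template syntax, so they pass through unescaped and are resolved by the shell at runtime.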
  5. Schedule -- 0 3 * * * (03:00 UTC), one hour after tf-state-backup (0 2 * * *). No overlap.
  6. depends_on -- References minio_s3_bucket.postgres_wal and minio_iam_user_policy_attachment.cnpg, both exist in the file.
  7. Validation -- tofu fmt -check passes. tofu validate cannot run without provider cache in worktree (expected).
  8. No secrets, no unrelated changes.

No issues found. Ready for merge.
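
On finding 4, a short sketch of why the braced form (and therefore the $${ escape) is needed in the first place. It uses the variable name and value from the PR's script and assumes a POSIX sh is on the PATH for the child shell:

```shell
MAX_AGE_HOURS=25

# Braced form: the name ends at the closing brace, so "h" is a literal suffix.
braced="${MAX_AGE_HOURS}h"

# Unbraced form, run in a child shell: the shell reads the name as
# MAX_AGE_HOURSh, which is unset, so the expansion is empty.
unbraced="$(sh -c 'MAX_AGE_HOURS=25; echo "$MAX_AGE_HOURSh"')"

echo "braced=$braced unbraced=$unbraced"
# prints: braced=25h unbraced=
```

Under set -u, as in the PR's script, the unbraced form would abort with an unset-variable error rather than expand to an empty string.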

Author
Owner

Tofu Plan Output

tailscale_acl.this: Refreshing state... [id=acl]
kubernetes_namespace_v1.postgres: Refreshing state... [id=postgres]
helm_release.nvidia_device_plugin: Refreshing state... [id=nvidia-device-plugin]
kubernetes_namespace_v1.forgejo: Refreshing state... [id=forgejo]
kubernetes_namespace_v1.tailscale: Refreshing state... [id=tailscale]
kubernetes_namespace_v1.woodpecker: Refreshing state... [id=woodpecker]
kubernetes_namespace_v1.cnpg_system: Refreshing state... [id=cnpg-system]
kubernetes_namespace_v1.ollama: Refreshing state... [id=ollama]
kubernetes_namespace_v1.harbor: Refreshing state... [id=harbor]
data.kubernetes_namespace_v1.pal_e_docs: Reading...
data.kubernetes_namespace_v1.pal_e_docs: Read complete after 0s [id=pal-e-docs]
kubernetes_namespace_v1.minio: Refreshing state... [id=minio]
data.kubernetes_namespace_v1.tofu_state: Reading...
kubernetes_namespace_v1.keycloak: Refreshing state... [id=keycloak]
kubernetes_namespace_v1.monitoring: Refreshing state... [id=monitoring]
kubernetes_secret_v1.paledocs_db_url: Refreshing state... [id=pal-e-docs/paledocs-db-url]
helm_release.cnpg: Refreshing state... [id=cnpg]
helm_release.tailscale_operator: Refreshing state... [id=tailscale-operator]
data.kubernetes_namespace_v1.tofu_state: Read complete after 0s [id=tofu-state]
helm_release.forgejo: Refreshing state... [id=forgejo]
kubernetes_service_account_v1.tf_backup: Refreshing state... [id=tofu-state/tf-state-backup]
kubernetes_role_v1.tf_backup: Refreshing state... [id=tofu-state/tf-state-backup]
kubernetes_secret_v1.keycloak_admin: Refreshing state... [id=keycloak/keycloak-admin]
kubernetes_service_v1.keycloak: Refreshing state... [id=keycloak/keycloak]
kubernetes_persistent_volume_claim_v1.keycloak_data: Refreshing state... [id=keycloak/keycloak-data]
kubernetes_service_v1.dora_exporter: Refreshing state... [id=monitoring/dora-exporter]
helm_release.kube_prometheus_stack: Refreshing state... [id=kube-prometheus-stack]
helm_release.loki_stack: Refreshing state... [id=loki-stack]
kubernetes_secret_v1.dora_exporter: Refreshing state... [id=monitoring/dora-exporter]
kubernetes_role_binding_v1.tf_backup: Refreshing state... [id=tofu-state/tf-state-backup]
kubernetes_deployment_v1.keycloak: Refreshing state... [id=keycloak/keycloak]
helm_release.ollama: Refreshing state... [id=ollama]
kubernetes_ingress_v1.keycloak_funnel: Refreshing state... [id=keycloak/keycloak-funnel]
kubernetes_ingress_v1.forgejo_funnel: Refreshing state... [id=forgejo/forgejo-funnel]
helm_release.woodpecker: Refreshing state... [id=woodpecker]
kubernetes_ingress_v1.alertmanager_funnel: Refreshing state... [id=monitoring/alertmanager-funnel]
kubernetes_config_map_v1.pal_e_docs_dashboard: Refreshing state... [id=monitoring/pal-e-docs-dashboard]
helm_release.harbor: Refreshing state... [id=harbor]
helm_release.minio: Refreshing state... [id=minio]
kubernetes_ingress_v1.grafana_funnel: Refreshing state... [id=monitoring/grafana-funnel]
kubernetes_config_map_v1.dora_dashboard: Refreshing state... [id=monitoring/dora-dashboard]
kubernetes_deployment_v1.dora_exporter: Refreshing state... [id=monitoring/dora-exporter]
kubernetes_manifest.dora_exporter_service_monitor: Refreshing state...
kubernetes_config_map_v1.grafana_loki_datasource: Refreshing state... [id=monitoring/grafana-loki-datasource]
kubernetes_ingress_v1.woodpecker_funnel: Refreshing state... [id=woodpecker/woodpecker-funnel]
kubernetes_ingress_v1.harbor_funnel: Refreshing state... [id=harbor/harbor-funnel]
minio_iam_policy.tf_backup: Refreshing state... [id=tf-backup]
minio_iam_policy.cnpg_wal: Refreshing state... [id=cnpg-wal]
minio_iam_user.cnpg: Refreshing state... [id=cnpg]
minio_s3_bucket.postgres_wal: Refreshing state... [id=postgres-wal]
minio_iam_user.tf_backup: Refreshing state... [id=tf-backup]
minio_s3_bucket.tf_state_backups: Refreshing state... [id=tf-state-backups]
minio_s3_bucket.assets: Refreshing state... [id=assets]
kubernetes_ingress_v1.minio_funnel: Refreshing state... [id=minio/minio-funnel]
kubernetes_ingress_v1.minio_api_funnel: Refreshing state... [id=minio/minio-api-funnel]
kubernetes_secret_v1.cnpg_s3_creds: Refreshing state... [id=postgres/cnpg-s3-creds]
minio_iam_user_policy_attachment.tf_backup: Refreshing state... [id=tf-backup-20260314163610110100000001]
minio_iam_user_policy_attachment.cnpg: Refreshing state... [id=cnpg-20260302210642491000000001]
kubernetes_secret_v1.tf_backup_s3_creds: Refreshing state... [id=tofu-state/tf-backup-s3-creds]
kubernetes_cron_job_v1.tf_state_backup: Refreshing state... [id=tofu-state/tf-state-backup]

OpenTofu used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create

OpenTofu will perform the following actions:

  # kubernetes_cron_job_v1.cnpg_backup_verify will be created
  + resource "kubernetes_cron_job_v1" "cnpg_backup_verify" {
      + id = (known after apply)

      + metadata {
          + generation       = (known after apply)
          + name             = "cnpg-backup-verify"
          + namespace        = "postgres"
          + resource_version = (known after apply)
          + uid              = (known after apply)
        }

      + spec {
          + concurrency_policy            = "Forbid"
          + failed_jobs_history_limit     = 3
          + schedule                      = "0 3 * * *"
          + starting_deadline_seconds     = 0
          + successful_jobs_history_limit = 3
          + suspend                       = false

          + job_template {
              + metadata {
                  + generation       = (known after apply)
                  + name             = (known after apply)
                  + resource_version = (known after apply)
                  + uid              = (known after apply)
                }
              + spec {
                  + backoff_limit   = 2
                  + completion_mode = (known after apply)
                  + completions     = 1
                  + parallelism     = 1

                  + selector (known after apply)

                  + template {
                      + metadata {
                          + generation       = (known after apply)
                          + name             = (known after apply)
                          + resource_version = (known after apply)
                          + uid              = (known after apply)
                        }
                      + spec {
                          + automount_service_account_token  = true
                          + dns_policy                       = "ClusterFirst"
                          + enable_service_links             = true
                          + host_ipc                         = false
                          + host_network                     = false
                          + host_pid                         = false
                          + hostname                         = (known after apply)
                          + node_name                        = (known after apply)
                          + restart_policy                   = "OnFailure"
                          + scheduler_name                   = (known after apply)
                          + service_account_name             = (known after apply)
                          + share_process_namespace          = false
                          + termination_grace_period_seconds = 30

                          + container {
                              + args                       = [
                                  + <<-EOT
                                        set -euo pipefail
                                        
                                        apk add --no-cache curl >/dev/null
                                        
                                        # Install mc (MinIO Client)
                                        curl -sSL https://dl.min.io/client/mc/release/linux-amd64/mc -o /tmp/mc
                                        chmod +x /tmp/mc
                                        
                                        # Configure MinIO alias
                                        /tmp/mc alias set backup http://minio.minio.svc.cluster.local:9000 "$ACCESS_KEY_ID" "$ACCESS_SECRET_KEY"
                                        
                                        ERRORS=0
                                        MAX_AGE_HOURS=25  # Allow 1h buffer beyond 24h
                                        
                                        # Check each backup path prefix
                                        for PREFIX in "pal-e-postgres" "woodpecker"; do
                                          echo "=== Checking backups for $PREFIX ==="
                                        
                                          # List objects in the backup path
                                          OBJECTS=$(/tmp/mc ls "backup/postgres-wal/$PREFIX/" 2>/dev/null | head -5 || true)
                                        
                                          if [ -z "$OBJECTS" ]; then
                                            echo "ERROR: No backup objects found for $PREFIX"
                                            ERRORS=$((ERRORS + 1))
                                            continue
                                          fi
                                        
                                          echo "Found backup objects for $PREFIX:"
                                          echo "$OBJECTS"
                                        
                                          # Check WAL directory for recent files
                                          RECENT=$(/tmp/mc find "backup/postgres-wal/$PREFIX/wals/" --newer-than "${MAX_AGE_HOURS}h" 2>/dev/null | head -1 || true)
                                        
                                          if [ -z "$RECENT" ]; then
                                            echo "WARNING: No WAL files newer than ${MAX_AGE_HOURS}h for $PREFIX"
                                            ERRORS=$((ERRORS + 1))
                                          else
                                            echo "OK: Recent WAL files found for $PREFIX"
                                          fi
                                        done
                                        
                                        if [ "$ERRORS" -gt 0 ]; then
                                          echo "FAILED: $ERRORS backup verification errors"
                                          exit 1
                                        fi
                                        
                                        echo "All backup verifications passed."
                                    EOT,
                                ]
                              + command                    = [
                                  + "/bin/sh",
                                  + "-c",
                                ]
                              + image                      = "alpine:3.20"
                              + image_pull_policy          = (known after apply)
                              + name                       = "verify"
                              + stdin                      = false
                              + stdin_once                 = false
                              + termination_message_path   = "/dev/termination-log"
                              + termination_message_policy = (known after apply)
                              + tty                        = false

                              + env {
                                  + name = "ACCESS_KEY_ID"

                                  + value_from {
                                      + secret_key_ref {
                                          + key  = "ACCESS_KEY_ID"
                                          + name = "cnpg-s3-creds"
                                        }
                                    }
                                }
                              + env {
                                  + name = "ACCESS_SECRET_KEY"

                                  + value_from {
                                      + secret_key_ref {
                                          + key  = "ACCESS_SECRET_KEY"
                                          + name = "cnpg-s3-creds"
                                        }
                                    }
                                }

                              + resources {
                                  + limits   = {
                                      + "memory" = "128Mi"
                                    }
                                  + requests = {
                                      + "cpu"    = "50m"
                                      + "memory" = "64Mi"
                                    }
                                }
                            }

                          + image_pull_secrets (known after apply)

                          + readiness_gate (known after apply)
                        }
                    }
                }
            }
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

─────────────────────────────────────────────────────────────────────────────

Note: You didn't use the -out option to save this plan, so OpenTofu can't
guarantee to take exactly these actions if you run "tofu apply" now.
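
The closing note can be addressed by saving the plan and applying that exact file. A minimal sketch, assuming tofu is run from the terraform/ directory; the helper name plan_and_apply and the plan file name are hypothetical:

```shell
# Hypothetical helper: plan to a file, review it, then apply exactly that plan.
plan_and_apply() {
  plan_file="${1:-cnpg-backup-verify.tfplan}"

  # Write the execution plan to disk instead of only printing it.
  tofu plan -out="$plan_file" || return 1

  # Render the saved plan for review before anything changes.
  tofu show "$plan_file" || return 1

  # Apply the saved plan; OpenTofu performs the recorded actions only.
  tofu apply "$plan_file"
}
```

Applying a saved plan file does not prompt for confirmation, so the review step belongs between show and apply in an interactive workflow.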
Author
Owner

Review — Pass

Files changed: 1 (terraform/main.tf, +126 lines)

Findings

Clean implementation. No issues found.

  1. CronJob pattern — Matches tf_state_backup exactly: alpine 3.20, mc client download, MinIO alias, Forbid concurrency, backoff_limit = 2, same resource sizing.

  2. S3 credentials — Uses secret_key_ref to reference existing cnpg_s3_creds in postgres namespace. No new IAM resources needed.

  3. Shell variable escaping — Correctly uses bare $VARIABLE for shell vars (Terraform passes through) and $${MAX_AGE_HOURS} for braced shell vars (Terraform escaping). Valid HCL heredoc syntax.

  4. WAL path structure — Checks postgres-wal/{prefix}/wals/ with mc find --newer-than. CNPG uses this path structure for WAL archiving. The 25h window accounts for archive_timeout (5min) plus clock drift.

  5. Error handling — Counts errors across all prefixes before failing. Good pattern — reports all failures, not just the first.

  6. Merge compatibility — Adds lines at EOF. No conflict with PR #59 (which inserts mid-file).

Note: The woodpecker prefix check will fail until the Woodpecker Postgres migration (PR #59) is merged and the first backup runs. Consider gating the prefix list or accepting initial failures as expected.
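
One way to gate the prefix list while keeping the count-then-fail behaviour from finding 5 is sketched below. VERIFY_PREFIXES is a hypothetical environment variable, and the per-prefix check is passed in as a command so the sketch stays independent of mc; in the PR's script that check would be the mc ls / mc find pair:

```shell
# Hypothetical: VERIFY_PREFIXES overrides the list; the default matches the PR.
prefixes_to_check() {
  echo "${VERIFY_PREFIXES:-pal-e-postgres woodpecker}"
}

# Run the given check command for every prefix and count failures,
# so all failing prefixes are reported before the function fails.
verify_all() {
  check_cmd="$1"
  errors=0
  for prefix in $(prefixes_to_check); do
    if "$check_cmd" "$prefix"; then
      echo "OK: $prefix"
    else
      echo "ERROR: $prefix"
      errors=$((errors + 1))
    fi
  done
  echo "errors=$errors"
  [ "$errors" -eq 0 ]
}
```

Setting VERIFY_PREFIXES="pal-e-postgres" on the container would skip the woodpecker check until its first backup exists. If this were embedded in the Terraform heredoc, the ${VERIFY_PREFIXES:-...} expansion would need the $${ escape.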

forgejo_admin deleted branch 60-add-cnpg-backup-verification-cronjob 2026-03-14 19:59:38 +00:00