Critical: Migrate basketball-api Postgres to CNPG shared cluster #187
Reference: forgejo_admin/pal-e-platform#187
Type
Bug
Lineage
standalone — discovered during #184 CI investigation. Data safety audit revealed basketball-api Postgres has zero backup coverage.
Repo
- forgejo_admin/pal-e-platform (CNPG cluster config)
- forgejo_admin/pal-e-deployments (basketball-api kustomize)
- forgejo_admin/basketball-api (connection string)

User Story

story:WS-S5

As superadmin, I want basketball-api's database on the CNPG shared cluster so that player data has daily backups, WAL archiving, and automated restore verification. Right now 76 players, 6 signed contracts, 68 registrations, and 16 jersey orders sit on a single local-path PVC with zero backup coverage.

What Broke
Nothing is broken yet — this is a ticking time bomb. The basketball-api Postgres is a standalone Deployment pod with:
- `local-path` PVC on archbox (single node, single disk)
- Reclaim policy `Delete`: if the PVC is deleted, the data is gone

Meanwhile, the CNPG cluster (`pal-e-postgres-1`) has daily Barman → MinIO backups, WAL archiving, gzip compression, 7-day retention, and a daily `cnpg-backup-verify` CronJob that proves restores work.

Architecture
Repro Steps
1. `kubectl get pv | grep basketball` → `reclaimPolicy: Delete`
2. `kubectl get backups -n basketball-api` → no resources found
3. `kubectl get scheduledbackup -n basketball-api` → no resources found
4. `kubectl get backups -n postgres` → 9 daily backups, all completed

Expected Behavior
Basketball-api database should have the same backup coverage as pal-e-docs: daily Barman snapshots, WAL archiving, 7-day retention, automated restore verification.
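The repro steps and expected behavior above can be turned into a repeatable check. The sketch below is minimal and rests on a few assumptions. Its input is the JSON printed by `kubectl get pv -o json` and `kubectl get backups.postgresql.cnpg.io -A -o json`. The helper names are hypothetical. It reads only fields that exist on core PersistentVolume objects (`spec.persistentVolumeReclaimPolicy`, `spec.claimRef`) and on CNPG Backup objects (`status.phase`).

```python
import json
import subprocess


def pvs_at_risk(pv_list: dict, namespace: str) -> list[str]:
    """Names of PVs bound to a claim in `namespace` whose reclaim policy is Delete."""
    risky = []
    for pv in pv_list.get("items", []):
        spec = pv.get("spec", {})
        claim = spec.get("claimRef") or {}
        if (claim.get("namespace") == namespace
                and spec.get("persistentVolumeReclaimPolicy") == "Delete"):
            risky.append(pv["metadata"]["name"])
    return risky


def completed_backups(backup_list: dict, namespace: str) -> int:
    """Number of CNPG Backup objects in `namespace` with status.phase == 'completed'."""
    return sum(
        1
        for b in backup_list.get("items", [])
        if b.get("metadata", {}).get("namespace") == namespace
        and b.get("status", {}).get("phase") == "completed"
    )


def kubectl_json(*args: str) -> dict:
    """Run a read-only kubectl query and parse its JSON output (needs cluster access)."""
    out = subprocess.run(["kubectl", *args, "-o", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)
```

Usage against the live cluster would look like `pvs_at_risk(kubectl_json("get", "pv"), "basketball-api")` and `completed_backups(kubectl_json("get", "backups.postgresql.cnpg.io", "-A"), "postgres")`. Before migration, the expected result is a non-empty at-risk list and zero completed backups for `basketball-api`.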
Environment
- `basketball-api` namespace
- `postgres` namespace, Barman → MinIO

File Targets
- `terraform/main.tf` or CNPG cluster manifest: add `basketball` database + user to shared cluster
- `~/pal-e-deployments/basketball-api/`: update kustomize overlay (new DATABASE_HOST, remove standalone postgres Deployment)
- `~/basketball-api/src/basketball_api/config.py`: verify connection string works with new host
- `~/basketball-api/k8s/deployment.yaml`: update DATABASE_HOST env var

Migration Steps
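- Create the `basketball` database + user on the CNPG cluster
- `pg_dump` from the standalone pod → `pg_restore` into CNPG (9MB, seconds)
- Point basketball-api at the CNPG read-write service (`pal-e-postgres-rw.postgres.svc.cluster.local`)
- Verify the `cnpg-backup-verify` CronJob covers the new database

Test Expectations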
- `curl basketball-api.tail5b443a.ts.net/health` returns 200
- `SELECT count(*) FROM players` returns 76 (or current count)
- `SELECT count(*) FROM orders WHERE status = 'paid'` matches pre-migration
- `kubectl get backups -n postgres` shows next daily backup includes basketball data
- `cnpg-backup-verify` CronJob completes successfully

Acceptance Criteria
- `basketball` database exists on CNPG shared cluster
- `cnpg-backup-verify` CronJob validates basketball data restores correctly

Constraints
Related
- forgejo_admin/pal-e-platform#184: CI blocker (triggered this discovery)
- project-westside-basketball: project this affects
- `terraform/main.tf` (Barman → MinIO, 7-day retention)
- sop-postgres-restore: existing restore SOP for CNPG

Scope Review: NEEDS_REFINEMENT
Review note:
review-417-2026-03-26

Well-scoped ticket with complete traceability and thorough technical detail, but four issues need resolution before it's ready for execution:
1. The `pal-e-postgres` shared cluster is not defined in terraform (either pal-e-platform or pal-e-services). The ticket must clarify where the cluster manifest lives and how the `basketball` database + user get created on an existing cluster.
2. The `terraform/network-policies.tf` postgres namespace policy does not allow `basketball-api` ingress. This is mentioned in Constraints but absent from File Targets and Acceptance Criteria.
3. `~/pal-e-deployments/basketball-api/` should be `overlays/basketball-api/prod/`. `~/basketball-api/k8s/deployment.yaml` is not the ArgoCD deploy target (the kustomize overlay is).
4. The `cnpg-backup-verify` CronJob checks WAL freshness by prefix, not per-database restore validation. Reword to match actual behavior.

Blast radius:
`mcd-tracker` and `pal-e-mail` have the identical standalone `postgres:16-alpine` pattern with zero backup coverage. Discovered scope: separate tickets needed.

Refinement (post review-417-2026-03-26)
Addressing 4 review findings + 1 critical discovery.
Critical discovery: CNPG cluster manifest is orphaned
The `pal-e-postgres` CNPG cluster was removed from `terraform/main.tf` on branch `16-remove-app-level-cnpg-resources-from-pla` (commit `c50f013`) as part of an architectural separation. It was supposed to move to `pal-e-services` but never landed there. The cluster is running in prod but has no manifest in any repo.

This means:

- Adding the `basketball` database requires either (a) re-creating the cluster manifest in pal-e-services, or (b) applying a raw `kubectl` command.

Prerequisite: re-establish the CNPG cluster manifest in pal-e-services before this ticket can proceed.
Fixes from review
- `terraform/network-policies.tf:175-179` must add the `basketball-api` namespace to the postgres ingress allow list. Adding to File Targets + AC.
- `~/pal-e-deployments/overlays/basketball-api/prod/` (not `~/basketball-api/k8s/`)
- `terraform/network-policies.tf` (new: network policy update)

Discovered scope (filed)
Refinement v2 (post review-417-2026-03-26 + CNPG manifest landed)
Prereq resolved
pal-e-services#33 is merged. The CNPG cluster manifest is back under source control. Adding the `basketball` database is now a SQL operation on the running cluster (per the Constraints section: `bootstrap.initdb` only runs on creation).

Fix 1: CNPG database creation mechanism
Adding a database to an existing CNPG cluster is SQL, not manifest:
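Here is a minimal sketch of that SQL and of the matching `kubectl exec ... psql` call. It makes several assumptions. The role and the database are both named `basketball`. The password is supplied by the operator. The helper names are hypothetical. Each statement gets its own `-c` because psql runs a multi-statement `-c` string as one transaction, and `CREATE DATABASE` cannot run inside a transaction block. Putting a password on a command line leaves it visible in process listings, so a real run may prefer to set it interactively with psql's `\password`.

```python
import shlex


def quote_ident(name: str) -> str:
    """Quote a Postgres identifier: wrap in double quotes, double any embedded ones."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a Postgres string literal: wrap in single quotes, double any embedded ones.

    Assumes standard_conforming_strings = on (the default since PostgreSQL 9.1).
    """
    return "'" + value.replace("'", "''") + "'"


def create_db_statements(db: str, role: str, password: str) -> list[str]:
    """SQL to add an owner role and a database to an already-running cluster (hypothetical helper)."""
    return [
        f"CREATE ROLE {quote_ident(role)} LOGIN PASSWORD {quote_literal(password)}",
        f"CREATE DATABASE {quote_ident(db)} OWNER {quote_ident(role)}",
    ]


def psql_exec_argv(statements: list[str],
                   namespace: str = "postgres",
                   pod: str = "pal-e-postgres-1") -> list[str]:
    """kubectl exec argv that runs each statement as its own psql -c (hypothetical helper)."""
    argv = ["kubectl", "exec", "-n", namespace, pod, "--",
            "psql", "-U", "postgres", "-v", "ON_ERROR_STOP=1"]
    for stmt in statements:
        # Separate -c flags: a multi-statement -c string runs as one transaction,
        # and CREATE DATABASE cannot run inside a transaction block.
        argv += ["-c", stmt]
    return argv


if __name__ == "__main__":
    stmts = create_db_statements("basketball", "basketball", "CHANGE_ME")
    # Print a copy-pasteable command; shlex.join quotes each argument for a POSIX shell.
    print(shlex.join(psql_exec_argv(stmts)))
```

Setting the role as database owner lets `pg_restore` create objects in the new database without extra grants.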
Run via:
`kubectl exec -n postgres pal-e-postgres-1 -- psql -U postgres -c "..."`

Fix 2: Network policy (added to File Targets + AC)
File:
`terraform/network-policies.tf` (lines 175-179). Must add the `basketball-api` namespace to the postgres ingress allow list.

Fix 3: File target paths corrected
- `~/pal-e-deployments/basketball-api/` → `~/pal-e-deployments/overlays/basketball-api/prod/`
- `~/basketball-api/k8s/deployment.yaml` → not the deploy target. ArgoCD image updater handles image tags; the kustomize overlay has the env vars.
- `~/basketball-api/src/basketball_api/config.py`: verify the DATABASE_URL format works with the new host

Fix 4: Backup verification AC reworded
The `cnpg-backup-verify` CronJob checks WAL object freshness in MinIO by prefix, not per-database restores. AC reworded to: "Next daily Barman backup completes successfully after migration."

Updated File Targets (final)
- `terraform/network-policies.tf`: add `basketball-api` to postgres namespace ingress allow list
- `~/pal-e-deployments/overlays/basketball-api/prod/kustomization.yaml`: update `DATABASE_HOST` env var to `pal-e-postgres-rw.postgres.svc.cluster.local`
- `~/pal-e-deployments/overlays/basketball-api/prod/deployment-patch.yaml`: update DATABASE env vars
- `~/pal-e-deployments/overlays/basketball-api/prod/postgres.yaml`: REMOVE (standalone postgres Deployment)
- `~/pal-e-deployments/overlays/basketball-api/prod/pvc.yaml`: REMOVE after verification (standalone PVC)

Updated Acceptance Criteria (final)
- `basketball` database + user created on CNPG cluster via SQL
- Network policy allows `basketball-api` namespace → `postgres` namespace
- basketball-api connects to `pal-e-postgres-rw.postgres.svc.cluster.local`
- `pg_dump` from standalone → `pg_restore` into CNPG verified (row counts match)
- `curl basketball-api.tail5b443a.ts.net/health` returns 200

Migration Steps (updated)
- `players`, `orders`, `registrations`, `teams`, `parents`

Scope Review: NEEDS_REFINEMENT
Review note:
review-417-2026-03-26-v2

Re-review after refinement v2. All 4 original issues were addressed. One new file target error found:
1. File target #5 (`pvc.yaml`) is WRONG: `pvc.yaml` contains the `photo-uploads` PVC, NOT the postgres PVC. The postgres PVC (`postgres-data`) is defined inside `postgres.yaml`. Removing `pvc.yaml` as the ticket instructs would delete photo uploads and cause data loss. Fix: remove file target #5 entirely; `postgres.yaml` removal (target #4) already handles the standalone postgres PVC.
2. Advisory: the actual env var is `BASKETBALL_DATABASE_URL` (full connection string), not a separate `DATABASE_HOST` var as refinement v2 states. Update wording to prevent agent confusion.

Refinement v3 (post review-417-2026-03-26-v2)
Fix 1: CRITICAL — pvc.yaml is photo-uploads, NOT postgres
`pvc.yaml` contains the `photo-uploads` PVC for player photos. DO NOT DELETE. The postgres PVC (`postgres-data`) is defined inside `postgres.yaml`. Removing `postgres.yaml` handles both the standalone Deployment and its PVC reference.

Corrected File Targets:
- `terraform/network-policies.tf`: add `basketball-api` to postgres namespace ingress allow list
- `~/pal-e-deployments/overlays/basketball-api/prod/deployment-patch.yaml`: update `BASKETBALL_DATABASE_URL` connection string to point at CNPG
- `~/pal-e-deployments/overlays/basketball-api/prod/postgres.yaml`: REMOVE (standalone postgres Deployment + PVC)
- `~/pal-e-deployments/overlays/basketball-api/prod/kustomization.yaml`: remove `postgres.yaml` from resources list
- `pvc.yaml`: DO NOT TOUCH (contains photo-uploads PVC, not postgres)

Fix 2: Env var name correction
The actual env var is `BASKETBALL_DATABASE_URL` (full connection string), not separate `DATABASE_HOST`/`DATABASE_NAME` vars. The connection string in `deployment-patch.yaml` lines 29-30 needs updating:

Scope Review: READY
Review note:
review-417-2026-03-26-v3

All 5 findings from previous reviews (v1: 4 issues, v2: 1 blocker + 1 advisory) verified as resolved in refinement v3. All file targets confirmed against codebase. Prereq (pal-e-services#33 CNPG manifest) is done. Traceability complete. Ticket is ready for execution.
Agent note: read the refinement v3 comment for corrected file targets and acceptance criteria; the issue body still has the original (stale) versions.
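Two checks from refinement v3 and the acceptance criteria fit a small helper. The first builds the new `BASKETBALL_DATABASE_URL` against the CNPG read-write Service. The second compares row counts taken before and after the `pg_dump`/`pg_restore`. This is a minimal sketch, not the app's actual code. The `postgresql://` scheme, port 5432, and the `basketball` user/database names are assumptions, because the real string in `deployment-patch.yaml` lines 29-30 is not shown in this ticket. The helper names are hypothetical.

```python
from urllib.parse import quote

# Assumed values: the CNPG read-write Service named in the ticket, default port 5432.
CNPG_RW_HOST = "pal-e-postgres-rw.postgres.svc.cluster.local"
CNPG_PORT = 5432

# Row-count queries taken from the ticket's test expectations and table list.
COUNT_QUERIES = {
    "players": "SELECT count(*) FROM players",
    "orders_paid": "SELECT count(*) FROM orders WHERE status = 'paid'",
    "registrations": "SELECT count(*) FROM registrations",
    "teams": "SELECT count(*) FROM teams",
    "parents": "SELECT count(*) FROM parents",
}


def database_url(user: str, password: str, db: str,
                 host: str = CNPG_RW_HOST, port: int = CNPG_PORT,
                 scheme: str = "postgresql") -> str:
    """Build a connection URL. The scheme is a parameter because the driver
    prefix the app expects is not stated in the ticket."""
    # Percent-encode reserved characters such as '@', ':' and '/' in each part.
    u, p, d = (quote(x, safe="") for x in (user, password, db))
    return f"{scheme}://{u}:{p}@{host}:{port}/{d}"


def count_mismatches(before: dict[str, int], after: dict[str, int]) -> dict[str, tuple]:
    """Map each check whose count differs (or is missing on one side) to (before, after)."""
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(set(before) | set(after))
        if before.get(key) != after.get(key)
    }
```

The counts would come from running each `COUNT_QUERIES` entry twice: against the standalone pod before cutover, and against CNPG afterwards. An empty result from `count_mismatches` corresponds to the "row counts match" criterion.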