SOP harden: sop-postgres-restore Step 5 (real-DR swap) needs PDB + ArgoCD lock + service-collision runbook #300
Reference
ldraney/pal-e-platform#300
Type
SOP hardening (discovered scope)
Lineage
Surfaced during pal-e-platform#298 restore drill (2026-04-21). Drill was PASS, but gap #7 in validation-postgres-restore-2026-04-21 is out of scope for the drill itself: Step 5 "Swap (if replacing production)" in sop-postgres-restore describes the happy-path sequence but doesn't address three real-world footguns.
Repo
forgejo_admin/pal-e-platform
User Story
As the on-call engineer running a real P0 Postgres recovery, I need sop-postgres-restore Step 5 (Swap) to cover the three operational footguns that block a clean cutover, so that a cluster-restore incident doesn't trigger a second incident on top of the first.
Context
Step 5 currently says: scale app to 0, delete old cluster, rename or repoint, scale app back up. Real-world swap requires more:
replicas=0 scale-down. Need to either --ignore-pdb or delete the PDB first.
pal-e-postgres is ArgoCD-managed, ArgoCD will re-sync the deleted cluster immediately. Need to suspend sync, or annotate the Application with argocd.argoproj.io/sync-options: Prune=false before delete.
-rw, -ro, -r services per cluster. The new cluster's services will collide with the old unless renamed first, OR the app's DATABASE_URL must atomically cut over (no graceful handoff window).
Without these steps in the runbook, a P0 swap has a real chance of a second incident on top of the first.
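A rough sketch of how the three footguns could map onto an ordered, reviewable command plan, grouped by the Quiesce / Cut over / Resume phases named in the acceptance criteria. It only builds and prints commands; nothing is executed. Several things here are assumptions and not taken from the SOP: the `SwapConfig` fields and all example values are hypothetical; the cluster is assumed to be a CloudNativePG `Cluster` resource (the `-rw`/`-ro`/`-r` service naming suggests this); `argocd app set --sync-policy none` is used as one possible way to suspend sync; and `DATABASE_URL` is assumed to be set directly on the app Deployment.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class SwapConfig:
    """Inputs for the swap. Every field is a placeholder, not a value from the SOP."""
    namespace: str
    app_deployment: str
    app_replicas: int
    argocd_app: str
    old_cluster: str
    pdb_name: str
    new_database_url: str


def build_swap_plan(cfg: SwapConfig) -> dict[str, list[list[str]]]:
    """Return the commands for each phase as argv lists. Nothing is executed."""
    kubectl = ["kubectl", "-n", cfg.namespace]
    deployment = f"deployment/{cfg.app_deployment}"
    return {
        "quiesce": [
            # Suspend automated sync first so ArgoCD does not re-create what gets deleted.
            ["argocd", "app", "set", cfg.argocd_app, "--sync-policy", "none"],
            kubectl + ["scale", deployment, "--replicas=0"],
            # Deleting the PDB is one of the two options the issue lists.
            kubectl + ["delete", "pdb", cfg.pdb_name],
        ],
        "cut_over": [
            # Assumption: the old cluster is a CloudNativePG Cluster resource.
            kubectl + ["delete", "clusters.postgresql.cnpg.io", cfg.old_cluster],
            # Assumption: the app reads DATABASE_URL from its Deployment env.
            kubectl + ["set", "env", deployment, f"DATABASE_URL={cfg.new_database_url}"],
        ],
        "resume": [
            kubectl + ["scale", deployment, f"--replicas={cfg.app_replicas}"],
            ["argocd", "app", "set", cfg.argocd_app, "--sync-policy", "automated"],
        ],
    }


def render(plan: dict[str, list[list[str]]]) -> str:
    """Format the plan as shell-quoted lines, one comment header per phase."""
    lines = []
    for phase, commands in plan.items():
        lines.append(f"# {phase}")
        lines.extend(shlex.join(cmd) for cmd in commands)
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example values for illustration only.
    example = SwapConfig(
        namespace="example-ns",
        app_deployment="example-app",
        app_replicas=2,
        argocd_app="example-argocd-app",
        old_cluster="pal-e-postgres",
        pdb_name="example-pdb",
        new_database_url="postgres://example-new-cluster-rw:5432/app",
    )
    print(render(build_swap_plan(example)))
```

Keeping the plan as data makes the ordering explicit (sync suspended before any delete, sync resumed last) and lets the on-call engineer review the exact commands before running anything. If the Deployment itself is ArgoCD-managed, or `DATABASE_URL` comes from a Secret, the cut-over step would need to change accordingly.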
File Targets
pal-e-docs note sop-postgres-restore (slug): Step 5b section expansion
Test Expectations
Constraints
pal-e-postgres during authoring
Acceptance Criteria
sop-postgres-restore Step 5b expanded into three subsections (5b.1 Quiesce, 5b.2 Cut over, 5b.3 Resume)
validation-postgres-restore-2026-04-21 gap #7 updated to point at the resolved SOP revision
Checklist
Related
pal-e-platform#298: parent drill ticket (PASS verdict)
validation-postgres-restore-2026-04-21: drill results, gap #7
sop-postgres-restore: SOP being hardened
feedback_validate_before_done.md: never trust untested runbooks