Bug: Alembic migration crash on deploy — forked chain from revision 018 #179
Reference
forgejo_admin/basketball-api#179
Type
Bug
Lineage
standalone — discovered during deploy of PRs #172 + #174. App is crash-looping in prod.
Repo
forgejo_admin/basketball-api
What Broke
basketball-api pod `basketball-api-75b6cb89b9-rdxlb` is in CrashLoopBackOff. Alembic auto-upgrade on startup fails with `Can't locate revision identified by '018'` (the error listed under Repro Steps).
The prod database `alembic_version` table contains `018`. The migration chain forks at `018` into two branches:
- `018` → `019` (player_teams_junction)
- `018` → `020` (custom_notes) → `021` (oauth_tokens)
Both `019` and `020` have `down_revision = "018"`. Merge migration `022_merge_heads.py` has `down_revision = ("019", "021")` to rejoin them. But alembic can't resolve the fork because the DB only has a single head stamp at `018`, so it doesn't know which branch to follow.
Repro Steps
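1. Deploy `origin/main` code.
2. On startup the pod runs `alembic upgrade head`.
3. Alembic reads `alembic_version = 018` and tries to find the next revision.
4. Resolution fails at the fork (`019` and `020` both descend from `018`).
5. Error: `Can't locate revision identified by '018'`
Expected Behavior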
Alembic should upgrade cleanly from `018` through `019`, `020`, `021`, `022` (merge), and `023` (jersey backfill).
Environment
- `basketball-api`
- Crashing pod: `basketball-api-75b6cb89b9-rdxlb` (CrashLoopBackOff)
- Prod `alembic_version`: `018`
- Pod `basketball-api-5b64ff9cf5-2ncc6` (terminated by ArgoCD sync)
File Targets
- `alembic/versions/019_player_teams_junction.py`: `down_revision = "018"` (branch 1)
- `alembic/versions/020_add_custom_notes_to_player.py`: `down_revision = "018"` (branch 2, creates fork)
- `alembic/versions/022_merge_heads.py`: `down_revision = ("019", "021")` (merge point)
- `CMD` or entrypoint: where `alembic upgrade head` runs
Test Expectations
Run: `cd ~/basketball-api && python -m pytest tests/ -v`. All 555 tests pass (the migration chain is a prod-only issue; tests use a fresh DB).
Validation after fix:
- `kubectl exec -n basketball-api <new-pod> -- alembic upgrade head` succeeds
- `SELECT version_num FROM alembic_version;` returns `023`
- Pod reaches `1/1 Running` without CrashLoopBackOff
Acceptance Criteria
- Alembic upgrades cleanly from `018` to `023`
- `alembic_version` table shows `023` after successful upgrade
Constraints
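One constraint in this section is knowing whether `019`, `020`, and `021` already left schema changes in the prod DB. The sketch below shows one way such a check could look. It is an illustration only: it uses an in-memory SQLite database as a stand-in (the prod engine is not stated in this ticket), and the table and column names in `EXPECTED` are hypothetical guesses from the migration file names, so the real names must be read from the migration files.

```python
import sqlite3

# Hypothetical schema artifacts per revision (assumption, not taken from the repo).
EXPECTED = {
    "019": ("table", "player_teams", None),
    "020": ("column", "players", "custom_notes"),
    "021": ("table", "oauth_tokens", None),
}


def table_exists(conn, table):
    """True if a table with this name is present in the SQLite catalog."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    return row is not None


def column_exists(conn, table, column):
    """True if the table has the column; a missing table yields no rows."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    # The table name comes from the trusted EXPECTED mapping, not user input.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def applied_revisions(conn, expected=EXPECTED):
    """Map each revision to whether its expected artifact is already present."""
    result = {}
    for rev, (kind, table, column) in expected.items():
        if kind == "table":
            result[rev] = table_exists(conn, table)
        else:
            result[rev] = column_exists(conn, table, column)
    return result


# Demo: a database where only the hypothetical 020 change is present.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (id INTEGER PRIMARY KEY, custom_notes TEXT)")
print(applied_revisions(conn))  # {'019': False, '020': True, '021': False}
```

A mixed result like the one in the demo would suggest that stamping both branch heads is unsafe until the missing changes are applied.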
- The fork at `018` is the root issue. Fix options:
  - Re-parent `020` to `down_revision = "019"` (simplest, but rewrites a migration already applied elsewhere?)
  - Run `alembic stamp 019 021` to tell alembic both branches are at their heads, then `alembic upgrade head` runs `022` + `023`
- Check whether `019`, `020`, `021` have actually been applied to the prod DB (tables/columns may already exist from a previous deploy)
Related
- #170: jersey sync fix (includes migration 023, blocked by this)
- #173: teams/save fix (code change only, also blocked by this deploy)
- #171: Baby Betty data fix (blocked by deploy)
Scope Review: NEEDS_REFINEMENT
Review note: `review-433-2026-03-26`
Ticket is nearly agent-ready but has two inaccuracies that would cause the agent to validate against wrong targets:
1. The ticket says `alembic_version` should show `023`, but the actual head on main is `025` (024_add_is_public_to_players + 025_add_coach_public_fields exist). Update all references from 023 to 025.
2. `e09c9e678004_add_division_column_to_players.py` (down_revision="005") creates a second branch. The agent should delete it or the ticket should explicitly scope it out. Otherwise `alembic heads` will still show multiple heads after the fork fix.
Refinement (post review-433-2026-03-26)
Fix 1: Stale head reference
Migrations 024 and 025 were merged by the other session (insecure-registry pipeline fixes). Updated:
Fix 2: Orphan migration — in scope
`e09c9e678004_add_division_column_to_players.py` has `down_revision = "005"`: a stale auto-generated migration creating a second detached branch. This must also be resolved or `alembic heads` will still show multiple heads after fixing the 018 fork.
Added to scope:
- `alembic/versions/e09c9e678004_add_division_column_to_players.py`: delete or integrate into chain
- `alembic heads` shows exactly 1 head after fix
Updated Acceptance Criteria
- Alembic upgrades cleanly from `018` to current head
- `alembic heads` shows exactly 1 head (no forks, no orphans)
- `alembic_version` table shows current head after successful upgrade
Scope Review (R2): NEEDS_REFINEMENT
Review note: `review-433-2026-03-26-r2`
Refinement Fix 1 (stale head reference 023→current head) is correct and resolved.
Refinement Fix 2 is WRONG and dangerous. `e09c9e678004_add_division_column_to_players.py` is NOT an orphan: migration `007_add_email_log_table.py` has `down_revision = "e09c9e678004"`. It is part of the main chain: `005 → e09c9e678004 → 007 → 008 → ... → 018`. Chain analysis confirms exactly 1 head (025) and only 1 fork (at 018). Deleting this migration would break the entire chain from 007 onward and break the 70+ division-column references across the codebase.
Actions needed:
- Remove `e09c9e678004_add_division_column_to_players.py` from file targets and scope
Refinement v2 (post review-433-2026-03-26-r2)
Correction: e09c9e678004 is NOT an orphan
`e09c9e678004_add_division_column_to_players.py` is part of the main chain (`005 → e09c9e678004 → 007`). Migration `007` depends on it. Removed from scope. Do NOT delete this file.
Scope narrowed
The only issue is the fork at `018`, where both `019` and `020` have `down_revision = "018"`. The merge migration `022` exists but alembic can't traverse a fork from a single DB stamp.
Updated File Targets (final)
- `alembic/versions/019_player_teams_junction.py`: `down_revision = "018"` (branch 1)
- `alembic/versions/020_add_custom_notes_to_player.py`: `down_revision = "018"` (branch 2, the fork)
- `alembic/versions/022_merge_heads.py`: `down_revision = ("019", "021")` (merge point)
Updated Acceptance Criteria (final)
- Alembic upgrades cleanly from `018` to current head
- `alembic_version` table shows current head after successful upgrade
Scope Review (R3): READY
Review note: `review-433-2026-03-26-r3`
v2 refinement is correct. All three file targets verified against codebase. Full migration chain independently traced; fork at 018 confirmed as the sole issue. e09c9e678004 correctly removed from scope (007 depends on it). ACs are machine-verifiable. No blast radius concerns. Ship it.
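The chain trace the reviews describe can be approximated offline. The sketch below is a minimal model, not Alembic's own resolution logic: it encodes only the `down_revision` values quoted in this ticket and reports fork points, heads, and which revisions depend on a given one. All function and variable names are illustrative, and revisions the ticket elides (between `008` and `018`, and from `023` onward) are left out rather than guessed.

```python
from collections import defaultdict

# down_revision values as quoted in this ticket; a tuple with two entries
# marks the merge migration.
FORK_REGION = {
    "019": ("018",),
    "020": ("018",),
    "021": ("020",),
    "022": ("019", "021"),
}

# Links quoted in the R2 review for the early part of the chain.
EARLY_CHAIN = {
    "e09c9e678004": ("005",),
    "007": ("e09c9e678004",),
    "008": ("007",),
}


def children_of(down_revisions):
    """Invert the mapping: parent revision -> set of direct children."""
    children = defaultdict(set)
    for rev, parents in down_revisions.items():
        for parent in parents:
            children[parent].add(rev)
    return children


def fork_points(down_revisions):
    """Revisions that more than one migration names as its down_revision."""
    children = children_of(down_revisions)
    return sorted(p for p, kids in children.items() if len(kids) > 1)


def heads(down_revisions):
    """Revisions that no other listed migration descends from."""
    children = children_of(down_revisions)
    return sorted(rev for rev in down_revisions if rev not in children)


def dependents(rev, down_revisions):
    """Migrations that would lose their parent if `rev` were deleted."""
    return sorted(children_of(down_revisions).get(rev, set()))


without_merge = {r: p for r, p in FORK_REGION.items() if r != "022"}
print(fork_points(FORK_REGION))                 # ['018']
print(heads(without_merge))                     # ['019', '021']
print(heads(FORK_REGION))                       # ['022']
print(dependents("e09c9e678004", EARLY_CHAIN))  # ['007']
```

On a real checkout, `alembic heads`, `alembic branches`, and `alembic history` report the same information directly from the migration files.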