Bug: Alembic migration crash on deploy — forked chain from revision 018 #179

Closed
opened 2026-03-27 03:42:31 +00:00 by forgejo_admin · 5 comments

Type

Bug

Lineage

standalone — discovered during deploy of PRs #172 + #174. App is crash-looping in prod.

Repo

forgejo_admin/basketball-api

What Broke

basketball-api pod basketball-api-75b6cb89b9-rdxlb is in CrashLoopBackOff. Alembic auto-upgrade on startup fails with:

ERROR [alembic.util.messaging] Can't locate revision identified by '018'
FAILED: Can't locate revision identified by '018'

The prod database alembic_version table contains 018. The migration chain forks at 018 into two branches:

  • 018 → 019 (player_teams_junction)
  • 018 → 020 (custom_notes) → 021 (oauth_tokens)

Both 019 and 020 have down_revision = "018". Merge migration 022_merge_heads.py has down_revision = ("019", "021") to rejoin them. But alembic can't resolve the fork because the DB only has a single head stamp at 018 — it doesn't know which branch to follow.
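To make the shape of the chain concrete, here is a minimal sketch in plain Python (no alembic import) that models only the revisions named above and reports the fork point and the head. The `DOWN_REVISIONS` dict and the helper names are illustrative assumptions, not code from the repository.

```python
# Revision graph as described in this ticket: revision -> tuple of parents.
# Only 018..022 are modelled; everything before 018 is omitted here.
DOWN_REVISIONS = {
    "018": (),
    "019": ("018",),
    "020": ("018",),
    "021": ("020",),
    "022": ("019", "021"),  # merge migration
}


def children(graph):
    """Map each revision to the sorted list of revisions that build on it."""
    result = {rev: [] for rev in graph}
    for rev, parents in graph.items():
        for parent in parents:
            result[parent].append(rev)
    return {rev: sorted(kids) for rev, kids in result.items()}


def fork_points(graph):
    """Revisions that more than one other revision lists as a parent."""
    return sorted(rev for rev, kids in children(graph).items() if len(kids) > 1)


def heads(graph):
    """Revisions that no other revision lists as a parent."""
    return sorted(rev for rev, kids in children(graph).items() if not kids)


if __name__ == "__main__":
    print("fork points:", fork_points(DOWN_REVISIONS))
    print("heads:", heads(DOWN_REVISIONS))
```

With the merge migration included, this model has one fork point (018) and one head (022). Dropping 022 from the dict leaves 019 and 021 as two separate heads.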

Repro Steps

  1. Deploy basketball-api with current origin/main code
  2. Pod starts, runs alembic upgrade head
  3. Alembic reads alembic_version = 018, tries to find next revision
  4. Finds two candidates (019 and 020 both descend from 018)
  5. Errors: Can't locate revision identified by '018'
  6. Pod crashes, enters CrashLoopBackOff
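While reproducing, one check that may help is whether the stamp stored in alembic_version matches a `revision` value declared by any script in alembic/versions, since the error text says the identifier could not be located. The sketch below reads `revision` and `down_revision` from migration source text using only the standard library. The sample sources are hypothetical stand-ins, not the real file contents, and the sketch does not claim this is the cause in this incident.

```python
import ast


def read_revision_ids(source):
    """Return (revision, down_revision) declared at the top level of one migration file."""
    found = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AnnAssign) and node.value is not None:
            targets = [node.target]  # handles `revision: str = "..."`
        else:
            continue
        for target in targets:
            if isinstance(target, ast.Name) and target.id in ("revision", "down_revision"):
                found[target.id] = ast.literal_eval(node.value)
    return found.get("revision"), found.get("down_revision")


def unknown_stamps(db_stamps, sources):
    """Stamps from the database that no migration source declares as its revision."""
    known = {read_revision_ids(src)[0] for src in sources}
    return sorted(set(db_stamps) - known)


# Hypothetical file contents, reduced to the two identifiers this ticket quotes.
SAMPLE_SOURCES = [
    'revision = "019"\ndown_revision = "018"\n',
    'revision = "020"\ndown_revision = "018"\n',
    'revision = "022"\ndown_revision = ("019", "021")\n',
]

if __name__ == "__main__":
    # The sample set deliberately contains no script declaring revision "018".
    print(unknown_stamps(["018"], SAMPLE_SOURCES))
```

In real use the sources would come from reading each `.py` file under alembic/versions. An empty result means every stamp in the database maps to a script.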

Expected Behavior

Alembic should upgrade from 018 through 019, 020, 021, 022 (merge), 023 (jersey backfill) cleanly.

Environment

  • Cluster/namespace: prod / basketball-api
  • Pod: basketball-api-75b6cb89b9-rdxlb (CrashLoopBackOff)
  • DB alembic_version: 018
  • Previous working pod: basketball-api-5b64ff9cf5-2ncc6 (terminated by ArgoCD sync)

File Targets

  • alembic/versions/019_player_teams_junction.py: down_revision = "018" (branch 1)
  • alembic/versions/020_add_custom_notes_to_player.py: down_revision = "018" (branch 2, creates fork)
  • alembic/versions/022_merge_heads.py: down_revision = ("019", "021") (merge point)
  • App startup script (Dockerfile CMD or entrypoint) — where alembic upgrade head runs

Test Expectations

Run: cd ~/basketball-api && python -m pytest tests/ -v — all 555 tests pass (migration chain is a prod-only issue, tests use fresh DB)

Validation after fix:

  • kubectl exec -n basketball-api <new-pod> -- alembic upgrade head succeeds
  • SELECT version_num FROM alembic_version; returns 023
  • Pod reaches 1/1 Running without CrashLoopBackOff

Acceptance Criteria

  • Alembic migration chain resolves cleanly from 018 to 023
  • Pod starts without CrashLoopBackOff
  • alembic_version table shows 023 after successful upgrade
  • No data loss — all existing tables and rows intact
  • All 555 tests still pass

Constraints

  • App is down — this is a live outage. Previous pod was terminated by ArgoCD sync.
  • The fork at 018 is the root issue. Fix options:
    1. Make the chain linear: change 020 to down_revision = "019" (simplest, but rewrites a migration already applied elsewhere?)
    2. Manually stamp the DB: alembic stamp 019 021 to tell alembic both branches are at their heads, then alembic upgrade head runs 022 + 023
    3. Fix the merge migration to handle single-stamp-at-fork-point
  • Option 2 (manual stamp) is the safest for prod since it doesn't change migration files — but the startup script needs to handle it automatically
  • Verify whether migrations 019, 020, 021 have actually been applied to the prod DB (tables/columns may already exist from a previous deploy)

Related

  • #170: jersey sync fix (includes migration 023, blocked by this)
  • #173: teams/save fix (code change only, also blocked by this deploy)
  • #171: Baby Betty data fix (blocked by deploy)
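For option 2 under Constraints, it may help to see what the alembic_version table would hold before and after a two-revision stamp. The sketch below uses an in-memory SQLite database as a stand-in for the prod database (an assumption; the real engine is not stated in this ticket) and only models the table contents. It does not run any migration, so it says nothing about whether the schema changes in 019, 020 and 021 are actually present, which is why the verification bullet above still applies.

```python
import sqlite3


def make_demo_db():
    """In-memory stand-in for the prod database, stamped at 018 (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE alembic_version ("
        "version_num VARCHAR(32) NOT NULL PRIMARY KEY)"
    )
    conn.execute("INSERT INTO alembic_version (version_num) VALUES ('018')")
    conn.commit()
    return conn


def read_stamps(conn):
    """Return every version_num row, sorted; the table holds one row per current head."""
    rows = conn.execute(
        "SELECT version_num FROM alembic_version ORDER BY version_num"
    )
    return [row[0] for row in rows]


def restamp(conn, revisions):
    """Replace the table contents with the given revisions in one transaction."""
    with conn:
        conn.execute("DELETE FROM alembic_version")
        conn.executemany(
            "INSERT INTO alembic_version (version_num) VALUES (?)",
            [(rev,) for rev in revisions],
        )


if __name__ == "__main__":
    conn = make_demo_db()
    print("before:", read_stamps(conn))
    restamp(conn, ["019", "021"])  # mirrors the end state of `alembic stamp 019 021`
    print("after:", read_stamps(conn))
```

After the restamp the table holds two rows, one per branch head. Applying the merge migration would then be expected to collapse them back into a single row.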

Scope Review: NEEDS_REFINEMENT

Review note: review-433-2026-03-26

Ticket is nearly agent-ready but has two inaccuracies that would cause the agent to validate against wrong targets:

  • Stale head reference: Acceptance criteria say alembic_version should show 023, but actual head on main is 025 (024_add_is_public_to_players + 025_add_coach_public_fields exist). Update all references from 023 to 025.
  • Orphan migration not addressed: e09c9e678004_add_division_column_to_players.py (down_revision="005") creates a second branch. Agent should delete it or ticket should explicitly scope it out. Otherwise alembic heads will still show multiple heads after the fork fix.

Refinement (post review-433-2026-03-26)

Fix 1: Stale head reference

Migrations 024 and 025 were merged by the other session (insecure-registry pipeline fixes). Updated:

  • AC "alembic_version shows 023" → "alembic_version shows current head (025 or latest)"
  • Validation query updated accordingly

Fix 2: Orphan migration — in scope

e09c9e678004_add_division_column_to_players.py has down_revision = "005" — a stale auto-generated migration creating a second detached branch. This must also be resolved or alembic heads will still show multiple heads after fixing the 018 fork.

Added to scope:

  • File Target: alembic/versions/e09c9e678004_add_division_column_to_players.py — delete or integrate into chain
  • AC: alembic heads shows exactly 1 head after fix

Updated Acceptance Criteria

  • Alembic migration chain resolves cleanly from 018 to current head
  • alembic heads shows exactly 1 head (no forks, no orphans)
  • Pod starts without CrashLoopBackOff
  • alembic_version table shows current head after successful upgrade
  • No data loss — all existing tables and rows intact
  • All tests still pass

Scope Review (R2): NEEDS_REFINEMENT

Review note: review-433-2026-03-26-r2

Refinement Fix 1 (stale head reference 023→current head) is correct and resolved.

Refinement Fix 2 is WRONG and dangerous. e09c9e678004_add_division_column_to_players.py is NOT an orphan — migration 007_add_email_log_table.py has down_revision = "e09c9e678004". It is part of the main chain: 005 → e09c9e678004 → 007 → 008 → ... → 018. Chain analysis confirms exactly 1 head (025) and only 1 fork (at 018). Deleting this migration would break the entire chain from 007 onward and destroy 70+ division-column references across the codebase.
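The dependency argument above can be checked mechanically. The following sketch models only the links this review states explicitly (005, e09c9e678004, 007, 008) and asks two questions: which revisions list a given revision as their parent, and whether a revision lies on the path back from a later one. The dict and helper names are illustrative assumptions; the elided middle of the chain is deliberately left out.

```python
# Only the links quoted in this review are modelled: revision -> parents.
# The parent of 005 and everything after 008 are omitted on purpose.
CHAIN = {
    "005": (),
    "e09c9e678004": ("005",),
    "007": ("e09c9e678004",),
    "008": ("007",),
}


def dependents(graph, rev):
    """Revisions whose down_revision names `rev` directly."""
    return sorted(r for r, parents in graph.items() if rev in parents)


def ancestors(graph, start):
    """All revisions reachable from `start` by following down_revision links."""
    seen = set()
    stack = [start]
    while stack:
        current = stack.pop()
        for parent in graph.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    print(dependents(CHAIN, "e09c9e678004"))
    print("e09c9e678004" in ancestors(CHAIN, "008"))
```

A revision with at least one dependent is not a head, and deleting it would leave its dependents pointing at a missing parent.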

Actions needed:

  • Remove e09c9e678004_add_division_column_to_players.py from file targets and scope
  • Remove or reword AC "alembic heads shows exactly 1 head" — already true, misleading. Suggest: "alembic upgrade head succeeds from DB stamp 018 without manual intervention"
  • Keep all other ACs and core fix scope (resolving the 018 fork)

Refinement v2 (post review-433-2026-03-26-r2)

Correction: e09c9e678004 is NOT an orphan

e09c9e678004_add_division_column_to_players.py is part of the main chain (005 → e09c9e678004 → 007). Migration 007 depends on it. Removed from scope. Do NOT delete this file.

Scope narrowed

The only issue is the fork at 018 where both 019 and 020 have down_revision = "018". The merge migration 022 exists but alembic can't traverse a fork from a single DB stamp.

Updated File Targets (final)

  • alembic/versions/019_player_teams_junction.py: down_revision = "018" (branch 1)
  • alembic/versions/020_add_custom_notes_to_player.py: down_revision = "018" (branch 2, the fork)
  • alembic/versions/022_merge_heads.py: down_revision = ("019", "021") (merge point)

Updated Acceptance Criteria (final)

  • Alembic migration chain resolves cleanly from 018 to current head
  • Pod starts without CrashLoopBackOff
  • alembic_version table shows current head after successful upgrade
  • No data loss — all existing tables and rows intact
  • All tests still pass

Scope Review (R3): READY

Review note: review-433-2026-03-26-r3

v2 refinement is correct. All three file targets verified against codebase. Full migration chain independently traced — fork at 018 confirmed as the sole issue. e09c9e678004 correctly removed from scope (007 depends on it). ACs are machine-verifiable. No blast radius concerns. Ship it.
