[CRITICAL] Migration 040 dual-revision collision — pod will CrashLoop on alembic upgrade #441

Closed
opened 2026-04-11 19:51:33 +00:00 by forgejo_admin · 0 comments

Type

Bug

Lineage

Discovered 2026-04-11 during post-merge validation of PR #433. Related to PR #426 (16U queens) and PR #428 (Alice dedupe) which merged migrations 040, 041, 042 between PR #433 scoping and merge. Regression from PR #433.

Repo

forgejo_admin/basketball-api

What Broke

Two files on main both declare revision = "040" with down_revision = "039":

  • alembic/versions/040_add_jersey_public_orders.py (from PR #433)
  • alembic/versions/040_create_16u_local_queens_team.py (from PR #426)

Alembic identifies each migration by its revision id, so two files that claim the same id under the same parent leave the revision chain ambiguous. The running pod is still on a pre-#433 image (alembic_version = 042; the jersey_public_orders table does NOT exist). ArgoCD shows Synced / Progressing for basketball-api, meaning it is currently rolling out the new image. When the new pod starts, alembic upgrade head will fail because of the duplicate revision, the new pod will CrashLoop, and the rolling update will stall. maxUnavailable=0 keeps the old pod serving, but all future deploys are blocked until the collision is resolved.

Repro Steps

  1. kubectl -n basketball-api exec postgres-9b5b87b5-5nccx -- psql -U basketball -d basketball -c "SELECT version_num FROM alembic_version;" → 042
  2. kubectl -n basketball-api exec postgres-9b5b87b5-5nccx -- psql -U basketball -d basketball -c "\dt jersey_public_orders" → Did not find any relation named "jersey_public_orders"
  3. curl -sS "$FORGEJO_URL/api/v1/repos/forgejo_admin/basketball-api/contents/alembic/versions?ref=main" → list includes both 040_add_jersey_public_orders.py and 040_create_16u_local_queens_team.py
  4. Observe: two distinct files both have revision = "040" in their headers
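
Step 4 can also be checked mechanically. The following is a minimal sketch, not code from the repository: it assumes each migration file declares its id with a top-level `revision = "..."` assignment, and the helper name `find_duplicate_revisions` is hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches a line-initial assignment such as `revision = "040"` or
# `revision: str = "040"`. The ^ anchor keeps `down_revision` from matching.
REVISION_RE = re.compile(
    r'^revision\s*(?::\s*\w+\s*)?=\s*["\']([^"\']+)["\']', re.MULTILINE
)


def find_duplicate_revisions(versions_dir):
    """Return {revision id: [file names]} for ids declared by more than one file."""
    files_by_revision = defaultdict(list)
    for path in sorted(Path(versions_dir).glob("*.py")):
        match = REVISION_RE.search(path.read_text(encoding="utf-8"))
        if match:
            files_by_revision[match.group(1)].append(path.name)
    return {
        rev: names for rev, names in files_by_revision.items() if len(names) > 1
    }
```

Run from a checkout of main at the affected commit, `find_duplicate_revisions("alembic/versions")` would be expected to report `"040"` together with the two file names listed in step 3.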

Expected Behavior

Exactly one file claims revision = "040". The jersey_public_orders migration should occupy a unique revision number at the end of the current chain (should be 043 since 040-042 already exist).
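
As an illustration of the expected state, the header of the renumbered migration might look like the sketch below. It assumes the file uses the standard Alembic module-level identifiers and that 043 is still the next free id when the fix lands; the `upgrade()` and `downgrade()` bodies are omitted because they should stay exactly as merged in PR #433.

```python
# Hypothetical header for the renumbered jersey_public_orders migration.
# Only these identifiers change; the schema operations are left untouched.
revision = "043"       # next free id after the existing chain 040 -> 041 -> 042
down_revision = "042"  # matches the currently applied alembic_version
branch_labels = None
depends_on = None
```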

Environment

  • Cluster/namespace: basketball-api namespace
  • Service version/commit: pod running image harbor.tail5b443a.ts.net/basketball-api/api:7ccc4b3020797c0b59544493194de837c19441fe (pre-collision), main branch HEAD 4be0848a5df29213 (collision present)
  • Related alerts: ArgoCD application state Synced / Progressing
  • DB head (applied): alembic_version = 042
  • Git head (main): duplicate 040 files

Acceptance Criteria

  • Only one file with revision = "040" remains in alembic/versions/
  • jersey_public_orders migration renamed to 043 (or next unique revision), down_revision = "042"
  • Schema content byte-identical to original — only revision metadata changes
  • alembic heads returns a single head
  • alembic upgrade head completes cleanly through the new revision
  • alembic downgrade -1 reverses cleanly
  • jersey_public_orders table exists in basketball-api production DB after rollout
  • ArgoCD application state returns to Synced / Healthy
  • from basketball_api.models import JerseyPublicOrder imports cleanly
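
The "only revision metadata changes" criterion could be checked with a small comparison like the sketch below. It is an assumption-laden illustration: it treats any line starting with `revision`, `down_revision`, `Revision ID` or `Revises` as metadata, compares text rather than raw bytes, and the function names are hypothetical.

```python
import re

# Lines considered revision metadata: the two module-level assignments and
# the matching "Revision ID: ..." / "Revises: ..." docstring header lines.
METADATA_RE = re.compile(
    r"^(?:revision|down_revision|Revision ID|Revises)\b.*$", re.MULTILINE
)


def strip_revision_metadata(source):
    """Blank out every metadata line so only the schema content remains."""
    return METADATA_RE.sub("", source)


def only_metadata_changed(original_source, renamed_source):
    """True when the two migration sources differ only in metadata lines."""
    return strip_revision_metadata(original_source) == strip_revision_metadata(
        renamed_source
    )
```

Feeding it the text of the original 040 file and the renumbered file should return `True` if nothing but the identifiers was edited.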

Related

  • pal-e-platform — project tracking
  • westside-basketball — affected product (System B production rollout blocked)
  • forgejo_admin/basketball-api#429 — PR #433 source issue (jersey_public_orders migration)
  • PR #426 — 16U queens team migration that pre-existed as 040
  • PR #428 — Alice dedupe migration (042)
  • Follow-up: Woodpecker pipeline should run alembic heads before build to prevent future collisions
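
The follow-up guard could be sketched as a small script that a pipeline step calls before the build. This is only an illustration: it assumes `alembic heads` prints one `<revision> (head)` line per head in its default (non-verbose) output, that the command runs from the directory containing `alembic.ini`, and the function names are hypothetical.

```python
import subprocess


def count_heads(alembic_heads_output):
    """Count lines of `alembic heads` output that are marked as a head."""
    return sum(
        1 for line in alembic_heads_output.splitlines() if "(head)" in line
    )


def assert_single_head():
    """Run `alembic heads` and abort unless exactly one head is reported."""
    result = subprocess.run(
        ["alembic", "heads"], capture_output=True, text=True, check=True
    )
    heads = count_heads(result.stdout)
    if heads != 1:
        raise SystemExit(f"expected exactly 1 alembic head, found {heads}")
```

Because `check=True` raises when the command itself exits non-zero, the step would also fail if Alembic cannot load the revision files at all.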