Bug: CrashLoopBackOff — Alembic migration chain forked, DB in partial state #183

Closed
opened 2026-03-27 05:55:32 +00:00 by forgejo_admin · 1 comment

Type

Bug

Lineage

Standalone — discovered during westside landing site validation. basketball-api returns 502, blocking /teams dynamic content.

Repo

forgejo_admin/basketball-api

What Broke

basketball-api is in CrashLoopBackOff (32+ restarts). Two cascading errors:

  1. Old image (737093f, 23 commits behind) crashed with: Can't locate revision identified by '018'
  2. Latest image (15f3047) crashes with: DuplicateColumn: column "custom_notes" of relation "players" already exists

Root cause: the migration chain was forked during development. Migrations ran out of order, then 022_merge_heads.py linearized the chain, but the DB never ran through the fixed chain.
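To illustrate what a forked chain means, here is a minimal sketch that finds the heads of a revision graph. The revision names and parent links (`base`, `left`, `right`, `merge`) are hypothetical, since this issue does not show the real `down_revision` values in basketball-api. It is a plain-Python model of the idea, not Alembic's own implementation.

```python
def find_heads(parents):
    """Return the revisions that no other revision lists as a parent.

    `parents` maps each revision id to a tuple of its parent ids.
    More than one head means the chain has forked.
    """
    referenced = set()
    for down_revisions in parents.values():
        referenced.update(down_revisions)
    return sorted(rev for rev in parents if rev not in referenced)


# Hypothetical forked graph: two revisions both descend from "base".
forked = {
    "base": (),
    "left": ("base",),
    "right": ("base",),
}

# Hypothetical repair: a merge revision that names both branches as parents.
merged = dict(forked)
merged["merge"] = ("left", "right")

forked_heads = find_heads(forked)
merged_heads = find_heads(merged)
```

With two heads there is no single "latest" revision to upgrade to. After the merge revision is added, the graph has one head again, but a database that already ran one branch out of order still carries that history.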

DB State (partial migration)

| Migration | Change | Present? |
|-----------|--------|----------|
| 018 | groupme + outbox | Yes (Alembic stamp) |
| 019 | player_teams junction table | MISSING |
| 020 | custom_notes column | EXISTS (ran out of order) |
| 021 | oauth_tokens table | MISSING |
| 022 | merge_heads | N/A (chain fix) |
| 023 | backfill jersey from orders | Unknown |
| 024 | is_public column on players | MISSING |
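Row 020 is the one behind the DuplicateColumn crash: the column already exists, so re-running an unguarded "add column" fails. The sketch below reproduces that failure and shows an existence check before the DDL. It uses an in-memory SQLite database as a stand-in for the production Postgres instance, and the helper names are hypothetical. It is not the project's migration code.

```python
import sqlite3


def column_exists(conn, table, column):
    """Check for a column via SQLite's PRAGMA table_info (row[1] is the name)."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def add_column_if_missing(conn, table, column, ddl_type):
    """Add the column only when absent. Returns True if DDL was executed."""
    if column_exists(conn, table, column):
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
    return True


# Stand-in for the partial state: custom_notes exists, is_public does not.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (id INTEGER PRIMARY KEY, custom_notes TEXT)")

# Unguarded DDL fails on the existing column, like the crash in this issue.
try:
    conn.execute("ALTER TABLE players ADD COLUMN custom_notes TEXT")
    unguarded_failed = False
except sqlite3.OperationalError:
    unguarded_failed = True

# Guarded DDL skips the existing column and adds the missing one.
added_custom_notes = add_column_if_missing(conn, "players", "custom_notes", "TEXT")
added_is_public = add_column_if_missing(conn, "players", "is_public", "INTEGER")
```

The same existence check can be used to audit each row of the table against the live database before deciding which revision to stamp.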

Repro Steps

  1. kubectl get pods -n basketball-api → CrashLoopBackOff
  2. kubectl logs deploy/basketball-api -n basketball-api → DuplicateColumn error
  3. psql -U basketball -d basketball -c "SELECT version_num FROM alembic_version" → 018

Expected Behavior

App boots, runs migrations forward cleanly, serves /public/teams endpoint.

Environment

  • Cluster/namespace: prod / basketball-api
  • Deployed image: 15f3047 (latest main)
  • DB: standalone postgres pod in basketball-api namespace
  • Alembic stamp: 018

Fix Plan

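  1. Stamp Alembic past the duplicate column: alembic stamp 020. Stamping records a revision without running it, so this also marks 019 as applied. Because the DB State table lists 019's player_teams table as MISSING, that change may need to be applied separately.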
  2. Run remaining migrations: alembic upgrade head (applies 021, 022, 023, 024)
  3. Verify app boots and /public/teams returns data
  4. Verify /health endpoint returns 200

Reference: sop-db-migration-recovery
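The effect of the stamp-then-upgrade plan can be modelled in a few lines. This sketch assumes the linearized chain runs in plain numeric order from 018 to 024, which is consistent with the plan's "applies 021, 022, 023, 024" but is not confirmed by the migration files themselves. The function name is hypothetical.

```python
def plan_after_stamp(chain, current, stamp_to):
    """Model `alembic stamp <stamp_to>` followed by `alembic upgrade head`.

    Returns (marked_without_running, will_run):
    - revisions after `current` up to and including `stamp_to` are recorded
      as applied without their upgrade code being executed;
    - revisions after `stamp_to` are executed by the upgrade.
    """
    i_current = chain.index(current)
    i_stamp = chain.index(stamp_to)
    marked_without_running = chain[i_current + 1 : i_stamp + 1]
    will_run = chain[i_stamp + 1 :]
    return marked_without_running, will_run


# Assumed linear order of the repaired chain.
chain = ["018", "019", "020", "021", "022", "023", "024"]

marked, will_run = plan_after_stamp(chain, current="018", stamp_to="020")
```

Under that assumption, `marked` contains both 019 and 020. Skipping 020 is the intent, since its column already exists, but 019 is skipped as well, so its player_teams table would still be absent after the upgrade unless it is created by other means.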

Acceptance Criteria

  • basketball-api pod is Running (not CrashLoopBackOff)
  • curl https://basketball-api.tail5b443a.ts.net/health returns 200
  • curl https://basketball-api.tail5b443a.ts.net/public/teams returns team data
  • westsidekingsandqueens.tail5b443a.ts.net/teams shows real team rosters
  • Alembic version stamped at 024

Related

  • project-westside-basketball
  • ArgoCD image tag drift — deployed image was 23 commits behind main
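The two curl checks in the acceptance criteria could be scripted roughly as follows. This is a sketch under assumptions: the issue does not say what `/public/teams` returns, so the code treats any non-empty JSON payload as "team data", and the function names are hypothetical. The fetcher is injectable so the logic can be exercised without network access.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "https://basketball-api.tail5b443a.ts.net"


def http_get(url, timeout=10):
    """Return (status, body). HTTP error statuses are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, b""


def payload_has_data(body):
    """Assumption: 'returns team data' means a non-empty JSON payload."""
    try:
        return bool(json.loads(body))
    except ValueError:
        return False


def check_acceptance(fetch=http_get, base=BASE_URL):
    """Evaluate the /health and /public/teams criteria with the given fetcher."""
    health_status, _ = fetch(f"{base}/health")
    teams_status, teams_body = fetch(f"{base}/public/teams")
    return {
        "health_200": health_status == 200,
        "teams_has_data": teams_status == 200 and payload_has_data(teams_body),
    }
```

Calling `check_acceptance()` with no arguments would hit the real endpoints. The pod status, the landing-site roster page, and the Alembic version still need to be verified separately.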

Scope Review: NEEDS_REFINEMENT

Review note: review-445-2026-03-26
Issue scope is high quality (detailed root cause, DB state audit, concrete fix plan, verifiable acceptance criteria, valid SOP reference) but board/traceability hygiene needs attention before moving to next_up.

  • Duplicate overlap: Forgejo #184 (closed) covers the same root cause. Board item #449 is still in in_progress for #184. Clarify whether #183 restates remaining work or is a duplicate.
  • Missing story label: Board item #445 has no story:X label. Likely story:WS-S26 based on westside landing site context.
  • Null board title: Board item #445 title is null — needs sync from Forgejo issue title.