INCIDENT: Drop HNSW index before vector dimension ALTER #159

Merged
forgejo_admin merged 1 commit from 158-fix-migration-drop-hnsw-index into main 2026-03-14 17:19:58 +00:00

Summary

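Fixes CrashLoopBackOff on both pal-e-docs pods. The Alembic migration from PR #157 tried to ALTER the embedding column to vector(2560) while the HNSW index on that column was still in place. HNSW indexes in pgvector have a hard 2000-dimension limit, so the ALTER failed, and because Alembic runs on startup, both pods crashed on every restart. The fix drops the index before the ALTER.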

Changes

  • alembic/versions/o5j6k7l8m9n0_fix_embedding_vector_dimension.py -- Added DROP INDEX IF EXISTS ix_blocks_embedding before the ALTER TABLE statement. Downgrade restores the HNSW index on vector(768) for reversibility. No index recreation in upgrade (sequential scan is fine for ~5600 blocks).

Test Plan

  • [x] pytest tests/ -v -- 497 passed
  • [x] ruff check + ruff format --check -- clean
  • [ ] After deploy: verify pods are Running (no CrashLoopBackOff)
  • [ ] After deploy: verify embedding worker starts processing blocks

Review Checklist

  • [x] No unrelated changes
  • [x] Migration is idempotent (DROP INDEX IF EXISTS)
  • [x] Downgrade is symmetric (restores HNSW index on vector(768))
  • [ ] Reviewer: confirm no other indexes reference the embedding column

Related

  • Plan: plan-pal-e-docs
  • Forgejo issue: #158

Closes #158

Fix migration crash: drop HNSW index before vector dimension ALTER
All checks were successful
ci/woodpecker/pr/woodpecker Pipeline was successful
568b0eb21c
The HNSW index has a hard 2000-dimension limit in pgvector. The migration
to vector(2560) was failing because it tried to ALTER the column type
without first dropping the index. This caused CrashLoopBackOff on both
pal-e-docs pods since Alembic runs on startup.

- Drop ix_blocks_embedding before ALTER TABLE
- Do not recreate index (sequential scan is fine for ~5600 blocks)
- Downgrade restores the HNSW index on vector(768) for reversibility

Closes #158

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

PR #159 Review

BLOCKERS

None.

CODE ANALYSIS

Upgrade path (correct):

  1. DROP INDEX IF EXISTS ix_blocks_embedding -- removes the HNSW index that has a hard 2000-dimension limit in pgvector
  2. ALTER TABLE blocks ALTER COLUMN embedding TYPE vector(2560) -- now succeeds without the index blocking it
  3. UPDATE blocks SET embedding_status = 'pending' WHERE embedding_status = 'error' -- requeues previously-failed blocks

The ordering is correct. The IF EXISTS guard makes the migration idempotent (safe to re-run if the first attempt partially succeeded). No index recreation is intentional -- sequential scan on ~5,600 blocks is fine, and HNSW cannot support 2560 dimensions anyway.
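
The upgrade ordering described above can be sketched roughly as follows. This is a minimal illustration, not the contents of the actual migration file: the `run_upgrade` helper and the `execute` parameter are assumptions made here so the ordering can be exercised without a database. In a real Alembic migration, `execute` would be `op.execute`.

```python
from typing import Callable, List

# Statements in the order the review describes. DROP INDEX has to come
# first: with the HNSW index still in place, widening the column past
# 2000 dimensions fails.
UPGRADE_STATEMENTS: List[str] = [
    "DROP INDEX IF EXISTS ix_blocks_embedding",
    "ALTER TABLE blocks ALTER COLUMN embedding TYPE vector(2560)",
    "UPDATE blocks SET embedding_status = 'pending' "
    "WHERE embedding_status = 'error'",
]


def run_upgrade(execute: Callable[[str], None]) -> None:
    """Run the upgrade statements in order through `execute`.

    `execute` is a hypothetical injection point for this sketch; inside
    an Alembic migration it would be `op.execute`.
    """
    for statement in UPGRADE_STATEMENTS:
        execute(statement)
```

Inside the migration module, `upgrade()` would then reduce to `run_upgrade(op.execute)`. Because the first statement uses `IF EXISTS`, re-running it after a partial first attempt does not fail on the already-dropped index.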

Downgrade path (correct):

  1. ALTER TABLE blocks ALTER COLUMN embedding TYPE vector(768) -- shrinks column first (will fail if 2560-dim data exists, documented in comment)
  2. CREATE INDEX ix_blocks_embedding ON blocks USING hnsw (embedding vector_cosine_ops) -- recreates index on 768-dim column (within 2000-dim limit)

Ordering is correct: column must be 768-dim before the HNSW index can be created on it.
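
A comparable sketch of the downgrade path, with the 2000-dimension limit made explicit. The helper names (`hnsw_supports`, `downgrade_statements`, `run_downgrade`) are hypothetical and introduced only for illustration. The real migration may simply issue the two statements directly.

```python
from typing import Callable, List

# pgvector's HNSW dimension limit, as stated in this PR.
HNSW_MAX_DIMENSIONS = 2000


def hnsw_supports(dimensions: int) -> bool:
    """Return True if an HNSW index can cover a vector of this width."""
    return 0 < dimensions <= HNSW_MAX_DIMENSIONS


def downgrade_statements(dimensions: int = 768) -> List[str]:
    """Build the downgrade statements: shrink the column, then re-index."""
    if not hnsw_supports(dimensions):
        raise ValueError(
            f"HNSW cannot index vector({dimensions}); "
            f"limit is {HNSW_MAX_DIMENSIONS}"
        )
    return [
        # The column must be narrow enough before the index is created.
        f"ALTER TABLE blocks ALTER COLUMN embedding TYPE vector({dimensions})",
        "CREATE INDEX ix_blocks_embedding ON blocks "
        "USING hnsw (embedding vector_cosine_ops)",
    ]


def run_downgrade(execute: Callable[[str], None]) -> None:
    """Run the downgrade through `execute` (`op.execute` in Alembic)."""
    for statement in downgrade_statements():
        execute(statement)
```

The same guard shows why the upgrade cannot recreate the index: `hnsw_supports(2560)` is false, so any attempt to build an HNSW index on the widened column would be rejected.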

Cross-reference verified:

  • The original HNSW index was created in migration l2g3h4i5j6k7_add_vector_embeddings.py (lines 47-49). That is the only CREATE INDEX on the embedding column in the migration history. No other indexes reference the embedding column.
  • The SQLAlchemy model at /home/ldraney/pal-e-docs/src/pal_e_docs/models.py:207 already has Vector(2560), confirming the model matches the migration target.
  • down_revision points to n4i5j6k7l8m9 (the boards migration), which is the correct parent in the chain.
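
The index check above was done against the migration history. As a complementary sketch, the same question could be asked of a live database by reading `indexname` and `indexdef` from PostgreSQL's `pg_indexes` view (for example `SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'blocks'`) and filtering the rows. The `indexes_on_column` helper below is hypothetical, and its parsing is deliberately simple: it inspects only the parenthesised key list and does not handle every index definition form.

```python
import re
from typing import Iterable, List, Tuple


def indexes_on_column(
    rows: Iterable[Tuple[str, str]], column: str
) -> List[str]:
    """Return names of indexes whose key list mentions `column`.

    `rows` are (indexname, indexdef) pairs as returned from pg_indexes.
    Matching uses word boundaries, so `embedding` does not match
    `embedding_status`.
    """
    pattern = re.compile(rf"\b{re.escape(column)}\b")
    matches = []
    for name, definition in rows:
        # Look only inside the parentheses, not at the index name itself.
        key_list = re.search(r"\((.*)\)", definition)
        if key_list and pattern.search(key_list.group(1)):
            matches.append(name)
    return matches
```

After the upgrade has run, an empty result for `embedding` would agree with the review's finding that no other index references the column.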

NITS

None. This is a clean, minimal incident fix.

SOP COMPLIANCE

  • Branch named after issue: 158-fix-migration-drop-hnsw-index references issue #158
  • PR body follows template: Summary, Changes, Test Plan, Review Checklist, Related sections all present
  • Related references plan slug: plan-pal-e-docs
  • Closes #158 present in PR body
  • No secrets committed
  • No scope creep: 1 file changed, +10/-0 lines, all directly related to the HNSW index fix
  • Single migration file only -- no unrelated changes

VERDICT: APPROVED

Clean incident fix. Correct ordering, idempotent, symmetric downgrade, no scope creep. Ship it.

forgejo_admin deleted branch 158-fix-migration-drop-hnsw-index 2026-03-14 17:19:58 +00:00