ldraney/pal-e-api

Archived

Fork 0

Add partial indexes: blocks(block_type)='mermaid' and notes(note_type)='architecture' #252

New issue

Closed

opened 2026-04-10 21:50:48 +00:00 by forgejo_admin · 2 comments

forgejo_admin commented

2026-04-10 21:50:48 +00:00

Contributor

Type

Feature

Lineage

Standalone — discovered during 2026-04-10 architecture-template gate review. Mermaid + architecture notes identified as first-class query paths currently unindexed. Motivation is semantic self-documentation, not performance — at 23K blocks / 900 notes sequential scans are <5ms.

Scope review round 1: NEEDS_REFINEMENT (review-944-2026-04-10). Fixed in this revision: (1) backing arch note created (arch-domain-pal-e-docs), (2) arch label canonical spelling established, (3) EXPLAIN verification ACs made deterministic via SET enable_seqscan = off.

Repo

forgejo_admin/pal-e-api

(Note: the Python package inside the repo is named pal_e_docs as an artifact of the pal-e-docs→pal-e-api rename. models.py lives at src/pal_e_docs/models.py, Alembic at alembic/.)

User Story

As the Superuser (Lucas)
I want semantic-level "first-class" declaration of the mermaid-block and architecture-note query paths via partial indexes
So that the schema self-documents which lookups matter, and future scale degrades gracefully when the knowledge base grows

Traces to: story:superuser-query on project-pal-e-docs.

Architecture

arch:blocks — the blocks component in arch-domain-pal-e-docs
arch:notes — the notes component in arch-domain-pal-e-docs

(Canonical spelling is arch:notes — maps to the notes DB table row in the Components table. The sibling ticket pal-e-api#251 uses arch:notes-api which refers to a different component at a different abstraction layer — API routes, not DB entities. Not a conflict.)

Context

pal-e-docs has two query paths that currently sequential-scan:

"Show me every mermaid diagram" — scans all 23,657 blocks to find the 25 mermaid ones.
list_notes(note_type='architecture') — scans all ~900 notes to find the architecture subset.

At current scale (~5ms scans) these are not performance problems. The reason to add partial indexes is semantic documentation: partial indexes are the cheapest way to declare "this is a first-class query path." They self-document the schema and future-proof the query paths before they become hot.

The architecture template prescribes 3 diagram notes per project. With ~15 projects that's ~45 expected architecture notes and ~45 expected mermaid blocks. Current state: 25 mermaid blocks across 8 existing triplets. The indexes make the "find all architecture / find all diagrams" queries first-class as the triplet fills out.

Partial indexes (not full btree on block_type/note_type) because:

Only mermaid and architecture are the query-worthy values
Partial indexes cost near-zero space (~25 rows for mermaid, ~25 rows for architecture)
Full btree on block_type would waste space on 9634 heading entries we never filter to

File Targets

Files to create:

alembic/versions/<next>_add_mermaid_and_architecture_partial_indexes.py — new Alembic migration

The new migration's down_revision must chain to the current Alembic head (verify with alembic heads before writing the file).

Files the agent should NOT touch:

src/pal_e_docs/models.py — do NOT add Index(...) declarations in __table_args__ for partial indexes; SQLAlchemy's native Index doesn't express WHERE clauses cleanly. Keep the indexes migration-only and document them in a brief comment above the affected model class.
Any application code — this is a pure DDL change.

Acceptance Criteria

Alembic migration creates ix_blocks_block_type_mermaid as a partial btree: CREATE INDEX ix_blocks_block_type_mermaid ON blocks (block_type) WHERE block_type = 'mermaid'
Alembic migration creates ix_notes_note_type_architecture as a partial btree: CREATE INDEX ix_notes_note_type_architecture ON notes (note_type) WHERE note_type = 'architecture'
Downgrade drops both indexes
down_revision chains correctly to current head (verified via alembic heads)
With SET enable_seqscan = off;, EXPLAIN SELECT * FROM blocks WHERE block_type = 'mermaid' shows Index Scan using ix_blocks_block_type_mermaid
With SET enable_seqscan = off;, EXPLAIN SELECT * FROM notes WHERE note_type = 'architecture' shows Index Scan using ix_notes_note_type_architecture
Migration applies cleanly against current prod schema (test in dev overlay first)

Why enable_seqscan = off: at current scale (25 mermaid blocks, 25 architecture notes) the PostgreSQL planner may legitimately prefer Seq Scan — the row counts are too small for the index to win on cost. Disabling seqscan forces the planner to use the partial index if it's valid, which is the deterministic way to verify the index is wired correctly regardless of data volume. The EXPLAIN output is captured in the PR description as the smoking-gun evidence the index exists and is query-planner-visible.

Test Expectations

Migration upgrade + downgrade round-trip in a fresh test DB
EXPLAIN ANALYZE output (with enable_seqscan = off) captured in PR description showing each partial index is used
Run: alembic upgrade head && alembic downgrade -1 && alembic upgrade head

Constraints

Use op.create_index with postgresql_where=sa.text("...") — the idiomatic Alembic partial-index pattern
Index names must match existing convention (ix_<table>_<columns> prefix — see ix_blocks_note_id_position, ix_blocks_anchor_id)
Do NOT migrate blocks.content from json to jsonb in this ticket — that's separate scope (would enable GIN indexes on diagram definitions but is a bigger migration)

Checklist

PR opened
Migration tested up/down in dev
EXPLAIN plans (with enable_seqscan = off) captured in PR description
No unrelated changes

project-pal-e-docs — affected project
arch-domain-pal-e-docs — backing architecture note (new)
template-architecture — prescribes the diagram triplet this indexes

### Type Feature ### Lineage Standalone — discovered during 2026-04-10 architecture-template gate review. Mermaid + architecture notes identified as first-class query paths currently unindexed. Motivation is semantic self-documentation, not performance — at 23K blocks / 900 notes sequential scans are <5ms. **Scope review round 1:** NEEDS_REFINEMENT (review-944-2026-04-10). Fixed in this revision: (1) backing arch note created (`arch-domain-pal-e-docs`), (2) arch label canonical spelling established, (3) EXPLAIN verification ACs made deterministic via `SET enable_seqscan = off`. ### Repo `forgejo_admin/pal-e-api` (Note: the Python package inside the repo is named `pal_e_docs` as an artifact of the pal-e-docs→pal-e-api rename. `models.py` lives at `src/pal_e_docs/models.py`, Alembic at `alembic/`.) ### User Story As the Superuser (Lucas) I want semantic-level "first-class" declaration of the mermaid-block and architecture-note query paths via partial indexes So that the schema self-documents which lookups matter, and future scale degrades gracefully when the knowledge base grows Traces to: `story:superuser-query` on `project-pal-e-docs`. ### Architecture - `arch:blocks` — the `blocks` component in `arch-domain-pal-e-docs` - `arch:notes` — the `notes` component in `arch-domain-pal-e-docs` (Canonical spelling is `arch:notes` — maps to the `notes` DB table row in the Components table. The sibling ticket pal-e-api#251 uses `arch:notes-api` which refers to a different component at a different abstraction layer — API routes, not DB entities. Not a conflict.) ### Context pal-e-docs has two query paths that currently sequential-scan: 1. **"Show me every mermaid diagram"** — scans all 23,657 blocks to find the 25 mermaid ones. 2. **`list_notes(note_type='architecture')`** — scans all ~900 notes to find the architecture subset. At current scale (~5ms scans) these are not performance problems. The reason to add partial indexes is **semantic documentation**: partial indexes are the cheapest way to declare "this is a first-class query path." They self-document the schema and future-proof the query paths before they become hot. The architecture template prescribes 3 diagram notes per project. With ~15 projects that's ~45 expected architecture notes and ~45 expected mermaid blocks. Current state: 25 mermaid blocks across 8 existing triplets. The indexes make the "find all architecture / find all diagrams" queries first-class as the triplet fills out. Partial indexes (not full btree on `block_type`/`note_type`) because: - Only `mermaid` and `architecture` are the query-worthy values - Partial indexes cost near-zero space (~25 rows for mermaid, ~25 rows for architecture) - Full btree on `block_type` would waste space on 9634 `heading` entries we never filter to ### File Targets Files to create: - `alembic/versions/<next>_add_mermaid_and_architecture_partial_indexes.py` — new Alembic migration The new migration's `down_revision` must chain to the current Alembic head (verify with `alembic heads` before writing the file). Files the agent should NOT touch: - `src/pal_e_docs/models.py` — do NOT add `Index(...)` declarations in `__table_args__` for partial indexes; SQLAlchemy's native `Index` doesn't express `WHERE` clauses cleanly. Keep the indexes migration-only and document them in a brief comment above the affected model class. - Any application code — this is a pure DDL change. ### Acceptance Criteria - [ ] Alembic migration creates `ix_blocks_block_type_mermaid` as a partial btree: `CREATE INDEX ix_blocks_block_type_mermaid ON blocks (block_type) WHERE block_type = 'mermaid'` - [ ] Alembic migration creates `ix_notes_note_type_architecture` as a partial btree: `CREATE INDEX ix_notes_note_type_architecture ON notes (note_type) WHERE note_type = 'architecture'` - [ ] Downgrade drops both indexes - [ ] `down_revision` chains correctly to current head (verified via `alembic heads`) - [ ] With `SET enable_seqscan = off;`, `EXPLAIN SELECT * FROM blocks WHERE block_type = 'mermaid'` shows `Index Scan using ix_blocks_block_type_mermaid` - [ ] With `SET enable_seqscan = off;`, `EXPLAIN SELECT * FROM notes WHERE note_type = 'architecture'` shows `Index Scan using ix_notes_note_type_architecture` - [ ] Migration applies cleanly against current prod schema (test in dev overlay first) **Why `enable_seqscan = off`:** at current scale (25 mermaid blocks, 25 architecture notes) the PostgreSQL planner may legitimately prefer Seq Scan — the row counts are too small for the index to win on cost. Disabling seqscan forces the planner to use the partial index if it's valid, which is the deterministic way to verify the index is wired correctly regardless of data volume. The `EXPLAIN` output is captured in the PR description as the smoking-gun evidence the index exists and is query-planner-visible. ### Test Expectations - [ ] Migration upgrade + downgrade round-trip in a fresh test DB - [ ] `EXPLAIN ANALYZE` output (with `enable_seqscan = off`) captured in PR description showing each partial index is used - Run: `alembic upgrade head && alembic downgrade -1 && alembic upgrade head` ### Constraints - Use `op.create_index` with `postgresql_where=sa.text("...")` — the idiomatic Alembic partial-index pattern - Index names must match existing convention (`ix_<table>_<columns>` prefix — see `ix_blocks_note_id_position`, `ix_blocks_anchor_id`) - Do NOT migrate `blocks.content` from `json` to `jsonb` in this ticket — that's separate scope (would enable GIN indexes on diagram definitions but is a bigger migration) ### Checklist - [ ] PR opened - [ ] Migration tested up/down in dev - [ ] EXPLAIN plans (with `enable_seqscan = off`) captured in PR description - [ ] No unrelated changes ### Related - `project-pal-e-docs` — affected project - `arch-domain-pal-e-docs` — backing architecture note (new) - `template-architecture` — prescribes the diagram triplet this indexes

forgejo_admin commented

2026-04-10 21:57:31 +00:00

Author

Contributor

Scope Review: NEEDS_REFINEMENT

Review note: review-944-2026-04-10

Scope is technically solid and executable in a single agent pass (~3 min, well under the 5-minute rule). File targets verified on Forgejo — alembic lives in pal-e-api, current head is t0o1p2q3r4s5, and models.py at src/pal_e_docs/models.py confirms both block_type and note_type columns plus the existing ix_<table>_<columns> naming convention the ticket prescribes. Repo placement is correct (the task brief's claim that models.py is in ~/pal-e-docs was a false alarm — the Python package is named pal_e_docs inside the pal-e-api repo, an artifact of the #217 rename).

Blocking issues:

[SCOPE] No arch-blocks backing note exists in pal-e-docs. Create before moving to todo.
[SCOPE] No arch-notes backing note exists in pal-e-docs. Create before moving to todo.
[LABEL] Reconcile arch label spelling: this ticket uses arch:notes, sibling backlog ticket #908 uses arch:notes-api. Decide canonical and update.
[BODY] Add a sentence to AC noting that at 900 architecture notes the planner may prefer Seq Scan; agent should use SET enable_seqscan = off to confirm the partial index is usable. Otherwise AC #4 and #5 may fail non-deterministically.

Once the two arch notes exist and the label is reconciled, this ticket is READY.

## Scope Review: NEEDS_REFINEMENT Review note: `review-944-2026-04-10` Scope is technically solid and executable in a single agent pass (~3 min, well under the 5-minute rule). File targets verified on Forgejo — alembic lives in `pal-e-api`, current head is `t0o1p2q3r4s5`, and `models.py` at `src/pal_e_docs/models.py` confirms both `block_type` and `note_type` columns plus the existing `ix_<table>_<columns>` naming convention the ticket prescribes. Repo placement is correct (the task brief's claim that models.py is in `~/pal-e-docs` was a false alarm — the Python package is named `pal_e_docs` inside the `pal-e-api` repo, an artifact of the #217 rename). Blocking issues: - **[SCOPE]** No `arch-blocks` backing note exists in pal-e-docs. Create before moving to todo. - **[SCOPE]** No `arch-notes` backing note exists in pal-e-docs. Create before moving to todo. - **[LABEL]** Reconcile arch label spelling: this ticket uses `arch:notes`, sibling backlog ticket #908 uses `arch:notes-api`. Decide canonical and update. - **[BODY]** Add a sentence to AC noting that at 900 architecture notes the planner may prefer Seq Scan; agent should use `SET enable_seqscan = off` to confirm the partial index is usable. Otherwise AC #4 and #5 may fail non-deterministically. Once the two arch notes exist and the label is reconciled, this ticket is READY.

forgejo_admin commented

2026-04-10 22:08:06 +00:00

Author

Contributor

Scope Review: APPROVED (round 2)

Review note: review-944-2026-04-10-r2

All four round-1 blockers resolved. Ticket is READY to move backlog → todo.

[SCOPE] arch-blocks backing note → arch-domain-pal-e-docs Components table has blocks row
[SCOPE] arch-notes backing note → same note has notes row (forward-references this ticket)
[LABEL] arch:notes vs arch:notes-api → verified different components at different layers (DB entity vs API routes for landing-site). Not a conflict; documented in ticket body.
[BODY] Non-deterministic EXPLAIN ACs → now prescribe SET enable_seqscan = off with rationale

Alembic head re-verified: t0o1p2q3r4s5_drop_legacy_boards_table.py. Story story:superuser-query verified in project-pal-e-docs user-stories. 7 ACs — over the 5-AC caution threshold but tightly coupled to one migration file; no decomposition needed.

No action needed. Ready for dev dispatch.

## Scope Review: APPROVED (round 2) Review note: `review-944-2026-04-10-r2` All four round-1 blockers resolved. Ticket is READY to move backlog → todo. - [x] [SCOPE] arch-blocks backing note → `arch-domain-pal-e-docs` Components table has `blocks` row - [x] [SCOPE] arch-notes backing note → same note has `notes` row (forward-references this ticket) - [x] [LABEL] arch:notes vs arch:notes-api → verified different components at different layers (DB entity vs API routes for landing-site). Not a conflict; documented in ticket body. - [x] [BODY] Non-deterministic EXPLAIN ACs → now prescribe `SET enable_seqscan = off` with rationale Alembic head re-verified: `t0o1p2q3r4s5_drop_legacy_boards_table.py`. Story `story:superuser-query` verified in `project-pal-e-docs` user-stories. 7 ACs — over the 5-AC caution threshold but tightly coupled to one migration file; no decomposition needed. No action needed. Ready for dev dispatch.