Generate anchor_ids for all block types, not just headings #119

Closed
opened 2026-03-09 03:09:30 +00:00 by forgejo_admin · 2 comments

Lineage

todo-block-anchor-ids (no plan ancestry — Epilogue item 9 on phase-postgres-epilogue-cleanup)

Repo

forgejo_admin/pal-e-docs

User Story

As an agent using the block API
I want every block to have an anchor_id
So that I can use update_block and delete_block on paragraphs, lists, tables, and code blocks — not just headings

Context

The HTML parser currently only generates anchor_id for heading blocks (by slugifying the heading text). All other block types get anchor_id: null. Since the block API endpoints (/notes/{slug}/blocks/{anchor_id}) use anchor_id as the path parameter, non-heading blocks are completely un-addressable for updates and deletes.

This was discovered in a live session when trying to update a single paragraph in a plan note — the only option was update_note() with the entire HTML content, defeating the purpose of block-level editing.

~262 notes have blocks with null anchor_ids. After the parser fix, existing blocks need a backfill migration to populate their anchor_ids.

File Targets

Files the agent should modify or create:

  • src/pal_e_docs/services/block_parser.py (or wherever the parser lives) — generate anchor_ids for non-heading blocks
  • Migration script or management command — backfill existing null anchor_ids
  • tests/ — test anchor_id generation for all block types

Files the agent should NOT touch:

  • src/pal_e_docs/routes/ — API routes should not need changes (they already use anchor_id)
  • MCP tools — that's a separate repo (pal-e-docs-mcp)

Acceptance Criteria

  • When a note is saved with paragraphs, lists, tables, and code blocks, every block gets a non-null anchor_id
  • Heading blocks still get their current slugified anchor_ids (no regression)
  • Non-heading block anchor_ids are unique within a note
  • After backfill, zero blocks in the database have anchor_id = null
  • update_block(slug, anchor_id) works on paragraph and list blocks
  • delete_block(slug, anchor_id) works on paragraph and list blocks

Test Expectations

  • Unit test: parser generates anchor_ids for every block type (heading, paragraph, list, table, code, raw)
  • Unit test: anchor_ids are unique within a single note
  • Unit test: heading anchor_ids match current slugification behavior (no regression)
  • Integration test: create a note with mixed block types, verify all blocks addressable via API
  • Run command: pytest tests/ -v

Constraints

  • Anchor ID format for non-heading blocks: use {block_type}-{position} (e.g. paragraph-3, list-7, table-12). Position-based is simple and deterministic. If the agent finds a better pattern during implementation, document the rationale.
  • Anchor IDs must be stable across re-parses of the same content (idempotent)
  • The backfill should be safe to run multiple times (idempotent)
  • Keep the DB constraint: anchor_id should remain unique per note (with a unique constraint on (note_id, anchor_id) if not already present)

Checklist

  • PR opened
  • Tests pass
  • No unrelated changes
  • todo-block-anchor-ids — tracking TODO in pal-e-docs
  • phase-postgres-7-block-content — original block implementation
  • convention-block-first-access — convention that depends on block-level editing
### Lineage `todo-block-anchor-ids` (no plan ancestry — Epilogue item 9 on `phase-postgres-epilogue-cleanup`) ### Repo `forgejo_admin/pal-e-docs` ### User Story As an agent using the block API I want every block to have an anchor_id So that I can use `update_block` and `delete_block` on paragraphs, lists, tables, and code blocks — not just headings ### Context The HTML parser currently only generates `anchor_id` for heading blocks (by slugifying the heading text). All other block types get `anchor_id: null`. Since the block API endpoints (`/notes/{slug}/blocks/{anchor_id}`) use `anchor_id` as the path parameter, non-heading blocks are completely un-addressable for updates and deletes. This was discovered in a live session when trying to update a single paragraph in a plan note — the only option was `update_note()` with the entire HTML content, defeating the purpose of block-level editing. ~262 notes have blocks with null anchor_ids. After the parser fix, existing blocks need a backfill migration to populate their anchor_ids. ### File Targets Files the agent should modify or create: - `src/pal_e_docs/services/block_parser.py` (or wherever the parser lives) — generate anchor_ids for non-heading blocks - Migration script or management command — backfill existing null anchor_ids - `tests/` — test anchor_id generation for all block types Files the agent should NOT touch: - `src/pal_e_docs/routes/` — API routes should not need changes (they already use anchor_id) - MCP tools — that's a separate repo (pal-e-docs-mcp) ### Acceptance Criteria - [ ] When a note is saved with paragraphs, lists, tables, and code blocks, every block gets a non-null anchor_id - [ ] Heading blocks still get their current slugified anchor_ids (no regression) - [ ] Non-heading block anchor_ids are unique within a note - [ ] After backfill, zero blocks in the database have `anchor_id = null` - [ ] `update_block(slug, anchor_id)` works on paragraph and list blocks - [ ] `delete_block(slug, anchor_id)` works on paragraph and list blocks ### Test Expectations - [ ] Unit test: parser generates anchor_ids for every block type (heading, paragraph, list, table, code, raw) - [ ] Unit test: anchor_ids are unique within a single note - [ ] Unit test: heading anchor_ids match current slugification behavior (no regression) - [ ] Integration test: create a note with mixed block types, verify all blocks addressable via API - Run command: `pytest tests/ -v` ### Constraints - Anchor ID format for non-heading blocks: use `{block_type}-{position}` (e.g. `paragraph-3`, `list-7`, `table-12`). Position-based is simple and deterministic. If the agent finds a better pattern during implementation, document the rationale. - Anchor IDs must be stable across re-parses of the same content (idempotent) - The backfill should be safe to run multiple times (idempotent) - Keep the DB constraint: `anchor_id` should remain unique per note (with a unique constraint on `(note_id, anchor_id)` if not already present) ### Checklist - [ ] PR opened - [ ] Tests pass - [ ] No unrelated changes ### Related - `todo-block-anchor-ids` — tracking TODO in pal-e-docs - `phase-postgres-7-block-content` — original block implementation - `convention-block-first-access` — convention that depends on block-level editing
Author
Owner

PR #120 Review

BLOCKERS

1. create_block API endpoint does not generate anchor_ids for non-heading blocks

The parser change ensures all blocks parsed from HTML get non-null anchor_ids. However, the create_block endpoint in src/pal_e_docs/routes/blocks.py (lines 222-238) only generates anchor_ids automatically for heading blocks:

anchor_id = body.anchor_id
if body.block_type == "heading" and not anchor_id:
    ...

If a caller creates a paragraph, list, table, code, or mermaid block via POST without providing an anchor_id, it will be inserted with anchor_id=None. This undermines the goal of making all blocks addressable and creates an inconsistency between parser-created blocks (always have anchors) and API-created blocks (may not).

The fix should generate {block_type}-{position} anchors for non-heading blocks in create_block, using _ensure_unique_anchor to avoid collisions.

NITS

1. Anchor-position drift after reindexing (documentation, not a bug)

When _reindex_positions runs after a block deletion, or when create_block shifts positions, the position portion of anchor_ids like paragraph-3 will no longer match the block's actual position. This is fine -- anchors are stable identifiers, not position labels -- but the docstring in parse_html_to_blocks says anchors use {block_type}-{position} format, which could mislead someone into thinking they reflect current position. Consider a brief note in the docstring: "position reflects the position at parse time; it does not change when blocks are reordered."

2. _make_block signature still accepts None for anchor_id

_make_block has the signature anchor_id: str | None. Since the parser now guarantees non-null anchors, the individual _parse_* functions still return None for anchor_id (which is fine since the caller fills it in), but from a type-safety perspective the intermediate state is nullable. This is purely cosmetic -- the code is correct -- but if you ever want to tighten the contract, the fallback fill-in could move into _make_block itself.

3. Migration revision ID is non-standard

The revision ID k1f2g3h4i5j6 follows the project's existing pattern (sequential hex-like IDs), so this is consistent. No action needed, just noting it is a manually chosen ID rather than auto-generated.

4. test_backfill.py line 316 still has anchor_id: None

In test_build_toc (line 316), the test fixture still creates a paragraph block dict with "anchor_id": None. This is a test helper and does not affect production, but it is now inconsistent with the parser's output contract. Consider updating to "anchor_id": "paragraph-1" for clarity.

SOP COMPLIANCE

  • Branch named after issue: 119-block-anchor-ids references issue #119
  • PR body has: Summary, Changes, Test Plan, Related sections -- all present
  • Related section references traceability: todo-block-anchor-ids
  • Tests exist: 12 new tests in TestAnchorIdGeneration, plus 6 updated assertions in existing tests
  • 564 tests pass per PR body
  • No secrets, .env files, or credentials committed
  • No unnecessary file changes (5 files, all tightly scoped)
  • Commit messages are descriptive (PR title matches issue)

VERDICT: NOT APPROVED

One blocker: the create_block API endpoint needs to generate position-based anchor_ids for non-heading blocks, otherwise API-created blocks will have null anchors while parser-created blocks will not. This inconsistency could cause unique constraint violations (if two null-anchor blocks exist for the same note, it works in PostgreSQL, but the block becomes unaddressable) and defeats the purpose of the change.

## PR #120 Review ### BLOCKERS **1. `create_block` API endpoint does not generate anchor_ids for non-heading blocks** The parser change ensures all blocks parsed from HTML get non-null anchor_ids. However, the `create_block` endpoint in `src/pal_e_docs/routes/blocks.py` (lines 222-238) only generates anchor_ids automatically for heading blocks: ```python anchor_id = body.anchor_id if body.block_type == "heading" and not anchor_id: ... ``` If a caller creates a paragraph, list, table, code, or mermaid block via POST without providing an `anchor_id`, it will be inserted with `anchor_id=None`. This undermines the goal of making all blocks addressable and creates an inconsistency between parser-created blocks (always have anchors) and API-created blocks (may not). The fix should generate `{block_type}-{position}` anchors for non-heading blocks in `create_block`, using `_ensure_unique_anchor` to avoid collisions. ### NITS **1. Anchor-position drift after reindexing (documentation, not a bug)** When `_reindex_positions` runs after a block deletion, or when `create_block` shifts positions, the position portion of anchor_ids like `paragraph-3` will no longer match the block's actual position. This is fine -- anchors are stable identifiers, not position labels -- but the docstring in `parse_html_to_blocks` says anchors use `{block_type}-{position}` format, which could mislead someone into thinking they reflect current position. Consider a brief note in the docstring: "position reflects the position at parse time; it does not change when blocks are reordered." **2. `_make_block` signature still accepts `None` for `anchor_id`** `_make_block` has the signature `anchor_id: str | None`. Since the parser now guarantees non-null anchors, the individual `_parse_*` functions still return `None` for anchor_id (which is fine since the caller fills it in), but from a type-safety perspective the intermediate state is nullable. This is purely cosmetic -- the code is correct -- but if you ever want to tighten the contract, the fallback fill-in could move into `_make_block` itself. **3. Migration revision ID is non-standard** The revision ID `k1f2g3h4i5j6` follows the project's existing pattern (sequential hex-like IDs), so this is consistent. No action needed, just noting it is a manually chosen ID rather than auto-generated. **4. `test_backfill.py` line 316 still has `anchor_id: None`** In `test_build_toc` (line 316), the test fixture still creates a paragraph block dict with `"anchor_id": None`. This is a test helper and does not affect production, but it is now inconsistent with the parser's output contract. Consider updating to `"anchor_id": "paragraph-1"` for clarity. ### SOP COMPLIANCE - [x] Branch named after issue: `119-block-anchor-ids` references issue #119 - [x] PR body has: Summary, Changes, Test Plan, Related sections -- all present - [x] Related section references traceability: `todo-block-anchor-ids` - [x] Tests exist: 12 new tests in `TestAnchorIdGeneration`, plus 6 updated assertions in existing tests - [x] 564 tests pass per PR body - [x] No secrets, .env files, or credentials committed - [x] No unnecessary file changes (5 files, all tightly scoped) - [x] Commit messages are descriptive (PR title matches issue) ### VERDICT: NOT APPROVED One blocker: the `create_block` API endpoint needs to generate position-based anchor_ids for non-heading blocks, otherwise API-created blocks will have null anchors while parser-created blocks will not. This inconsistency could cause unique constraint violations (if two null-anchor blocks exist for the same note, it works in PostgreSQL, but the block becomes unaddressable) and defeats the purpose of the change.
Author
Owner

PR #120 Review (Re-review)

Previous review found one blocker: the create_block API endpoint did not auto-generate anchor_id for non-heading blocks. This re-review verifies the fix and checks for remaining issues.

BLOCKERS

None. The previous blocker is resolved.

NITS

  1. Test seed data still uses anchor_id=None -- The test helpers _seed_note_with_blocks() and _seed_hierarchical_note() in tests/test_blocks_api.py (lines 46, 60, 113) and tests/test_compiled_page_api.py (lines 46, 60) still create Block objects with anchor_id=None for paragraph blocks. This works because (a) SQLite unique constraints treat NULLs as distinct, and (b) the anchor_id column remains nullable=True in the model. However, these seed helpers now contradict the invariant this PR establishes -- that all blocks should have non-null anchor_ids. Similarly tests/test_blocks_compiled_pages.py (lines 50-52) creates blocks without any anchor_id at all. Consider updating these in a follow-up to use the {block_type}-{position} pattern for consistency. Non-blocking because tests pass and the production write paths are all covered.

  2. anchor_id column remains nullable=True in the model -- models.py line 212: anchor_id: Mapped[str | None] = mapped_column(String(200), nullable=True). After the migration backfills all existing NULLs and both write paths (parser + create_block API) now guarantee non-null values, consider a follow-up migration to make this column NOT NULL. The unique constraint enforces uniqueness per note but does not prevent NULLs. Non-blocking -- the code paths are correct, this is a schema hardening item.

  3. _ensure_unique_anchor scans up to suffix -19 only -- routes/blocks.py line 58 pre-generates candidates [anchor] + [f"{anchor}-{i}" for i in range(2, 20)]. If a note ever has 19+ blocks with the same {type}-{position} candidate (unlikely but theoretically possible after many insert-at-same-position operations), the loop at line 68 would exhaust the pre-fetched set. This is a pre-existing concern, not introduced by this PR.

SOP COMPLIANCE

  • Branch named after issue: 119-block-anchor-ids references issue #119
  • PR body follows template: Summary, Changes, Test Plan, Related sections all present
  • Related references traceability: links to issue #119 and todo-block-anchor-ids
  • No secrets committed: no .env files, credentials, or sensitive data in diff
  • No scope creep: all 7 changed files are directly related to the anchor_id generation feature
  • Migration is idempotent: UPDATE ... WHERE anchor_id IS NULL is safe for re-runs
  • Down_revision chain correct: k1f2g3h4i5j6 -> j0e1f2g3h4i5 (blocks and compiled_pages migration)

Write Path Audit

All three block creation paths now guarantee non-null anchor_id:

  1. Parser (src/pal_e_docs/blocks/parser.py lines 54-55, 64-67): parse_html_to_blocks() generates {block_type}-{position} for all non-heading blocks, including bare text nodes and fallback elements. Headings use slugified text. No path returns anchor_id=None.

  2. parse_and_store_blocks() (src/pal_e_docs/blocks/sync.py line 104): Uses bd.get("anchor_id") from parser output. Since the parser now always populates anchor_id, this path is covered. Used by note create and update routes.

  3. create_block API (src/pal_e_docs/routes/blocks.py lines 222-235): The previous blocker. Now correctly generates {block_type}-{position} for non-heading blocks (line 234) and runs it through _ensure_unique_anchor() for deduplication.

  4. Migration backfill (alembic/versions/k1f2g3h4i5j6): Covers all existing rows with anchor_id IS NULL using the same {block_type}-{position} pattern.

Test Coverage

  • 12 new parser tests in TestAnchorIdGeneration covering all 6 block types, bare text, fallback elements, mixed documents, uniqueness, and idempotency
  • 2 new API tests: test_create_non_heading_auto_generates_anchor and test_create_non_heading_anchor_uniqueness (with dedup suffix verification)
  • Existing backfill test assertions updated from anchor_id is None to expected values
  • Existing parser test assertions updated consistently
  • 564 tests reported passing

VERDICT: APPROVED

The blocker from the previous review is resolved. The create_block route now auto-generates anchor_id for all block types, matching the parser behavior. The migration safely backfills existing data. Test coverage is thorough. The two nits (test seed data consistency and column nullability hardening) are follow-up items, not blockers.

## PR #120 Review (Re-review) Previous review found one blocker: the `create_block` API endpoint did not auto-generate `anchor_id` for non-heading blocks. This re-review verifies the fix and checks for remaining issues. ### BLOCKERS None. The previous blocker is resolved. ### NITS 1. **Test seed data still uses `anchor_id=None`** -- The test helpers `_seed_note_with_blocks()` and `_seed_hierarchical_note()` in `tests/test_blocks_api.py` (lines 46, 60, 113) and `tests/test_compiled_page_api.py` (lines 46, 60) still create Block objects with `anchor_id=None` for paragraph blocks. This works because (a) SQLite unique constraints treat NULLs as distinct, and (b) the `anchor_id` column remains `nullable=True` in the model. However, these seed helpers now contradict the invariant this PR establishes -- that all blocks should have non-null anchor_ids. Similarly `tests/test_blocks_compiled_pages.py` (lines 50-52) creates blocks without any `anchor_id` at all. Consider updating these in a follow-up to use the `{block_type}-{position}` pattern for consistency. Non-blocking because tests pass and the production write paths are all covered. 2. **`anchor_id` column remains `nullable=True` in the model** -- `models.py` line 212: `anchor_id: Mapped[str | None] = mapped_column(String(200), nullable=True)`. After the migration backfills all existing NULLs and both write paths (parser + create_block API) now guarantee non-null values, consider a follow-up migration to make this column `NOT NULL`. The unique constraint enforces uniqueness per note but does not prevent NULLs. Non-blocking -- the code paths are correct, this is a schema hardening item. 3. **`_ensure_unique_anchor` scans up to suffix `-19` only** -- `routes/blocks.py` line 58 pre-generates candidates `[anchor] + [f"{anchor}-{i}" for i in range(2, 20)]`. If a note ever has 19+ blocks with the same `{type}-{position}` candidate (unlikely but theoretically possible after many insert-at-same-position operations), the loop at line 68 would exhaust the pre-fetched set. This is a pre-existing concern, not introduced by this PR. ### SOP COMPLIANCE - [x] Branch named after issue: `119-block-anchor-ids` references issue #119 - [x] PR body follows template: Summary, Changes, Test Plan, Related sections all present - [x] Related references traceability: links to issue #119 and `todo-block-anchor-ids` - [x] No secrets committed: no `.env` files, credentials, or sensitive data in diff - [x] No scope creep: all 7 changed files are directly related to the anchor_id generation feature - [x] Migration is idempotent: `UPDATE ... WHERE anchor_id IS NULL` is safe for re-runs - [x] Down_revision chain correct: `k1f2g3h4i5j6` -> `j0e1f2g3h4i5` (blocks and compiled_pages migration) ### Write Path Audit All three block creation paths now guarantee non-null `anchor_id`: 1. **Parser** (`src/pal_e_docs/blocks/parser.py` lines 54-55, 64-67): `parse_html_to_blocks()` generates `{block_type}-{position}` for all non-heading blocks, including bare text nodes and fallback elements. Headings use slugified text. No path returns `anchor_id=None`. 2. **`parse_and_store_blocks()`** (`src/pal_e_docs/blocks/sync.py` line 104): Uses `bd.get("anchor_id")` from parser output. Since the parser now always populates `anchor_id`, this path is covered. Used by note create and update routes. 3. **`create_block` API** (`src/pal_e_docs/routes/blocks.py` lines 222-235): The previous blocker. Now correctly generates `{block_type}-{position}` for non-heading blocks (line 234) and runs it through `_ensure_unique_anchor()` for deduplication. 4. **Migration backfill** (`alembic/versions/k1f2g3h4i5j6`): Covers all existing rows with `anchor_id IS NULL` using the same `{block_type}-{position}` pattern. ### Test Coverage - 12 new parser tests in `TestAnchorIdGeneration` covering all 6 block types, bare text, fallback elements, mixed documents, uniqueness, and idempotency - 2 new API tests: `test_create_non_heading_auto_generates_anchor` and `test_create_non_heading_anchor_uniqueness` (with dedup suffix verification) - Existing backfill test assertions updated from `anchor_id is None` to expected values - Existing parser test assertions updated consistently - 564 tests reported passing ### VERDICT: APPROVED The blocker from the previous review is resolved. The `create_block` route now auto-generates `anchor_id` for all block types, matching the parser behavior. The migration safely backfills existing data. Test coverage is thorough. The two nits (test seed data consistency and column nullability hardening) are follow-up items, not blockers.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
forgejo_admin/pal-e-api#119
No description provided.