Add full-text search: tsvector column, GIN index, trigger, search endpoint #83
Labels
No labels
domain:backend
domain:devops
domain:frontend
status:approved
status:in-progress
status:needs-fix
status:qa
type:bug
type:devops
type:feature
No milestone
No project
No assignees
1 participant
Due date
No due date set.
Dependencies
No dependencies set.
Reference
ldraney/pal-e-api#83
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Plan
plan-2026-02-26-tf-modularize-postgres-- Phase 5Repo
forgejo_admin/pal-e-docsUser Story
As an AI agent querying platform knowledge
I want to search notes by text content with ranked results and snippets
So that I can find relevant knowledge in one call instead of enumerating all notes
Context
There is currently no search anywhere — not in the API, not in MCP tools. AI agents call
list_notes(tags=...)thenget_note()on each result — 12+ API calls and thousands of tokens per knowledge lookup. This is unsustainable as the knowledge base grows.Postgres is live (Phase 3 completed). We can now use
tsvectorfor full-text search with weighted ranking and snippet extraction.Design decisions (from planning):
regexp_replace(html_content, '<[^>]+>', ' ', 'g')) — simpler than Python-side extraction, sufficient for nh3-sanitized HTMLFile Targets
Files to modify or create:
alembic/versions/XXXX_add_search_vector.py— new migration: addsearch_vectorcolumn (tsvector), GIN index, trigger function, backfill existing rowsapp/models.py— addsearch_vectorcolumn to Note model (server-managed, no app writes)app/routes/notes.py— addGET /notes/searchendpointapp/schemas.py— addNoteSearchResultschema (slug, title, note_type, status, project, headline, rank)Files NOT to touch:
app/autolink.py,app/wrap_tables.py— rendering transforms, unrelatedAcceptance Criteria
GET /notes/search?q=secrets+managementreturns relevant notes ranked by ts_rankheadlinefield with match context (ts_headline, ~50 words)rankscorehtml_content(summaries only)note_type,project,status,tags(all optional)limitparameter works (default 10)Test Expectations
pytest tests/Constraints
routes/notes.pyschemas.pyop.execute()for raw SQL (trigger, function, backfill)to_tsquery('english', ...)— handle multi-word queries by joining with&(AND)search_vectorcolumn should be nullable and server-managed — don't set it in app codeDraft SQL (starting point)
Trigger function:
Search query:
Checklist
Related
pal-e-docsprojectphase-postgres-5-fulltext-searchforgejo_admin/pal-e-docs-mcpsearch_notes tool (separate issue)PR #84 Review
BLOCKERS
None. The code is well-structured, the migration is correct, and the security-sensitive tsquery sanitization is properly handled.
NITS
Empty result on all-special-char queries (
/home/ldraney/pal-e-docs/src/pal_e_docs/routes/notes.py, line 195-196): If someone searchesq=!!!+???, themin_length=1validation passes, but after stripping non-word characters thewordslist is empty and the endpoint returns[]with 200. Consider returning 422 for consistency with the validation intent, or document this as intentional behavior.\wincludes underscores (line 193): The regexr"[^\w]"preserves underscores and digits, soq=___produces a tsquery for___. Not harmful, but worth noting -- Postgres will try to match the literal token___which will never match anything meaningful.HTML stripping regex is approximate (migration line 36 and route line 239):
regexp_replace(html_content, '<[^>]+>', ' ', 'g')won't correctly strip tags containing>inside attributes (e.g.,<div data-x="a>b">). This is fine for search indexing quality -- it is not a security boundary -- but could produce minor noise in headline snippets.Test coverage is thin: Only 3 tests, all exercising the SQLite 501 guard and FastAPI validation. No Postgres integration tests exist for: actual ranked search results, filter parameters (
note_type,project,status,tags), headline snippet generation, the trigger function, or sanitization edge cases. The PR body correctly marks the Postgres test items as unchecked. Consider whether a Postgres test fixture (or a follow-up issue for integration tests) is warranted.Hardcoded
'english'text search configuration: The trigger, backfill, and query all useto_tsvector('english', ...). If the knowledge base ever needs multilingual support, this will need to be parameterized. Fine for now, but worth a comment in the migration.SOP COMPLIANCE
83-add-fulltext-search-tsvectorreferences issue #83)plan-2026-02-26-tf-modularize-postgres)NOTES
/search(line 175) is registered before/{slug}(line 252), preventing slug-match conflicts.db.bind.dialect.namecheck correctly gates Postgres-only functionality. Verified the test conftest binds to SQLite.h8c9d0e1f2g3->i9d0e1f2g3h4.IF EXISTSon all drops).&) is a solid defense against tsquery operator injection. All filter parameters use SQLAlchemy:parambindings -- no SQL injection vectors.search_vectorin the ORM model is sound -- keeps SQLite compatibility for the test suite.VERDICT: APPROVED
Closed by PR #84 (squash merged). tsvector column, GIN index, trigger, and
GET /notes/searchendpoint deployed.Next: deploy to cluster, then spawn dev agent for pal-e-docs-mcp #18 (search_notes MCP tool).