6e: Add hybrid ranking to search endpoint (tsvector + pgvector) #139
Reference
forgejo_admin/pal-e-api#139
Lineage
plan-2026-02-26-tf-modularize-postgres → Phase 6 (Vector Search) → Phase 6e (Hybrid Ranking)
Repo
forgejo_admin/pal-e-docs
User Story
As an agent querying pal-e-docs
I want a unified search endpoint that combines keyword and semantic relevance
So that search results are ranked by both exact term matches and meaning similarity
Context
Phase 5 added full-text search (tsvector, GET /notes/search). Phase 6d added semantic search (pgvector + Ollama embeddings, GET /notes/semantic-search). Currently agents must choose one or the other. Hybrid ranking combines both signals: a note that matches both keywords and meaning should rank higher than a note matching only one signal.
The recommended approach is Reciprocal Rank Fusion (RRF). RRF is simpler than a weighted linear combination (no score normalization needed) and well-proven in information retrieval. Formula:
RRF(d) = Σ 1/(k + rank_i(d)), where rank_i(d) is the 1-based position of note d in result list i, the sum runs over the result lists that contain d, and k is typically 60.
File Targets
Files to modify:
src/pal_e_docs/routes/notes.py: add mode query parameter to search endpoint (keyword/semantic/hybrid)
src/pal_e_docs/services/ or equivalent: implement hybrid ranking logic (RRF combination of tsvector and pgvector results)
tests/: unit + integration tests for hybrid mode
Files NOT to touch:
src/pal_e_docs/services/embedding_worker.py: embeddings are already computed
Acceptance Criteria
GET /notes/search?q=hello&mode=keyword returns same results as current behavior (backward compatible)
GET /notes/search?q=hello&mode=semantic returns semantically similar results
GET /notes/search?q=hello&mode=hybrid combines both signals using RRF
mode defaults to keyword (backward compatible when mode is omitted)
alpha parameter (0.0-1.0) controls weighting: 0.0 = pure keyword, 1.0 = pure semantic, 0.5 = balanced
Test Expectations
pytest tests/ -v -k hybrid
Constraints
GET /notes/search behavior must not change when mode is omitted
/notes/semantic-search endpoint remains unchanged (deprecation is a separate decision)
[...] mode parameter pass-through) will be separate follow-up sub-phases
[...] routes/notes.py
Checklist
Related
pal-e-docs: project
phase-postgres-6e-hybrid-ranking: phase note
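
Returning to the RRF formula in Context and the alpha criterion in Acceptance Criteria: the issue does not say how alpha should interact with RRF, so the following is only a minimal sketch of one possible reading. It assumes each search mode already returns an ordered list of note identifiers, and it weights the keyword list by (1 - alpha) and the semantic list by alpha. The function name rrf_fuse, its parameters, and the example identifiers are hypothetical and do not come from the pal-e-docs codebase.

```python
from typing import Dict, List, Sequence


def rrf_fuse(
    keyword_ids: Sequence[str],
    semantic_ids: Sequence[str],
    alpha: float = 0.5,
    k: int = 60,
) -> List[str]:
    """Fuse two ranked ID lists with weighted Reciprocal Rank Fusion.

    Assumption (not specified in the issue): alpha scales the semantic
    contribution and (1 - alpha) scales the keyword contribution.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")

    scores: Dict[str, float] = {}
    weighted_lists = ((1.0 - alpha, keyword_ids), (alpha, semantic_ids))
    for weight, ranking in weighted_lists:
        if weight == 0.0:
            # A zero-weight list is ignored entirely, so alpha=0.0 yields
            # pure keyword results and alpha=1.0 pure semantic results.
            continue
        for rank, note_id in enumerate(ranking, start=1):
            # Each list contributes weight / (k + rank) for the note.
            scores[note_id] = scores.get(note_id, 0.0) + weight / (k + rank)

    # Highest fused score first; ties keep first-seen order (stable sort).
    return sorted(scores, key=lambda note_id: -scores[note_id])


if __name__ == "__main__":
    keyword = ["a", "b", "c"]   # hypothetical tsvector ranking
    semantic = ["c", "a", "d"]  # hypothetical pgvector ranking
    print(rrf_fuse(keyword, semantic))             # ['a', 'c', 'b', 'd']
    print(rrf_fuse(keyword, semantic, alpha=0.0))  # ['a', 'b', 'c']
    print(rrf_fuse(keyword, semantic, alpha=1.0))  # ['c', 'a', 'd']
```

With alpha = 0.5 both lists carry equal weight, so the ordering matches unweighted RRF. Notes "a" and "c" appear in both lists and therefore outrank "b" and "d", which each appear in only one. Because only ranks are used, the tsvector and pgvector scores never need to be normalized against each other.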