Activate semantic search pipeline — scale embedding worker, verify hybrid search #154

Closed
opened 2026-03-14 14:22:13 +00:00 by forgejo_admin · 0 comments

Lineage

plan-pal-e-docs → Phase: Activate Semantic Search Pipeline (phase-pal-e-docs-activate-semantic-search)

Repo

forgejo_admin/pal-e-docs

User Story

As a platform agent
I want semantic search across all 260+ knowledge base notes
So that I can find relevant SOPs, conventions, and context by meaning rather than exact keywords

Context

The full semantic search stack was built during Act 2 but intentionally left dormant:

  • pgvector extension installed, blocks table has embedding vector(768) column
  • embedding_worker.py (610 lines) — async worker with LISTEN/NOTIFY, batch processing, Prometheus metrics, health endpoint, backfill mode
  • services/search.py — Reciprocal Rank Fusion (RRF) combining tsvector keyword + pgvector semantic
  • API endpoints: /search?mode=keyword|semantic|hybrid and /semantic-search
  • SDK + MCP tools: search_notes() and semantic_search() fully wired
  • k8s manifest exists at k8s/embedding-worker.yaml with replicas: 0
  • Ollama running in-cluster with qwen3-embedding:4b (768-dim, 3.5GB VRAM)
  • DB state: 5,643 blocks pending, 16 skipped, 0 completed
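
For readers unfamiliar with Reciprocal Rank Fusion, a minimal sketch of the idea follows. It is not the code in services/search.py: the function name, the block IDs, and the constant k=60 (a commonly used default) are all assumptions for illustration. Each block scores 1/(k + rank) in every ranked list it appears in, and the per-list scores are summed.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of block IDs into one ranking.

    rankings: list of lists, each ordered best-first.
    k: damping constant (assumed value; the real service may differ).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, block_id in enumerate(ranking, start=1):
            scores[block_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda b: (-scores[b], b))


# Hypothetical result lists from the keyword and semantic searches.
keyword_hits = ["b1", "b2", "b3"]
semantic_hits = ["b3", "b1", "b4"]
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

Blocks that appear in both lists (b1 and b3 here) rise above blocks found by only one method, which is the behaviour the hybrid mode relies on.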

File Targets

Files the agent should modify:

  • k8s/embedding-worker.yaml — change replicas: 0 to replicas: 1
  • k8s/embedding-worker.yaml — verify image tag is current (check ArgoCD Image Updater annotation or compare to deployed API image)

Files the agent should NOT touch:

  • src/pal_e_docs/embedding_worker.py — already production-ready
  • src/pal_e_docs/services/search.py — already complete
  • src/pal_e_docs/routes/notes.py — search endpoints already wired
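
A quick pre-merge check for the replica change could look like the sketch below. The sample manifest text is hypothetical (the real k8s/embedding-worker.yaml is not reproduced in this issue), and the check uses a plain regex rather than a YAML parser so that it runs with the standard library only.

```python
import re

# Hypothetical manifest excerpt; the real file will contain more fields.
SAMPLE_MANIFEST = """\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pal-e-docs-embedding-worker
spec:
  replicas: 1
"""


def replica_count(manifest_text):
    """Return the first `replicas:` value found, or None if absent."""
    match = re.search(r"^\s*replicas:\s*(\d+)\s*$", manifest_text, re.MULTILINE)
    return int(match.group(1)) if match else None


count = replica_count(SAMPLE_MANIFEST)
print("replicas =", count)
```

To run it against the repository, read the file with open("k8s/embedding-worker.yaml").read() and pass the text to replica_count.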

Acceptance Criteria

  • k8s/embedding-worker.yaml has replicas: 1
  • Image tag in manifest is current (matches latest build or has Image Updater annotation)
  • After deploy, embedding worker pod is Running and healthy (/healthz returns 200)
  • Embedding worker processes pending blocks (check embedding_status counts in DB)
  • /search?mode=semantic&q=deployment+recovery returns relevant results via API
  • /search?mode=hybrid&q=how+to+create+a+plan returns fused results
  • semantic_search() MCP tool returns ranked results
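
The two search criteria can be turned into request URLs with a small helper, sketched below. The base URL and the /notes/search path are taken from the curl command under Test Expectations; the criteria above write the path as /search, so the path is a parameter here and should be adjusted to whatever the deployed router actually exposes. The helper name is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # assumption: API reached via port-forward
SEARCH_MODES = {"keyword", "semantic", "hybrid"}


def search_url(query, mode="hybrid", limit=3, path="/notes/search"):
    """Build a search URL for one of the three documented modes."""
    if mode not in SEARCH_MODES:
        raise ValueError(f"unsupported mode: {mode}")
    params = urlencode({"q": query, "mode": mode, "limit": limit})
    return f"{BASE_URL}{path}?{params}"


semantic_url = search_url("deployment recovery", mode="semantic")
hybrid_url = search_url("how to create a plan", mode="hybrid")
print(semantic_url)
print(hybrid_url)
```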

Test Expectations

  • Verify worker pod starts: kubectl get pods -n pal-e-docs -l app=pal-e-docs-embedding-worker
  • Verify embedding progress: kubectl exec -n pal-e-docs deploy/pal-e-docs -- python -c "from sqlalchemy import create_engine, text; import os; e=create_engine(os.environ['PALDOCS_DATABASE_URL']); c=e.connect(); print(c.execute(text(\"SELECT embedding_status, count(*) FROM blocks GROUP BY embedding_status\")).fetchall())"
  • Verify search: curl -s 'http://localhost:8000/notes/search?q=deployment&mode=semantic&limit=3' (via port-forward or in-cluster)
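
The status query above returns (embedding_status, count) rows. A small sketch for turning those rows into a progress summary is shown below; the status labels "pending", "skipped", and "completed" follow the wording in the Context section and are assumed to match the values stored in the column.

```python
def embedding_progress(rows):
    """Summarise (status, count) rows from the blocks table."""
    counts = {status: n for status, n in rows}
    total = sum(counts.values())
    completed = counts.get("completed", 0)
    return {
        "total": total,
        "completed": completed,
        "pending": counts.get("pending", 0),
        "fraction_done": completed / total if total else 0.0,
    }


# Counts quoted in the Context section, before the worker is scaled up.
snapshot = embedding_progress([("pending", 5643), ("skipped", 16), ("completed", 0)])
print(snapshot)
```

Running the query a few minutes apart and comparing the pending count is enough to confirm that the worker is draining the queue.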

Constraints

  • ArgoCD reads from pal-e-docs/k8s/ — push to main triggers deploy
  • The deployment pipeline is: merge to main → Woodpecker CI → Harbor → ArgoCD Image Updater → ArgoCD sync
  • Ollama is in ollama namespace, worker connects via http://ollama.ollama.svc.cluster.local:11434
  • Do NOT modify the embedding worker code — it's been tested and is production-ready
  • The worker uses the same Docker image as the API but with entrypoint python -m pal_e_docs.embedding_worker
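
If the worker pod is Running but no blocks complete, connectivity to Ollama is worth checking directly. The sketch below calls Ollama's /api/embed endpoint at the service URL given above and checks that the vector length matches the 768-dimension column. Whether the worker itself uses this endpoint is an assumption; the helper names are hypothetical, and the script has to run somewhere that can resolve the in-cluster service name.

```python
import json
from urllib.request import Request, urlopen

OLLAMA_URL = "http://ollama.ollama.svc.cluster.local:11434"
MODEL = "qwen3-embedding:4b"
EXPECTED_DIM = 768  # must match the vector(768) column on blocks


def build_embed_request(text):
    """Build a POST request for Ollama's /api/embed endpoint."""
    payload = json.dumps({"model": MODEL, "input": text}).encode("utf-8")
    return Request(
        f"{OLLAMA_URL}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_vector(response_body):
    """Pull the first embedding out of the response and check its size."""
    vector = json.loads(response_body)["embeddings"][0]
    if len(vector) != EXPECTED_DIM:
        raise ValueError(f"expected {EXPECTED_DIM} dims, got {len(vector)}")
    return vector


def embed(text, timeout=30):
    """Send the request; only works where the Ollama service is reachable."""
    with urlopen(build_embed_request(text), timeout=timeout) as resp:
        return extract_vector(resp.read())
```

A dimension mismatch raised here would point at the model or its configuration rather than at the worker code, which this issue leaves untouched.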

Checklist

  • PR opened
  • Tests pass
  • No unrelated changes

Related

  • pal-e-docs: the project this affects
forgejo_admin 2026-03-14 14:27:51 +00:00