Add async embedding pipeline worker and backfill (#129) #130
forgejo_admin/pal-e-api!130
Summary
The worker listens on the `embedding_queue` channel fired by the 6b trigger, calls Ollama for per-block embeddings, and stores 768-dim vectors via pgvector. A `--backfill` mode covers all ~5K existing blocks.

Changes

- `src/pal_e_docs/embedding_worker.py` (new): Main worker process -- LISTEN loop, poll fallback every 60s, batch processor with block-type-aware text extraction, Ollama HTTP client, state machine (pending -> processing -> completed | error), health/metrics HTTP server on port 8001, `--backfill` CLI flag
- `k8s/embedding-worker.yaml` (new): k8s Deployment -- same Docker image, entrypoint `python -m pal_e_docs.embedding_worker`, no GPU request, 10m/64Mi requests, 256Mi limit, liveness/readiness probes
- `tests/test_embedding.py` (new): 20 unit tests covering text extraction for all block types, mermaid skip, empty/null edge cases
- `src/pal_e_docs/config.py`: Add `ollama_url` setting (`PALDOCS_OLLAMA_URL`, defaults to in-cluster Ollama)
- `k8s/kustomization.yaml`: Add embedding-worker.yaml to resources
- `pyproject.toml`: Move httpx from dev to main deps, add prometheus-client

Test Plan

- `pytest tests/ -k test_embedding` -- 20/20 passing
- `ruff check` -- clean

Review Checklist
Related
- `plan-2026-02-26-tf-modularize-postgres` -- Phase 6 Vector Search, Phase 6c

Closes #129
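The block-type-aware text extraction described in the summary might look like the following sketch. The function name, skip rules, and per-type handling are illustrative assumptions, not the PR's actual code:

```python
# Hypothetical sketch of block-type-aware text extraction: mermaid blocks
# and empty/null content are skipped; everything else yields text to embed.
def extract_text(block_type: str, content: "str | None") -> "str | None":
    """Return the text to embed for a block, or None to skip it."""
    if content is None or not content.strip():
        return None  # empty/null edge cases: nothing to embed
    if block_type == "mermaid":
        return None  # diagrams are excluded from embedding
    if block_type == "code":
        return content  # keep code verbatim so identifiers stay searchable
    # paragraph, list, heading, table: embed the stripped text
    return content.strip()
```

A pure function like this is what makes the 20 unit tests cheap: every block type and edge case can be exercised without a database or an Ollama instance.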
PR #130 Review
BLOCKERS
1. Trigger re-fire risk on `_store_embedding` and the `process_batch` empty-content path -- investigated, safe.

The DB trigger fires `BEFORE INSERT OR UPDATE OF content, block_type`. The worker's UPDATE statements only touch the `embedding` and `embedding_status` columns, so the trigger will NOT re-fire during normal embedding storage (`_store_embedding`, `_mark_error`, `_reset_processing`). This is correct and safe -- no infinite loop.

No blockers found. The architecture is sound.
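The column-specific trigger behavior can be demonstrated end to end. The sketch below uses SQLite, whose `AFTER UPDATE OF` triggers behave like Postgres's `UPDATE OF` column lists for this purpose, with an illustrative schema rather than the actual migration:

```python
# Demonstrate that an UPDATE touching only the embedding columns does NOT
# re-fire a trigger declared on content/block_type (SQLite stand-in).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blocks (id INTEGER PRIMARY KEY, content TEXT,
                         block_type TEXT, embedding TEXT,
                         embedding_status TEXT DEFAULT 'pending');
    CREATE TABLE fired (n INTEGER);
    CREATE TRIGGER blocks_embed AFTER UPDATE OF content, block_type ON blocks
    BEGIN
        INSERT INTO fired VALUES (1);
    END;
    INSERT INTO blocks (content, block_type) VALUES ('hello', 'paragraph');
""")
# Worker-style update: only embedding columns -> trigger must not fire.
conn.execute("UPDATE blocks SET embedding = '[0.1]', "
             "embedding_status = 'completed' WHERE id = 1")
no_fire = conn.execute("SELECT COUNT(*) FROM fired").fetchone()[0]
# Content edit -> trigger fires once.
conn.execute("UPDATE blocks SET content = 'edited' WHERE id = 1")
one_fire = conn.execute("SELECT COUNT(*) FROM fired").fetchone()[0]
print(no_fire, one_fire)  # → 0 1
```

The same column-list semantics are what make `_store_embedding` and friends loop-free in the real Postgres schema.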
NITS
1. `EMBEDDABLE_TYPES` constant is defined but never used (line 52).

The set `{"paragraph", "list", "heading", "table", "code"}` is defined at module scope but never referenced anywhere in the code. The mermaid exclusion is handled separately by the SQL `WHERE block_type != 'mermaid'` clause in `_fetch_pending_blocks` and by the DB trigger. Either use this constant in the query (replacing the hardcoded `'mermaid'` check) or remove it to avoid confusion about what governs embeddability.

2. Redundant `set_isolation_level` call in reconnection path (line 523).

`_connect()` already calls `conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)` at line 225. The reconnection block at line 523 calls it again immediately after `_connect()`. Harmless but unnecessary duplication.

3. N+1 query in `_fetch_pending_blocks` (lines 259-264).

After the batch `UPDATE ... RETURNING`, the code loops through each row and issues a separate `SELECT n.title FROM notes n JOIN blocks b ...` query per block. This could be folded into the main UPDATE query using a CTE, or replaced by a single JOIN query after the batch, reducing round-trips from N+1 to 2. With BATCH_SIZE=10 this is tolerable, but with BACKFILL_BATCH_SIZE=50 it adds up.

4. `httpx` is now listed in both `dependencies` and `[project.optional-dependencies] dev` in `pyproject.toml` (lines 25 and 32).

The move from dev to main deps is correct, but the dev entry was not removed. Having the same package in both sections is harmless (pip deduplicates) but confusing for future maintainers. Consider removing it from dev deps.

5. k8s Deployment uses a hardcoded image tag (line 24 of `embedding-worker.yaml`).

The image tag `bec9a4ecb11dad50dab3fe55cbb327afc711d6ee` is the same SHA as the main `deployment.yaml`. This is consistent, but both will need updating on the next deploy. This is the existing pattern, not a change introduced by this PR -- just noting it as inherited tech debt.

6. No `terminationGracePeriodSeconds` in the k8s Deployment.

The worker implements SIGTERM-based graceful shutdown (resetting processing blocks back to pending), but the default k8s `terminationGracePeriodSeconds` is 30s. For a worker that might be mid-batch with retry backoff (up to ~7s per block worst case), 30s is likely sufficient for BATCH_SIZE=10 but could be tight in edge cases. Consider setting it explicitly to document the intent.

7. Health server binds to `0.0.0.0` (line 453).

This is standard for k8s containers but worth noting. The health endpoint has no authentication, which is fine for internal cluster traffic behind k8s network policy. No action needed.
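The single-query fold suggested in nit 3 could look like the following sketch, with an illustrative schema and SQLite standing in for Postgres (the real fix would use the ids returned by the batch `UPDATE ... RETURNING`):

```python
# Replace the per-block title lookup with one JOIN over the whole batch,
# so each batch costs 2 round-trips instead of N+1.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE notes  (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE blocks (id INTEGER PRIMARY KEY, note_id INTEGER,
                         content TEXT, embedding_status TEXT DEFAULT 'pending');
    INSERT INTO notes VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO blocks (note_id, content) VALUES (1, 'a'), (1, 'b'), (2, 'c');
""")
batch_ids = [1, 2, 3]  # ids claimed by the batch UPDATE ... RETURNING
placeholders = ",".join("?" for _ in batch_ids)
rows = conn.execute(
    f"""SELECT b.id, b.content, n.title
        FROM blocks b JOIN notes n ON n.id = b.note_id
        WHERE b.id IN ({placeholders})
        ORDER BY b.id""",
    batch_ids,
).fetchall()
print(rows)  # → [(1, 'a', 'Alpha'), (2, 'b', 'Alpha'), (3, 'c', 'Beta')]
```

In Postgres the same effect could be had inside the claiming statement itself via a CTE (`WITH claimed AS (UPDATE ... RETURNING ...) SELECT ... FROM claimed JOIN notes ...`).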
SOP COMPLIANCE
- Branch `129-async-embedding-pipeline` references issue #129
- `plan-2026-02-26-tf-modularize-postgres` Phase 6c
- Closes forgejo_admin/pal-e-docs #129
- `tests/test_embedding.py` covering all block types and edge cases

VERDICT: APPROVED
The implementation is solid. The LISTEN/NOTIFY architecture with poll fallback is well-designed. The state machine (pending -> processing -> completed/error) with crash recovery (reset processing on startup/shutdown) is robust. The text extraction logic is thorough and well-tested. The retry with exponential backoff for transient Ollama errors is appropriate. All nits are non-blocking quality suggestions.
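For reference, the capped exponential backoff mentioned above (and the ~7s worst case cited in the NITS) can be sketched as a pure function; the base delay, cap, and retry count are assumptions, not the PR's actual constants:

```python
# Illustrative capped exponential backoff schedule for transient Ollama errors.
def backoff_delays(retries: int, base: float = 1.0, cap: float = 4.0) -> "list[float]":
    """Seconds to wait before each retry: base * 2**attempt, capped."""
    return [min(base * (2 ** i), cap) for i in range(retries)]

# Three retries -> waits of 1s, 2s, 4s, i.e. ~7s total worst case per block.
print(backoff_delays(3))  # → [1.0, 2.0, 4.0]
```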