6c: Async embedding pipeline + backfill #129

Closed
opened 2026-03-09 03:56:52 +00:00 by forgejo_admin · 0 comments

Lineage

plan-2026-02-26-tf-modularize-postgres → Phase 6 (Vector Search) → Phase 6c

Repo

forgejo_admin/pal-e-docs

User Story

As an AI agent on the pal-e platform
I want block content automatically embedded as vectors when notes are created or updated
So that semantic search can find relevant knowledge without brute-force enumeration

Context

Phase 6b deployed pgvector schema: embedding vector(768) and embedding_status varchar(20) columns on the blocks table, plus a Postgres trigger that fires NOTIFY embedding_queue on block INSERT/UPDATE and sets embedding_status = 'pending'. Ollama is live as a platform service (Phase 6a) at http://ollama.ollama.svc.cluster.local:11434 with qwen3-embedding:4b loaded in VRAM.

The trigger is firing but nothing is listening. This phase builds the worker that consumes those notifications, calls Ollama for embeddings, and stores the vectors. It also backfills all ~5K existing blocks.

Decisions already made (from decision-phase6-vector-search-architecture):

  • Async via PostgreSQL LISTEN/NOTIFY (no Redis/Celery)
  • Per-block embedding (not per-note)
  • Separate k8s Deployment (independent failure domain, no GPU request — calls Ollama over HTTP)
  • Instruction prefix: "Represent this platform knowledge base section for retrieval: {block_text}"
  • Same Docker image as API pod, different entrypoint
  • Embed: paragraph, list, heading (with parent context), table (flattened), code. Skip: mermaid (already skipped by trigger)
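The instruction-prefix decision pins down the embedding call almost completely. A hedged sketch, using Ollama's `/api/embeddings` request shape (`model` + `prompt` returning `{"embedding": [...]}`) — the endpoint shape should be verified against the deployed Ollama version, and the function names here are illustrative:

```python
PREFIX = "Represent this platform knowledge base section for retrieval: "

def build_prompt(block_text: str) -> str:
    """Apply the agreed instruction prefix to raw block text."""
    return PREFIX + block_text

def embed(
    block_text: str,
    ollama_url: str = "http://ollama.ollama.svc.cluster.local:11434",
) -> list:
    """Fetch one 768-dim embedding vector from Ollama for a block's text."""
    # Deferred import so the sketch loads without httpx installed.
    import httpx

    resp = httpx.post(
        f"{ollama_url}/api/embeddings",
        json={"model": "qwen3-embedding:4b", "prompt": build_prompt(block_text)},
        timeout=30.0,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]
```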

File Targets

Files to create:

  • src/pal_e_docs/embedding_worker.py — main worker process (LISTEN loop, poll fallback, batch processor, health endpoint, metrics, backfill mode)
  • k8s/embedding-worker.yaml — k8s Deployment manifest

Files to modify:

  • src/pal_e_docs/config.py — add ollama_url setting (PALDOCS_OLLAMA_URL)
  • k8s/kustomization.yaml — add embedding-worker.yaml resource
  • pyproject.toml — move httpx from dev to main deps

Files NOT to touch:

  • alembic/versions/ — no new migrations (6b already created the schema + trigger)
  • src/pal_e_docs/routes/ — no API changes (that's 6d)
  • src/pal_e_docs/models.py — model already has embedding and embedding_status columns

Acceptance Criteria

  • Worker starts, connects to Postgres, issues LISTEN embedding_queue
  • On NOTIFY: queries pending blocks, extracts text (block_type-aware), calls Ollama, stores vector, sets embedding_status = 'completed'
  • Poll fallback: every 60s, sweeps blocks where embedding_status = 'pending' (catches missed notifications)
  • State machine: pending → processing → completed | error. processing prevents duplicate work on restart
  • Retry with exponential backoff on Ollama transient errors
  • Graceful SIGTERM: finishes current batch, resets any processing blocks to pending
  • Health endpoint on a lightweight HTTP server (e.g., port 8001) for k8s probes
  • Prometheus metrics: embedding_total, embedding_errors_total, embedding_duration_seconds, embedding_queue_depth
  • --backfill flag: processes all pending blocks in rate-limited batches with progress logging
  • k8s Deployment: same image, entrypoint python -m pal_e_docs.embedding_worker, no GPU request, minimal resources (10m/64Mi req, 256Mi limit)
  • Backfill run completes: all ~5K embeddable blocks have embedding_status = 'completed' and non-null embedding
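Two of the criteria above (duplicate-safe state transitions and retry with backoff) have natural small sketches. The claim query and the helper names below are illustrative choices under the 6b schema, not settled implementation:

```python
import time

# Illustrative claim query: moves a batch of pending blocks to 'processing'
# atomically, so a restarted (or second) worker can't grab the same rows.
CLAIM_SQL = """
UPDATE blocks SET embedding_status = 'processing'
WHERE id IN (
    SELECT id FROM blocks
    WHERE embedding_status = 'pending'
    ORDER BY id
    LIMIT %(batch)s
    FOR UPDATE SKIP LOCKED
)
RETURNING id
"""

def backoff_delays(base=1.0, factor=2.0, retries=5, cap=30.0):
    """Yield capped exponential delays: base, base*2, base*4, ... up to cap."""
    delay = base
    for _ in range(retries):
        yield min(delay, cap)
        delay *= factor

def with_retries(fn, is_transient, retries=5, base=1.0):
    """Run fn(); retry transient failures with backoff, re-raise the rest."""
    last = None
    for delay in backoff_delays(base=base, retries=retries):
        try:
            return fn()
        except Exception as exc:
            if not is_transient(exc):
                raise
            last = exc
            time.sleep(delay)
    raise last
```

`FOR UPDATE SKIP LOCKED` also covers the graceful-restart criterion: rows stuck in `processing` after a crash are reset to `pending` by the SIGTERM handler or swept up by an operator, and concurrent claimers never block each other.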

Test Expectations

  • Unit test: block text extraction — each block_type produces expected plain text
  • Unit test: mermaid blocks skipped, empty blocks handled gracefully
  • Unit test: heading text includes parent note title context
  • Integration test: end-to-end — create block → worker picks up → embedding stored (requires Ollama, may need to mock in CI)
  • Run command: pytest tests/ -k test_embedding
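The first three unit tests above imply a shape for the extractor. A hypothetical sketch — the function name, the `None`-means-skip convention, and the `"title > heading"` join are all guesses, only the block_type names come from the decision list:

```python
def extract_text(block_type, content, note_title=None):
    """Return plain text to embed for a block, or None to skip it."""
    if block_type == "mermaid":
        return None  # diagrams are never embedded (trigger already skips them)
    text = (content or "").strip()
    if not text:
        return None  # empty blocks handled gracefully: nothing to embed
    if block_type == "heading" and note_title:
        # Headings carry parent note title as context per the 6c decisions.
        return f"{note_title} > {text}"
    return text
```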

Constraints

  • Use raw psycopg2 connection for LISTEN (SQLAlchemy doesn't expose it)
  • Use httpx for Ollama HTTP calls (async-capable, already in dev deps)
  • Match structured logging style of existing codebase
  • Health endpoint should be minimal (not full FastAPI — a simple http.server or similar)
  • Batch size: 10 blocks per cycle (live), configurable via env var for backfill
  • The worker does NOT need runtimeClassName: nvidia — it calls Ollama over HTTP, Ollama owns the GPU

Checklist

  • PR opened
  • Tests pass
  • Backfill verified (all blocks embedded)
  • No unrelated changes

Related

  • project-pal-e-docs — project this affects
  • Issue #126 — 6b-1 extension ownership (independent, doesn't block this)
  • PR #122 — 6b schema migration (predecessor)
  • PR #27 (pal-e-platform) — 6a Ollama deployment (predecessor)