F12: Ollama hostPath volume + embedding alerting #89
Reference
forgejo_admin/pal-e-platform#89
Lineage
plan-pal-e-docs → Phase F12 (phase-pal-e-docs-f12-semantic-search-recovery)

Repo
forgejo_admin/pal-e-platform

User Story
As a platform operator
I want Ollama models to persist across pod restarts and embedding failures to trigger alerts
So that semantic search stays healthy without manual intervention
Context
Semantic search was down for 6+ days. Root cause: the Ollama PVC was recreated and the `qwen3-embedding:4b` model was never re-pulled. An immediate fix was applied (model re-pulled, chat models removed to free the 6Gi cgroup). This issue is the durable fix: a hostPath volume so models survive any k8s lifecycle event, plus Prometheus alerting so failures are detected within 10 minutes.

Key findings from diagnosis:

- `ollama/ollama:0.17.6`, GPU: GTX 1070 (8GB VRAM)
- The `embedding_queue_depth` metric is NOT sufficient for alerting — failed blocks are marked `error` and leave the queue, so it reads 0 during failures. Alert on the `embedding_errors_total` rate instead.
- Blocks left in `error` status need backfill.

File Targets
Files to modify:

- `terraform/modules/ollama/main.tf` (or wherever the Ollama deployment is defined) — swap the PVC for a hostPath volume mount (`/var/lib/ollama`)
- `terraform/modules/prometheus/alerts.tf` (or the alert rules config) — add embedding error rate alerts
- `terraform/modules/prometheus/scrape.tf` (or the scrape config) — add a scrape target for the embedding worker's `:8001/metrics`

Files NOT to touch:

- `~/pal-e-docs/` — backfill is a manual step, not a code change

Acceptance Criteria
- Ollama models stored on a hostPath volume (`/var/lib/ollama`), PVC removed
- `tofu plan -lock=false` shows a clean plan (PVC removed, hostPath added)
- `qwen3-embedding:4b` survives a pod restart (delete the pod, verify the model is still loaded after restart)
- Prometheus scrapes the embedding worker on `:8001`
- `rate(embedding_errors_total[5m]) > 0` → warning alert
- `embedding_total == 0` for > 10 min while errors are increasing → critical alert
- Backfill `error` blocks (`UPDATE blocks SET embedding_status = 'pending' WHERE embedding_status = 'error'`), verify the worker processes them

Test Expectations
- `tofu validate` passes
- `tofu plan -lock=false` shows the expected changes (PVC → hostPath, new alert rules, new scrape target)
- `kubectl delete pod -n ollama <pod>` → pod restarts → `ollama list` still shows `qwen3-embedding:4b`
- `curl localhost:8001/metrics` (port-forwarded) returns Prometheus metrics

Constraints
- `tofu plan` MUST include `-lock=false` (state lock blocks CI)
- `tofu fmt` and `tofu validate` must pass
- The Ollama pod requests `nvidia.com/gpu: 1` — it always lands on the GPU node, so hostPath is safe
- Only the `qwen3-embedding:4b` model. No chat models — they cause OOM with the 6Gi limit.

Checklist
- [ ] `tofu plan -lock=false` output in the PR description

Related
- `phase-pal-e-docs-f12-semantic-search-recovery` — phase note with full diagnostic data
- `plan-pal-e-platform` — Platform Hardening (alerting infrastructure from Phase 16)
- `phase-pal-e-docs-f13-context-intelligence` — F13b-2 needs durable vectors
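For concreteness, the PVC-to-hostPath swap could be sketched as below. This is a minimal sketch assuming the deployment is managed with the Terraform kubernetes provider; the resource name, labels, and the `/root/.ollama` mount path are illustrative assumptions, not copied from the repo.

```hcl
# Sketch only: the real definitions live in terraform/modules/ollama/main.tf.
resource "kubernetes_deployment" "ollama" {
  metadata {
    name      = "ollama"
    namespace = "ollama"
  }

  spec {
    replicas = 1

    selector {
      match_labels = { app = "ollama" }
    }

    template {
      metadata {
        labels = { app = "ollama" }
      }

      spec {
        container {
          name  = "ollama"
          image = "ollama/ollama:0.17.6"

          # Ollama's default model directory; adjust if OLLAMA_MODELS is set.
          volume_mount {
            name       = "models"
            mount_path = "/root/.ollama"
          }

          resources {
            limits = {
              "nvidia.com/gpu" = "1"   # pins the pod to the GPU node, so hostPath is safe
              memory           = "6Gi"
            }
          }
        }

        # Replaces the PVC-backed volume: a node-local path survives any
        # PVC or pod lifecycle event.
        volume {
          name = "models"

          host_path {
            path = "/var/lib/ollama"
            type = "DirectoryOrCreate"
          }
        }
      }
    }
  }
}
```

After `tofu apply`, deleting the pod and re-running `ollama list` should still show `qwen3-embedding:4b`, since the models directory now lives on the node's filesystem.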
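The two alert rules could be sketched as a Prometheus rules file delivered from Terraform. The ConfigMap name, namespace, and the exact critical expression are assumptions; the criterion "`embedding_total == 0` for > 10 min while errors increasing" is read here as zero increase over a 10-minute window.

```hcl
# Sketch only: the real rules belong in terraform/modules/prometheus/alerts.tf.
resource "kubernetes_config_map" "embedding_alerts" {
  metadata {
    name      = "embedding-alerts" # assumed name
    namespace = "monitoring"       # assumed namespace
  }

  data = {
    "embedding.rules.yml" = <<-EOT
      groups:
        - name: embedding
          rules:
            # Warning: any embedding failures in the last 5 minutes.
            # embedding_queue_depth is deliberately NOT used (reads 0 during failures).
            - alert: EmbeddingErrors
              expr: rate(embedding_errors_total[5m]) > 0
              labels:
                severity: warning
              annotations:
                summary: Embedding worker is reporting errors

            # Critical: no successful embeddings for 10 minutes while the
            # error counter keeps climbing.
            - alert: EmbeddingStalled
              expr: increase(embedding_total[10m]) == 0 and increase(embedding_errors_total[10m]) > 0
              for: 10m
              labels:
                severity: critical
              annotations:
                summary: Embedding pipeline stalled with increasing errors
    EOT
  }
}
```

A matching static scrape job pointing at the embedding worker's `:8001/metrics` would go in `terraform/modules/prometheus/scrape.tf`, following whatever pattern that module already uses.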