feat: Ollama hostPath volume + embedding alerting #90
Reference: forgejo_admin/pal-e-platform!90
Summary
Replaces the Ollama PVC with a hostPath volume so models survive Kubernetes lifecycle events such as PVC recreation and namespace deletion (the model files live on the node's disk rather than in a cluster-managed volume). Adds Prometheus scraping and alert rules for the embedding worker to detect failures within 10 minutes.
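A minimal sketch of what the hostPath swap amounts to at the chart-values level, written as a Python dict for illustration (Terraform would pass an equivalent structure to the helm release). The key layout (`persistentVolume.enabled`, `ollama.gpu.*`) is assumed from the upstream Ollama Helm chart, and the volume name `ollama-models` is hypothetical. The helper only checks that every mount refers to a declared volume.

```python
import json

# Hypothetical sketch of the Ollama chart values after the PVC -> hostPath swap.
# Key layout is assumed from the upstream Ollama Helm chart; the volume name
# "ollama-models" is illustrative and not taken from main.tf.
HOST_PATH = "/var/lib/ollama"     # model files live on the GPU node's disk
CONTAINER_PATH = "/root/.ollama"  # where Ollama stores models inside the container


def ollama_values() -> dict:
    """Return chart values that replace the PVC with a hostPath volume."""
    return {
        "persistentVolume": {"enabled": False},  # no PVC left to be recreated or deleted
        "ollama": {"gpu": {"enabled": True, "number": 1}},  # GPU request keeps the pod on the GPU node
        "volumes": [
            {
                "name": "ollama-models",
                # DirectoryOrCreate: the kubelet creates the directory if it is missing
                "hostPath": {"path": HOST_PATH, "type": "DirectoryOrCreate"},
            }
        ],
        "volumeMounts": [{"name": "ollama-models", "mountPath": CONTAINER_PATH}],
    }


def check_mounts(values: dict) -> list:
    """Return names of volumeMounts that do not match any declared volume."""
    declared = {v["name"] for v in values.get("volumes", [])}
    return [m["name"] for m in values.get("volumeMounts", []) if m["name"] not in declared]


if __name__ == "__main__":
    values = ollama_values()
    assert check_mounts(values) == []
    # JSON is valid YAML, so this output can be used as a Helm values file.
    print(json.dumps(values, indent=2))
```

Because the output is JSON, it should also work as a values file for `helm template -f`, which is a quick way to inspect the rendered Deployment's volume wiring.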
Changes
- `terraform/main.tf` -- Ollama helm release: disabled the PVC and added a hostPath volume (`/var/lib/ollama` on the host -> `/root/.ollama` in the container) via the chart's `volumes`/`volumeMounts` values
- `terraform/main.tf` -- New `kubernetes_service_v1.embedding_worker_metrics`: ClusterIP Service in the `pal-e-docs` namespace exposing port 8001 for the embedding worker pod
- `terraform/main.tf` -- New `kubernetes_manifest.embedding_worker_service_monitor`: ServiceMonitor in the `monitoring` namespace that scrapes the embedding worker in the `pal-e-docs` namespace every 30s
- `terraform/main.tf` -- New `kubernetes_manifest.embedding_alerts`: PrometheusRule with two alert rules:
  - `EmbeddingErrorRate` (warning): `rate(embedding_errors_total[5m]) > 0` for 5m
  - `EmbeddingPipelineDown` (critical): zero `embedding_total` increase while errors are increasing, for 10m

`tofu plan -lock=false` Output
Test Plan
- `tofu fmt` -- passes (no diff)
- `tofu validate` -- passes
- `tofu plan -lock=false` -- shows 3 to add, 2 to change, 0 to destroy
- `kubectl delete pod -n ollama <pod>` -> pod restarts -> `ollama list` shows `qwen3-embedding:4b`
- `kubectl port-forward -n pal-e-docs svc/embedding-worker-metrics 8001:8001` -> `curl localhost:8001/metrics` returns Prometheus metrics

Review Checklist
Post-Deploy Steps (manual)
- Requeue the 152 blocks currently in `error` status: `UPDATE blocks SET embedding_status = 'pending' WHERE embedding_status = 'error';`

Discovered Scope
The `pal-e-docs` NetworkPolicy (`podSelector: app=pal-e-docs`) only covers the main API pod. The embedding worker (`app=pal-e-docs-embedding-worker`) is NOT selected by that policy, and a pod that no NetworkPolicy selects accepts all ingress by default. This works for now (Prometheus can reach it), but a proper policy should be added in `pal-e-deployments` to explicitly scope embedding worker ingress.

Related
- `plan-pal-e-docs` -- Phase F12

Tofu Plan Output
Self-Review
Acceptance criteria vs implementation:
- `/var/lib/ollama`), PVC removed
- `tofu plan -lock=false` shows expected changes:
- 8001
- `rate(embedding_errors_total[5m]) > 0`
- warning
- `tofu fmt` passes
- `tofu validate` passes

Design decisions:
- hostPath via helm chart values -- Used the chart's `volumes`/`volumeMounts` values rather than raw k8s resources. This keeps the Ollama deployment as a single helm release. The `DirectoryOrCreate` type ensures the directory is created on first deploy.
- Service in pal-e-docs namespace -- The embedding worker has no existing k8s Service (only the main API pod does). Created the `embedding-worker-metrics` Service to enable ServiceMonitor discovery. The Service lives in the `pal-e-docs` namespace alongside the worker pod.
- ServiceMonitor in monitoring namespace -- Follows the existing pattern (see `dora_exporter_service_monitor`). Uses `namespaceSelector.matchNames` to reach across to the `pal-e-docs` namespace.
- PrometheusRule labels -- Includes `app.kubernetes.io/part-of: kube-prometheus-stack` and `release: kube-prometheus-stack` to match the existing pattern (see `blackbox_alerts`).

No issues found. Ready for human review.
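The Service and ServiceMonitor decisions above hinge on two label matches: the Service's selector must be a subset of the worker pod's labels, and the ServiceMonitor must accept both the Service's namespace and its labels. The sketch below models both checks in plain Python. The namespaces and the `app` label come from this PR; the Service's own labels, the ServiceMonitor's `matchLabels`, and the helper names are assumptions, since `main.tf` is not reproduced here.

```python
from typing import Dict

Labels = Dict[str, str]


def selector_matches(selector: Labels, labels: Labels) -> bool:
    """Equality-based selector: every key/value pair must be present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def service_monitor_selects(sm: dict, svc: dict) -> bool:
    """Rough model of how a ServiceMonitor picks up a Service.

    Only namespaceSelector.matchNames and matchLabels are modelled (not
    `any: true` or matchExpressions). Without matchNames the ServiceMonitor
    only looks in its own namespace.
    """
    names = (sm.get("namespaceSelector") or {}).get("matchNames") or [sm["namespace"]]
    return svc["namespace"] in names and selector_matches(sm["matchLabels"], svc["labels"])


# Namespaces and the app label are from the PR; the remaining label sets are assumed.
WORKER_POD_LABELS = {"app": "pal-e-docs-embedding-worker"}
API_POD_LABELS = {"app": "pal-e-docs"}
SERVICE = {
    "name": "embedding-worker-metrics",
    "namespace": "pal-e-docs",
    "selector": {"app": "pal-e-docs-embedding-worker"},  # pods the Service routes to
    "labels": {"app": "pal-e-docs-embedding-worker"},    # what the ServiceMonitor matches
}
SERVICE_MONITOR = {
    "namespace": "monitoring",
    "namespaceSelector": {"matchNames": ["pal-e-docs"]},
    "matchLabels": {"app": "pal-e-docs-embedding-worker"},
}

if __name__ == "__main__":
    print("worker selected:", selector_matches(SERVICE["selector"], WORKER_POD_LABELS))
    print("API pod selected:", selector_matches(SERVICE["selector"], API_POD_LABELS))
    print("monitor finds service:", service_monitor_selects(SERVICE_MONITOR, SERVICE))
```

Dropping `namespaceSelector` from the monitor makes the last check fail, which is why a same-namespace monitor like the DORA exporter's can omit it while this cross-namespace one cannot.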
PR #90 Review
DOMAIN REVIEW
Tech stack: Terraform (OpenTofu) / Kubernetes / Helm / Prometheus Operator CRDs
Ollama hostPath migration:
The PVC-to-hostPath swap is well-reasoned. The safety argument is sound:
`gpu.enabled = true` + `gpu.number = 1` in the Ollama chart values translates to an `nvidia.com/gpu: 1` resource request, which guarantees the pod always schedules on the GPU node, so every restart or reschedule sees the same `/var/lib/ollama` directory. `DirectoryOrCreate` is the correct `hostPath.type` -- it avoids requiring manual pre-provisioning. The host path `/var/lib/ollama` follows FHS conventions. The comment block at lines 1596-1602 clearly documents the rationale and the safety invariant.

One consideration: `hostPath` volumes are writable by the container as root, which is inherent to this pattern and acceptable here since Ollama needs write access to store models.

Embedding Worker Metrics Service (lines 1662-1682):
Clean. Uses `data.kubernetes_namespace_v1.pal_e_docs` for the namespace reference (consistent with being a data source -- the namespace is managed by pal-e-services, not this repo). The selector label `app = "pal-e-docs-embedding-worker"` must match the actual pod labels in the pal-e-docs deployment; the Discovered Scope note confirms the worker runs with `app=pal-e-docs-embedding-worker`. Port 8001 for metrics is a reasonable non-conflicting port.

ServiceMonitor (lines 1684-1719):
Correctly placed in the `monitoring` namespace with a `namespaceSelector.matchNames` pointing to `pal-e-docs`. This is the right pattern for cross-namespace scraping (contrast the DORA exporter ServiceMonitor at line 1240, which omits `namespaceSelector` because both resources are in `monitoring`). `depends_on = [helm_release.kube_prometheus_stack]` ensures the CRD exists before the manifest is applied. A 30-second scrape interval is appropriate for an operational metric.

PrometheusRule (lines 1726-1774):
Labels (`app.kubernetes.io/part-of = "kube-prometheus-stack"` and `release = "kube-prometheus-stack"`) match the existing `blackbox_alerts` resource at lines 488-490 -- a consistent pattern for Prometheus rule discovery.

Alert logic review:
- `EmbeddingErrorRate`: `rate(embedding_errors_total[5m]) > 0` for 5m. Fires on any error rate sustained for 5 minutes. This is correct for a warning-level alert -- even a low error rate in an embedding pipeline should be investigated.
- `EmbeddingPipelineDown`: `increase(embedding_total[10m]) == 0 and increase(embedding_errors_total[10m]) > 0` for 10m. This correctly identifies a stuck pipeline (errors happening but no successes). The expression only becomes true once the last successful embedding has left the 10-minute window, and `for: 10m` then requires it to stay true for another 10 minutes, so the alert fires roughly 20 minutes after the last success. That delay avoids false positives during transient hiccups. Good severity escalation (critical vs warning).

Terraform style:
- `depends_on`, namespace references, comment blocks)
- No `tofu fmt` issues (confirmed in PR body)
- `tofu validate` passes (confirmed in PR body)
- `tofu plan` output is included per repo convention

BLOCKERS
None.
NITS
EmbeddingPipelineDown floating-point comparison:
`increase(embedding_total[10m]) == 0` uses exact float equality. `increase()` extrapolates its result to the window boundaries, so its output is generally not a whole number, and exact equality on a float is a fragile habit. Consider `increase(embedding_total[10m]) < 1` instead. This is unlikely to cause issues in practice (a counter with no increments in the window yields exactly 0, and resets are rare), but it is the more defensive pattern.

NetworkPolicy gap (acknowledged): The Discovered Scope section correctly notes that the embedding worker pod lacks a NetworkPolicy. This is properly deferred -- just confirming it is tracked.
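To make the float-comparison nit concrete, here is a simplified Python model of how `increase()` extrapolates over a 10-minute window, loosely following Prometheus's `extrapolatedRate` logic (it is not the upstream implementation), together with one evaluation of the `EmbeddingPipelineDown` condition. The 30s scrape interval matches the ServiceMonitor; the sample values and helper names are made up for illustration.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def increase(samples: List[Sample], range_start: float, range_end: float) -> float:
    """Simplified model of PromQL increase() over [range_start, range_end].

    Corrects for counter resets, extrapolates to the window edges unless the
    gap exceeds 1.1x the average scrape interval (then adds half an interval),
    and never extrapolates the counter below zero. Not the upstream code.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples in the window")
    first_t, first_v = samples[0]
    last_t, last_v = samples[-1]
    # Raw delta; a drop between samples means the counter restarted from zero.
    delta = last_v - first_v
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur < prev:
            delta += prev
    sampled = last_t - first_t
    avg_step = sampled / (len(samples) - 1)
    threshold = avg_step * 1.1
    to_start = first_t - range_start
    to_end = range_end - last_t
    if delta > 0 and first_v >= 0:
        # Do not extrapolate back past the point where the counter would be zero.
        to_start = min(to_start, sampled * (first_v / delta))
    span = sampled
    span += to_start if to_start < threshold else avg_step / 2
    span += to_end if to_end < threshold else avg_step / 2
    return delta * (span / sampled)


def pipeline_down(successes: List[Sample], errors: List[Sample],
                  start: float, end: float, exact: bool) -> bool:
    """One evaluation of the EmbeddingPipelineDown expression (ignores `for:`)."""
    ok = increase(successes, start, end)
    stalled = ok == 0 if exact else ok < 1
    return stalled and increase(errors, start, end) > 0


if __name__ == "__main__":
    start, end = 0.0, 600.0                    # one 10m window
    ts = [15.0 + 30.0 * i for i in range(20)]  # 30s scrapes, offset from the edges
    no_success = [(t, 120.0) for t in ts]
    one_success = [(t, 120.0 if i < 10 else 121.0) for i, t in enumerate(ts)]
    errors = [(t, 10.0 + i // 2) for i, t in enumerate(ts)]  # one error per minute
    print("increase(no_success)  =", increase(no_success, start, end))
    print("increase(one_success) =", increase(one_success, start, end))
    print("increase(errors)      =", increase(errors, start, end))
    for exact in (True, False):
        print("== 0" if exact else "< 1",
              pipeline_down(no_success, errors, start, end, exact),
              pipeline_down(one_success, errors, start, end, exact))
```

In this model a stalled counter yields exactly 0, while a single real success already extrapolates to slightly more than 1, so both forms agree on these samples; `< 1` simply avoids depending on exact equality.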
Woodpecker `set_sensitive` reorder: The `tofu plan` output notes a no-op change to `helm_release.woodpecker` from `set_sensitive` block reordering. This is harmless (Terraform internal ordering) but worth confirming it produces no actual diff on apply.

SOP COMPLIANCE
- Branch `89-ollama-hostpath-embedding-alerting` references issue #89
- `plan-pal-e-docs` -- Phase F12
- `tofu plan -lock=false` output included (per CLAUDE.md convention)
- `tofu fmt` and `tofu validate` confirmed passing

PROCESS OBSERVATIONS
- `tofu plan` shows 3 add / 2 change / 0 destroy -- no destructive operations.

VERDICT: APPROVED