Deploy Ollama + NVIDIA device plugin as platform services #24
Reference: forgejo_admin/pal-e-platform#24
## Lineage

`plan-2026-02-26-tf-modularize-postgres` → Phase 6 → Phase 6a (Deploy Ollama as Platform Service)

## Repo

forgejo_admin/pal-e-platform

## User Story
As a platform operator
I want Ollama running in-cluster with GPU access managed by Terraform
So that the embedding worker (Phase 6b) can generate vectors via
`http://ollama.ollama.svc.cluster.local:11434`

## Context
Phase 6 (Vector Search) needs Ollama running in-cluster to generate embeddings via Qwen3-Embedding-4B. This is the first sub-phase — deploying Ollama as a reusable platform service.
Key facts:

- The device plugin currently deployed is `nvcr.io/nvidia/k8s-device-plugin:v0.17.0`. It must be brought under Helm/Terraform management.
- The node exposes `nvidia.com/gpu: 1`.
- The chart's default `affinity` rules (which match NFD labels) must be overridden to `{}` or the DaemonSet won't schedule.

Decisions made:
- NVIDIA device plugin chart version `0.17.4` (closest patch to the deployed v0.17.0 image; the chart repo doesn't have 0.17.0)
- Ollama chart `ollama-helm/ollama` version `1.49.0` (latest stable, app version 0.17.6) from https://otwld.github.io/ollama-helm/
- GPU enabled via `ollama.gpu.enabled: true` + `ollama.gpu.number: 1` (not raw resource limits)
- `ollama.models.pull: ["qwen3-embedding:4b"]`
- `persistentVolume.enabled: true` with the `local-path` storageClass
- Host Ollama service: `dead` + `enable: False`
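Taken together, these decisions suggest an Ollama `helm_release` along the following lines (a sketch only; the exact layout and `depends_on` targets are assumptions to reconcile with existing `main.tf` conventions):

```hcl
# Sketch: values mirror the decisions above; verify key names
# against the otwld/ollama-helm chart before applying.
resource "helm_release" "ollama" {
  name       = "ollama"
  namespace  = kubernetes_namespace_v1.ollama.metadata[0].name
  repository = "https://otwld.github.io/ollama-helm/"
  chart      = "ollama"
  version    = "1.49.0"

  values = [yamlencode({
    ollama = {
      gpu = {
        enabled = true # chart-level toggle, not raw resource limits
        number  = 1
      }
      models = {
        pull = ["qwen3-embedding:4b"]
      }
    }
    persistentVolume = {
      enabled      = true
      storageClass = "local-path"
    }
  })]

  # GPU must be advertised on the node before the pod can schedule.
  depends_on = [helm_release.nvidia_device_plugin]
}
```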
## File Targets

Files to modify:

- `terraform/main.tf` — add 3 resources after the CNPG section (~line 1097): `helm_release.nvidia_device_plugin`, `kubernetes_namespace_v1.ollama`, `helm_release.ollama`
- `salt/states/services/init.sls` — change ollama-service from `service.running` + `enable: True` to `service.dead` + `enable: False`, and update the comments to explain why

Files NOT to touch:

- `terraform/variables.tf` — no new variables needed (no secrets for Ollama)
- `terraform/providers.tf` — no new providers needed
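For the `kubernetes_namespace_v1.ollama` and `helm_release.nvidia_device_plugin` resources named above, a hedged sketch (the chart repository URL and the shape of the `values` override are assumptions, not taken from this issue):

```hcl
# Sketch: chart repository URL and values layout are assumptions;
# check the nvidia-device-plugin chart docs before use.
resource "kubernetes_namespace_v1" "ollama" {
  metadata {
    name = "ollama"
  }
}

resource "helm_release" "nvidia_device_plugin" {
  name       = "nvidia-device-plugin"
  namespace  = "kube-system"
  repository = "https://nvidia.github.io/k8s-device-plugin"
  chart      = "nvidia-device-plugin"
  version    = "0.17.4" # closest patch to the deployed v0.17.0 image

  values = [yamlencode({
    # No NFD on this cluster: clear the chart's default affinity
    # rules or the DaemonSet will never schedule.
    affinity = {}
  })]
}
```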
## Acceptance Criteria

- `kubectl get daemonset -n kube-system | grep nvidia` shows the Helm-managed device plugin running
- `kubectl get nodes -o json | jq '.items[0].status.capacity["nvidia.com/gpu"]'` returns `"1"`
- `kubectl get pods -n ollama` shows the Ollama pod Running
- `kubectl exec -n ollama deployment/ollama -- ollama list` shows `qwen3-embedding:4b`
- `kubectl exec -n ollama deployment/ollama -- curl -s http://localhost:11434/api/embed -d '{"model":"qwen3-embedding:4b","input":"test"}'` succeeds
- `systemctl is-active ollama` returns `inactive` (host service disabled)
- `tofu plan` shows no drift after apply

## Test Expectations
- `tofu validate` passes
- `tofu fmt -check` passes
- `tofu plan` shows exactly 3 new resources (namespace + 2 Helm releases)

## Constraints
- Follow existing `main.tf` patterns: `yamlencode()` for values, `depends_on` chains, resource naming conventions
- Set `affinity: {}` on the device plugin (no NFD on this cluster)
- Use the `ollama.gpu.enabled` mechanism, not raw resource limits
- Before `tofu apply`: `kubectl delete daemonset nvidia-device-plugin-daemonset -n kube-system`
- Then `sudo salt-call --local state.apply services` to disable host Ollama

## Checklist
- [ ] `tofu validate` passes
- [ ] `tofu fmt -check` passes

## Related
`plan-2026-02-26-tf-modularize-postgres` — parent plan (Phase 6: Vector Search)