Deploy Ollama + NVIDIA device plugin as platform services (#24) #25
Reference: forgejo_admin/pal-e-platform!25
Closes #24
Summary
Brings the NVIDIA k8s device plugin under Terraform/Helm management and deploys Ollama in-cluster with GPU access for embedding generation (Phase 6 vector search). Disables the host systemd Ollama service to free the GPU for the k8s pod.
Changes
`terraform/main.tf` -- Added 3 new resources after the CNPG section:
- `helm_release.nvidia_device_plugin`: NVIDIA device plugin chart v0.17.4 in `kube-system`, with an `affinity={}` override (no NFD on this cluster) and `failOnInitError=false`
- `kubernetes_namespace_v1.ollama`: Dedicated `ollama` namespace following the existing pattern
- `helm_release.ollama`: Ollama chart v1.49.0 with GPU enabled, `qwen3-embedding:4b` model pull, 10Gi local-path persistence, and a `depends_on` on the NVIDIA device plugin

`salt/states/services/init.sls` -- Changed `ollama-service` from `service.running` / `enable: True` to `service.dead` / `enable: False`, with updated comments explaining that the k8s Ollama now owns the GPU

Terraform details:
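A minimal sketch of what the new resources in `terraform/main.tf` could look like. The chart repository URLs and the exact value keys are assumptions based on the upstream charts, not copied from the actual diff; only the chart versions, namespace, model, and persistence size come from the description above:

```hcl
resource "helm_release" "nvidia_device_plugin" {
  name       = "nvidia-device-plugin"
  repository = "https://nvidia.github.io/k8s-device-plugin" # assumed repo URL
  chart      = "nvidia-device-plugin"
  version    = "0.17.4"
  namespace  = "kube-system"

  # No NFD on this cluster: drop the chart's default node affinity,
  # and don't fail the init container on nodes without a GPU.
  values = [yamlencode({ affinity = {} })]

  set {
    name  = "failOnInitError"
    value = "false"
  }
}

resource "kubernetes_namespace_v1" "ollama" {
  metadata {
    name = "ollama"
  }
}

resource "helm_release" "ollama" {
  name       = "ollama"
  repository = "https://otwld.github.io/ollama-helm/" # assumed: community ollama-helm chart
  chart      = "ollama"
  version    = "1.49.0"
  namespace  = kubernetes_namespace_v1.ollama.metadata[0].name

  values = [yamlencode({
    ollama = {
      gpu    = { enabled = true, type = "nvidia", number = 1 }
      models = { pull = ["qwen3-embedding:4b"] }
    }
    persistentVolume = {
      enabled      = true
      size         = "10Gi"
      storageClass = "local-path" # k3s default local-path provisioner
    }
  })]

  # The pod can only be scheduled with a GPU resource once the
  # device plugin is advertising nvidia.com/gpu on the node.
  depends_on = [helm_release.nvidia_device_plugin]
}
```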
tofu plan output
- `tofu fmt` passed
- `tofu validate` passed (requires provider cache -- not available in agent worktree)

Test Plan
- `tofu plan` shows exactly 3 new resources (namespace + 2 helm releases)
- `kubectl delete daemonset nvidia-device-plugin-daemonset -n kube-system`
- `sudo salt-call --local state.apply services` to disable host Ollama
- `kubectl get daemonset -n kube-system | grep nvidia` shows the Helm-managed device plugin
- `kubectl get pods -n ollama` shows the Ollama pod Running
- `kubectl exec -n ollama deployment/ollama -- ollama list` shows `qwen3-embedding:4b`
- `/api/embed`
- `systemctl is-active ollama` returns `inactive`
- `tofu plan` shows no drift after apply

Review Checklist
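Once the pod is Running, the `/api/embed` step can be exercised from a workstation. A minimal sketch, assuming `kubectl port-forward -n ollama svc/ollama 11434:11434` is active (the service name and port-forward are assumptions; `/api/embed` with `model` and `input` fields is Ollama's documented embedding endpoint):

```python
import json
import urllib.request

# Assumes the in-cluster Ollama service is port-forwarded to localhost.
OLLAMA_EMBED_URL = "http://localhost:11434/api/embed"


def build_embed_payload(texts):
    """Build the JSON request body for Ollama's /api/embed endpoint."""
    return json.dumps({"model": "qwen3-embedding:4b", "input": texts}).encode()


def embed(texts):
    """POST the texts to Ollama and return the list of embedding vectors."""
    req = urllib.request.Request(
        OLLAMA_EMBED_URL,
        data=build_embed_payload(texts),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embeddings"]
```

A quick smoke test is `embed(["hello world"])`: a non-empty list of float vectors confirms the GPU-backed model is serving.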
Related
- `plan-2026-02-26-tf-modularize-postgres` (Phase 6: Vector Search)

Commit message:

Add Terraform-managed Helm releases for the NVIDIA k8s device plugin and Ollama with GPU support. This brings the existing manual NVIDIA DaemonSet under IaC management and deploys Ollama in-cluster for embedding generation (Phase 6 vector search).

Terraform changes (3 new resources):
- helm_release.nvidia_device_plugin: chart v0.17.4, affinity={} override for non-NFD cluster, failOnInitError=false
- kubernetes_namespace_v1.ollama: dedicated namespace
- helm_release.ollama: chart v1.49.0, GPU enabled, qwen3-embedding:4b model pull, 10Gi local-path persistence

SaltStack change:
- Disable host ollama systemd service (service.dead + enable: False) to free the GPU for the k8s pod

Closes #24

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
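For reference, the SaltStack change described above would look roughly like this in `salt/states/services/init.sls` (the `ollama-service` state ID comes from the description; the `name` argument is an assumption):

```yaml
# k8s Ollama now owns the GPU: the host systemd service must stay
# stopped and disabled so the device plugin can hand the GPU to the
# in-cluster pod.
ollama-service:
  service.dead:
    - name: ollama
    - enable: False
```

Using `service.dead` with `enable: False` (rather than simply removing the state) keeps the host service explicitly managed, so a re-enabled service would show up as drift on the next highstate.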