Deploy Ollama + NVIDIA device plugin as platform services (#24) #25

Merged
forgejo_admin merged 1 commit from 24-deploy-ollama-nvidia-device-plugin into main 2026-03-09 01:55:56 +00:00

Closes #24

Summary

Brings the NVIDIA k8s device plugin under Terraform/Helm management and deploys Ollama in-cluster with GPU access for embedding generation (Phase 6 vector search). Disables the host systemd Ollama service to free the GPU for the k8s pod.

Changes

  • terraform/main.tf -- Added 3 new resources after the CNPG section:
    • helm_release.nvidia_device_plugin: NVIDIA device plugin chart v0.17.4 in kube-system, with affinity={} override (no NFD on this cluster) and failOnInitError=false
    • kubernetes_namespace_v1.ollama: Dedicated ollama namespace following existing pattern
    • helm_release.ollama: Ollama chart v1.49.0 with GPU enabled, qwen3-embedding:4b model pull, 10Gi local-path persistence, depends_on nvidia device plugin
  • salt/states/services/init.sls -- Changed ollama-service from service.running/enable: True to service.dead/enable: False with updated comments explaining k8s Ollama now owns the GPU
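For reference, the three Terraform resources described above can be sketched roughly as below. This is a hypothetical reconstruction from the description, not the actual contents of terraform/main.tf: the chart repository URLs and value keys are assumptions based on the public NVIDIA device plugin and otwld ollama-helm charts.

```hcl
# Sketch of the three new resources (assumed repositories and value keys).
resource "helm_release" "nvidia_device_plugin" {
  name       = "nvidia-device-plugin"
  repository = "https://nvidia.github.io/k8s-device-plugin"
  chart      = "nvidia-device-plugin"
  version    = "0.17.4"
  namespace  = "kube-system"

  # No Node Feature Discovery on this cluster: drop the chart's default
  # affinity and don't fail init on nodes without a GPU.
  values = [yamlencode({
    affinity        = {}
    failOnInitError = false
  })]
}

resource "kubernetes_namespace_v1" "ollama" {
  metadata {
    name = "ollama"
  }
}

resource "helm_release" "ollama" {
  name       = "ollama"
  repository = "https://otwld.github.io/ollama-helm/"
  chart      = "ollama"
  version    = "1.49.0"
  namespace  = kubernetes_namespace_v1.ollama.metadata[0].name

  values = [yamlencode({
    ollama = {
      gpu    = { enabled = true, type = "nvidia", number = 1 }
      models = { pull = ["qwen3-embedding:4b"] }
    }
    persistentVolume = {
      enabled      = true
      size         = "10Gi"
      storageClass = "local-path"
    }
  })]

  # Schedule Ollama only after the device plugin can advertise the GPU.
  depends_on = [helm_release.nvidia_device_plugin]
}
```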

Terraform details:

tofu plan output (could not be run in the agent worktree -- no provider cache):
  Expected plan: 3 resources to add (namespace + 2 helm releases), 0 to change, 0 to destroy.

Manual pre-apply steps required:
  kubectl delete daemonset nvidia-device-plugin-daemonset -n kube-system
  sudo salt-call --local state.apply services

  • tofu fmt passed
  • tofu validate not run (requires provider cache -- not available in agent worktree)

Test Plan

  • tofu plan shows exactly 3 new resources (namespace + 2 helm releases)
  • Pre-apply: kubectl delete daemonset nvidia-device-plugin-daemonset -n kube-system
  • Pre-apply: sudo salt-call --local state.apply services to disable host Ollama
  • After apply: kubectl get daemonset -n kube-system | grep nvidia shows Helm-managed device plugin
  • After apply: kubectl get pods -n ollama shows Ollama pod Running
  • After apply: kubectl exec -n ollama deployment/ollama -- ollama list shows qwen3-embedding:4b
  • After apply: embedding test returns vector via /api/embed
  • systemctl is-active ollama returns inactive
  • tofu plan shows no drift after apply
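The embedding test in the plan above can be run roughly as follows. The service name and port are assumptions (the chart's defaults), and these are operator commands against a live cluster rather than a runnable script:

```shell
# Forward the in-cluster Ollama service to localhost (service assumed to be
# "ollama" on port 11434, the chart default).
kubectl -n ollama port-forward svc/ollama 11434:11434 &

# Request an embedding; a healthy deployment returns a JSON body containing
# an "embeddings" array of floats for the input text.
curl -s http://localhost:11434/api/embed \
  -d '{"model": "qwen3-embedding:4b", "input": "hello world"}'
```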

Review Checklist

  • Review-fix loop passed (clean review, zero issues)
  • User approved merge

Related

  • Plan: plan-2026-02-26-tf-modularize-postgres (Phase 6: Vector Search)
  • Forgejo issue: #24
Add Terraform-managed Helm releases for the NVIDIA k8s device plugin
and Ollama with GPU support. This brings the existing manual NVIDIA
DaemonSet under IaC management and deploys Ollama in-cluster for
embedding generation (Phase 6 vector search).

Terraform changes (3 new resources):
- helm_release.nvidia_device_plugin: chart v0.17.4, affinity={} override
  for non-NFD cluster, failOnInitError=false
- kubernetes_namespace_v1.ollama: dedicated namespace
- helm_release.ollama: chart v1.49.0, GPU enabled, qwen3-embedding:4b
  model pull, 10Gi local-path persistence

SaltStack change:
- Disable host ollama systemd service (service.dead + enable: False)
  to free the GPU for the k8s pod
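
The SaltStack change amounts to roughly this state in salt/states/services/init.sls (a sketch: the state ID is assumed from the PR description):

```yaml
# k8s-hosted Ollama now owns the GPU; keep the host service stopped and
# disabled so the pod and the host daemon don't contend for the device.
ollama-service:
  service.dead:
    - name: ollama
    - enable: False
```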

Closes #24

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
forgejo_admin deleted branch 24-deploy-ollama-nvidia-device-plugin 2026-03-09 01:55:56 +00:00