Fix GPU visibility -- add runtimeClassName: nvidia to Helm values #27

Merged
forgejo_admin merged 1 commit from fix/gpu-runtime-class-26 into main 2026-03-09 03:04:19 +00:00

Summary

The NVIDIA device plugin and Ollama pods were running with the default runc runtime instead of the nvidia runtime, preventing GPU discovery via NVML. This adds runtimeClassName = "nvidia" to both Helm values blocks so pods use the correct container runtime and can see the GPU.

Changes

  • terraform/main.tf: Added runtimeClassName = "nvidia" to helm_release.nvidia_device_plugin values block
  • terraform/main.tf: Added runtimeClassName = "nvidia" to helm_release.ollama values block
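
A minimal sketch of what the change might look like in terraform/main.tf. The resource labels match the PR description, but the chart repository, surrounding attributes, and the use of yamlencode are assumptions -- the actual file may pass values differently:

```hcl
# Hypothetical sketch; only runtimeClassName = "nvidia" is confirmed by the PR.
resource "helm_release" "nvidia_device_plugin" {
  name       = "nvidia-device-plugin"
  repository = "https://nvidia.github.io/k8s-device-plugin" # assumed repo
  chart      = "nvidia-device-plugin"

  values = [yamlencode({
    # Without this, pods start under the default runc handler,
    # so NVML cannot enumerate the GPU.
    runtimeClassName = "nvidia"
  })]
}

resource "helm_release" "ollama" {
  name  = "ollama"
  chart = "ollama" # assumed chart name

  values = [yamlencode({
    runtimeClassName = "nvidia"
  })]
}
```

The same effect could be achieved with per-key set blocks; a single yamlencode'd values entry just keeps the two releases symmetrical.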

Test Plan

  • Run tofu plan to confirm only the two Helm release values change
  • Run tofu apply and verify the NVIDIA device plugin DaemonSet pod runs with the nvidia runtime
  • Verify kubectl describe node shows nvidia.com/gpu: 1 in allocatable resources
  • Verify Ollama pod starts successfully and can load a model using the GPU
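
The steps above can be run as a short verification script. The namespace and label selector are assumptions (the device plugin chart commonly deploys into kube-system with an app.kubernetes.io/name label); adjust for the actual cluster:

```shell
tofu plan    # expect: only the two helm_release values blocks change
tofu apply

# Confirm the device plugin pod was admitted under the nvidia RuntimeClass
# (label selector is an assumption -- check with `kubectl get pods --show-labels`)
kubectl get pods -n kube-system \
  -l app.kubernetes.io/name=nvidia-device-plugin \
  -o jsonpath='{.items[*].spec.runtimeClassName}'

# Confirm the GPU is advertised as an allocatable resource
kubectl describe node | grep 'nvidia.com/gpu'
```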

Review Checklist

  • tofu fmt passes
  • Changes are minimal -- only two values blocks modified
  • Commit message references Closes #26
  • tofu plan output reviewed before apply
Related

  • Forgejo issue: #26
The NVIDIA device plugin and Ollama pods were running with the default
runc runtime instead of the nvidia runtime, preventing GPU discovery
via NVML. Add runtimeClassName = "nvidia" to both helm_release values
blocks so pods use the correct container runtime.

Closes #26

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
forgejo_admin deleted branch fix/gpu-runtime-class-26 2026-03-09 03:04:19 +00:00