Fix GPU visibility -- add runtimeClassName: nvidia to Helm values #27

Merged
forgejo_admin merged 1 commit from fix/gpu-runtime-class-26 into main 2026-03-09 03:04:19 +00:00

Summary

The NVIDIA device plugin and Ollama pods were running with the default runc runtime instead of the nvidia runtime, preventing GPU discovery via NVML. This adds runtimeClassName = "nvidia" to both Helm values blocks so pods use the correct container runtime and can see the GPU.

Changes

  • terraform/main.tf: Added runtimeClassName = "nvidia" to helm_release.nvidia_device_plugin values block
  • terraform/main.tf: Added runtimeClassName = "nvidia" to helm_release.ollama values block
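
A minimal sketch of what the change might look like in terraform/main.tf. The resource labels match the PR description, but the chart repository, surrounding attributes, and the use of yamlencode are assumptions -- the actual file may pass values differently:

```hcl
# Hypothetical sketch; only runtimeClassName = "nvidia" is confirmed by the PR.
resource "helm_release" "nvidia_device_plugin" {
  name       = "nvidia-device-plugin"
  repository = "https://nvidia.github.io/k8s-device-plugin" # assumed repo
  chart      = "nvidia-device-plugin"

  values = [yamlencode({
    # Without this, pods start under the default runc handler,
    # so NVML cannot enumerate the GPU.
    runtimeClassName = "nvidia"
  })]
}

resource "helm_release" "ollama" {
  name  = "ollama"
  chart = "ollama" # assumed chart name

  values = [yamlencode({
    runtimeClassName = "nvidia"
  })]
}
```

The same effect could be achieved with per-key set blocks; a single yamlencode'd values entry just keeps the two releases symmetrical.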

Test Plan

  • Run tofu plan to confirm only the two Helm release values change
  • Run tofu apply and verify the NVIDIA device plugin DaemonSet pod runs with the nvidia runtime
  • Verify kubectl describe node shows nvidia.com/gpu: 1 in allocatable resources
  • Verify Ollama pod starts successfully and can load a model using the GPU
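
The steps above can be run as a short verification script. The namespace and label selector are assumptions (the device plugin chart commonly deploys into kube-system with an app.kubernetes.io/name label); adjust for the actual cluster:

```shell
tofu plan    # expect: only the two helm_release values blocks change
tofu apply

# Confirm the device plugin pod was admitted under the nvidia RuntimeClass
# (label selector is an assumption -- check with `kubectl get pods --show-labels`)
kubectl get pods -n kube-system \
  -l app.kubernetes.io/name=nvidia-device-plugin \
  -o jsonpath='{.items[*].spec.runtimeClassName}'

# Confirm the GPU is advertised as an allocatable resource
kubectl describe node | grep 'nvidia.com/gpu'
```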

Review Checklist

  • tofu fmt passes
  • Changes are minimal -- only two values blocks modified
  • Commit message references Closes #26
  • tofu plan output reviewed before apply
Related

  • Forgejo issue: #26
The NVIDIA device plugin and Ollama pods were running with the default
runc runtime instead of the nvidia runtime, preventing GPU discovery
via NVML. Add runtimeClassName = "nvidia" to both helm_release values
blocks so pods use the correct container runtime.

Closes #26

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
forgejo_admin deleted branch fix/gpu-runtime-class-26 2026-03-09 03:04:19 +00:00