Fix GPU visibility -- add runtimeClassName: nvidia to Helm values #27
Reference
forgejo_admin/pal-e-platform!27
Summary
The NVIDIA device plugin and Ollama pods were running with the default `runc` runtime instead of the `nvidia` runtime, preventing GPU discovery via NVML. This adds `runtimeClassName = "nvidia"` to both Helm values blocks so pods use the correct container runtime and can see the GPU.

Changes
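For orientation, here is a minimal sketch of what the two edits might look like in `terraform/main.tf`. Only the resource names and the `runtimeClassName = "nvidia"` setting come from this PR. The release names, chart names, repository URLs, and the use of `yamlencode` for the values block are assumptions for illustration, and the real resources likely carry other arguments that are omitted here.

```hcl
# Sketch only: everything except the resource names and runtimeClassName is assumed.
resource "helm_release" "nvidia_device_plugin" {
  name       = "nvidia-device-plugin"                       # assumed release name
  repository = "https://nvidia.github.io/k8s-device-plugin" # assumed chart repository
  chart      = "nvidia-device-plugin"                       # assumed chart name

  values = [yamlencode({
    # Run the plugin pod under the nvidia RuntimeClass instead of the default runc.
    runtimeClassName = "nvidia"
  })]
}

resource "helm_release" "ollama" {
  name       = "ollama"                              # assumed release name
  repository = "https://otwld.github.io/ollama-helm" # assumed chart repository
  chart      = "ollama"                              # assumed chart name

  values = [yamlencode({
    # Same setting for the Ollama pod so it can see the GPU.
    runtimeClassName = "nvidia"
  })]
}
```

Note that the setting only takes effect if a RuntimeClass named `nvidia` already exists in the cluster; a pod that references a missing RuntimeClass is rejected at admission.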
- `terraform/main.tf`: Added `runtimeClassName = "nvidia"` to the `helm_release.nvidia_device_plugin` values block
- `terraform/main.tf`: Added `runtimeClassName = "nvidia"` to the `helm_release.ollama` values block

Test Plan
- `tofu plan` to confirm only the two Helm release values change
- `tofu apply` and verify the NVIDIA device plugin DaemonSet pod runs with the nvidia runtime
- `kubectl describe node` shows `nvidia.com/gpu: 1` in allocatable resources

Review Checklist
- `tofu fmt` passes
- `tofu plan` output reviewed before apply

Related