Deploy Ollama + NVIDIA device plugin as platform services #24

Closed
opened 2026-03-08 20:57:14 +00:00 by forgejo_admin · 0 comments

Lineage

plan-2026-02-26-tf-modularize-postgres → Phase 6 → Phase 6a (Deploy Ollama as Platform Service)

Repo

forgejo_admin/pal-e-platform

User Story

As a platform operator
I want Ollama running in-cluster with GPU access managed by Terraform
So that the embedding worker (Phase 6b) can generate vectors via http://ollama.ollama.svc.cluster.local:11434

Context

Phase 6 (Vector Search) needs Ollama running in-cluster to generate embeddings via Qwen3-Embedding-4B. This is the first sub-phase — deploying Ollama as a reusable platform service.

Key facts:

  • NVIDIA k8s device plugin is ALREADY deployed (manually as a DaemonSet, not Terraform-managed). Image: nvcr.io/nvidia/k8s-device-plugin:v0.17.0. Must be brought under Helm/Terraform management.
  • Host Ollama (systemd) is running but idle (54 MiB GPU, 0% utilization). Must be disabled to free the GPU for the k8s pod.
  • GPU is visible to k8s: nvidia.com/gpu: 1.
  • No Node Feature Discovery (NFD) deployed — the NVIDIA device plugin Helm chart's default affinity rules (which match NFD labels) must be overridden to {} or the DaemonSet won't schedule.
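Concretely, the override is a one-key values document. A minimal sketch, assuming https://nvidia.github.io/k8s-device-plugin is the chart repository and that the chart exposes a top-level affinity value (both worth verifying before apply):

```hcl
# Sketch: bring the device plugin under Helm/Terraform management.
# The empty affinity replaces the chart's NFD-label-based default,
# which would never match on a cluster without NFD.
resource "helm_release" "nvidia_device_plugin" {
  name       = "nvidia-device-plugin"
  repository = "https://nvidia.github.io/k8s-device-plugin" # assumed upstream chart repo
  chart      = "nvidia-device-plugin"
  version    = "0.17.4"
  namespace  = "kube-system"

  values = [yamlencode({
    affinity = {} # no NFD on this cluster
  })]
}
```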

Decisions made:

  • NVIDIA device plugin Helm chart version: 0.17.4 (closest patch to deployed v0.17.0 image; chart repo doesn't have 0.17.0)
  • Ollama Helm chart: ollama-helm/ollama version 1.49.0 (latest stable, app version 0.17.6) from https://otwld.github.io/ollama-helm/
  • Ollama chart uses ollama.gpu.enabled: true + ollama.gpu.number: 1 (not raw resource limits)
  • Models pulled via ollama.models.pull: ["qwen3-embedding:4b"]
  • Persistence via persistentVolume.enabled: true with local-path storageClass
  • Service: ClusterIP on 11434 (internal only, no Tailscale funnel)
  • SaltStack change: set host ollama service to dead + enable: False
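Put together, the Ollama side of main.tf might look like the sketch below. Resource names follow the File Targets section; the values structure is an assumption to check against the otwld chart's values.yaml:

```hcl
resource "kubernetes_namespace_v1" "ollama" {
  metadata {
    name = "ollama"
  }
}

resource "helm_release" "ollama" {
  name       = "ollama"
  repository = "https://otwld.github.io/ollama-helm/"
  chart      = "ollama"
  version    = "1.49.0"
  namespace  = kubernetes_namespace_v1.ollama.metadata[0].name
  timeout    = 600 # model pull is ~4 GB

  values = [yamlencode({
    ollama = {
      gpu = {
        enabled = true # chart manages the nvidia.com/gpu request, no raw limits
        number  = 1
      }
      models = {
        pull = ["qwen3-embedding:4b"]
      }
    }
    persistentVolume = {
      enabled      = true
      storageClass = "local-path"
    }
    # Assumed service keys; ClusterIP on 11434 is the intent either way.
    service = {
      type = "ClusterIP"
      port = 11434
    }
  })]

  # The GPU must be schedulable before the Ollama pod can start.
  depends_on = [helm_release.nvidia_device_plugin]
}
```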

File Targets

Files to modify:

  • terraform/main.tf — Add 3 resources after the CNPG section (~line 1097): helm_release.nvidia_device_plugin, kubernetes_namespace_v1.ollama, helm_release.ollama
  • salt/states/services/init.sls — Change ollama-service from service.running + enable: True to service.dead + enable: False. Update comments to explain why.
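For the Salt change, a sketch of the intended init.sls state (the ollama-service state ID is assumed to match the existing one):

```sls
# Host Ollama stays down: the GPU now belongs to the in-cluster Ollama
# pod (Phase 6a), and the idle systemd instance would pin ~54 MiB of VRAM.
ollama-service:
  service.dead:
    - name: ollama
    - enable: False
```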

Files NOT to touch:

  • terraform/variables.tf — No new variables needed (no secrets for Ollama)
  • terraform/providers.tf — No new providers needed

Acceptance Criteria

  • kubectl get daemonset -n kube-system | grep nvidia shows the Helm-managed device plugin running
  • kubectl get nodes -o json | jq '.items[0].status.capacity["nvidia.com/gpu"]' returns "1"
  • kubectl get pods -n ollama shows Ollama pod Running
  • kubectl exec -n ollama deployment/ollama -- ollama list shows qwen3-embedding:4b
  • Embedding test returns vector: kubectl exec -n ollama deployment/ollama -- curl -s http://localhost:11434/api/embed -d '{"model":"qwen3-embedding:4b","input":"test"}'
  • systemctl is-active ollama returns inactive (host service disabled)
  • tofu plan shows no drift after apply
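For the embedding criterion, piping through jq makes the pass/fail obvious (jq on the operator machine is assumed; the embeddings field follows Ollama's /api/embed response format):

```sh
kubectl exec -n ollama deployment/ollama -- \
  curl -s http://localhost:11434/api/embed \
  -d '{"model":"qwen3-embedding:4b","input":"test"}' | jq '.embeddings[0][0:4]'
# Expect a short array of floats, e.g. [0.0123, -0.0456, 0.0789, 0.0012]
```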

Test Expectations

  • tofu validate passes
  • tofu fmt -check passes
  • tofu plan shows exactly 3 new resources (namespace + 2 helm releases)

Constraints

  • Follow existing main.tf patterns: yamlencode() for values, depends_on chains, resource naming conventions
  • NVIDIA device plugin: must override affinity: {} (no NFD on this cluster)
  • Ollama chart: use ollama.gpu.enabled mechanism, not raw resource limits
  • Set timeout = 600 on the Ollama release (the model pull is ~4 GB)
  • Manual step required before tofu apply: kubectl delete daemonset nvidia-device-plugin-daemonset -n kube-system
  • Manual step required: sudo salt-call --local state.apply services to disable host Ollama
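One plausible ordering of the manual steps around the apply, so the GPU is free and the old DaemonSet is gone before the new resources come up:

```sh
# 1. Remove the hand-deployed DaemonSet so the Helm release can own it
kubectl delete daemonset nvidia-device-plugin-daemonset -n kube-system

# 2. Stop and disable host Ollama, releasing the GPU
sudo salt-call --local state.apply services

# 3. Apply the new Terraform resources
tofu apply
```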

Checklist

  • PR opened
  • tofu validate passes
  • tofu fmt -check passes
  • No unrelated changes

Related

  • plan-2026-02-26-tf-modularize-postgres — parent plan (Phase 6: Vector Search)