Infra: deploy qwen3.5:4b to cluster Ollama and update Nemo env vars #30

Closed
opened 2026-04-03 17:38:30 +00:00 by forgejo_admin · 1 comment

Type

Feature

Lineage

Supersedes #16 (Anthropic/Ollama feature flag — no longer want a flag, want full replacement). Related to Nemo being dead since 2026-03-31 due to Anthropic credit exhaustion.

Repo

Multiple — forgejo_admin/pal-e-deployments (kustomize), forgejo_admin/pal-e-platform (NetworkPolicy if needed)

User Story

As an admin (Marcus)
I want Nemo to run on a local model that doesn't depend on paid API credits
So that the AI assistant is always available without billing constraints

Context

Nemo has been dead since March 31 because the Anthropic API key has $0 credits. The pod is running but every message returns "Sorry, I'm having trouble right now." The cluster Ollama has qwen2.5:7b but we want qwen3.5:4b (3.4 GB, downloaded locally on archbox, better reasoning per parameter). This ticket prepares the infrastructure so the app ticket (#29) can swap the SDK.

File Targets

Files to modify:

  • pal-e-deployments/overlays/westside-ai-assistant/prod/deployment-patch.yaml — remove ANTHROPIC_MODEL env var, add OLLAMA_BASE_URL=http://ollama.ollama.svc:11434 and OLLAMA_MODEL=qwen3.5:4b

Files to check (modify only if needed):

  • pal-e-platform/terraform/network-policies.tf — verify or add egress rule: westside-ai-assistant → ollama namespace on port 11434

Manual steps (not file changes):

  • kubectl exec -n ollama deploy/ollama -- ollama pull qwen3.5:4b — pull model to cluster
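
The manual step above can be wrapped so that the pull and the follow-up check happen together. This is only a sketch: the function name `pull_and_verify` is hypothetical, and it assumes `kubectl` is already pointed at the right cluster and that the `ollama` namespace and `deploy/ollama` target are as written in this ticket.

```shell
# Hypothetical helper: pull a model into the cluster Ollama, then confirm
# that `ollama list` reports it. Namespace and deployment name are taken
# from this ticket; adjust them if the cluster differs.
pull_and_verify() {
  model="$1"
  kubectl exec -n ollama deploy/ollama -- ollama pull "$model" || return 1
  # grep -F matches the tag literally, so the "." in "qwen3.5" is not a wildcard.
  kubectl exec -n ollama deploy/ollama -- ollama list | grep -Fq -- "$model"
}

# Usage:
#   pull_and_verify qwen3.5:4b && echo "model present"
```

The function returns non-zero if either the pull fails or the model is missing from the list, which makes it easy to chain into a larger rollout script.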

Acceptance Criteria

  • kubectl exec -n ollama deploy/ollama -- ollama list shows qwen3.5:4b
  • Nemo deployment patch has OLLAMA_BASE_URL and OLLAMA_MODEL env vars
  • ANTHROPIC_MODEL env var removed from deployment patch
  • NetworkPolicy allows westside-ai-assistant → ollama on port 11434
  • Nemo pod can curl http://ollama.ollama.svc:11434/api/tags successfully
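
The two deployment-patch criteria can be checked locally before opening the PR. The sketch below is a plain text search, not a YAML parse, so it assumes the variable names appear nowhere else in the file (for example in comments); `check_patch` is a hypothetical name.

```shell
# Hypothetical pre-PR check: the patch must mention both Ollama variables
# and must no longer mention ANTHROPIC_MODEL. This is a text match only.
check_patch() {
  patch="$1"
  grep -q 'OLLAMA_BASE_URL' "$patch" || { echo "missing OLLAMA_BASE_URL"; return 1; }
  grep -q 'OLLAMA_MODEL' "$patch" || { echo "missing OLLAMA_MODEL"; return 1; }
  if grep -q 'ANTHROPIC_MODEL' "$patch"; then
    echo "ANTHROPIC_MODEL still present"
    return 1
  fi
  echo "patch env vars OK"
}

# Usage (path from File Targets, relative to the directory holding the checkout):
#   check_patch pal-e-deployments/overlays/westside-ai-assistant/prod/deployment-patch.yaml
```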

Test Expectations

  • kubectl exec -n westside-ai-assistant deploy/westside-ai-assistant -- curl -s http://ollama.ollama.svc:11434/api/tags returns JSON with qwen3.5:4b listed
  • Run command: manual kubectl verification
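
To turn the manual check into a pass/fail result, the curl output can be piped through a small filter. This sketch assumes the `/api/tags` response lists each model tag as a quoted JSON string, and it uses a literal string match instead of a JSON parser so that it works where `jq` is not installed; `tags_has_model` is a hypothetical name.

```shell
# Hypothetical filter: read an /api/tags JSON response on stdin and succeed
# only if the given model tag appears as a quoted string in it.
tags_has_model() {
  model="$1"
  # -F: literal match, so ":" and "." in the tag need no escaping.
  grep -Fq -- "\"$model\""
}

# Usage:
#   kubectl exec -n westside-ai-assistant deploy/westside-ai-assistant -- \
#     curl -s http://ollama.ollama.svc:11434/api/tags | tags_has_model qwen3.5:4b \
#     && echo "qwen3.5:4b reachable from Nemo pod"
```

A failure here can mean either that the model was not pulled or that the request never reached Ollama, so check the NetworkPolicy criterion before re-pulling.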

Constraints

  • Do NOT modify Nemo application code — that's the companion app ticket (#29)
  • Do NOT remove the Keycloak client secret — still needed for basketball-api auth
  • NetworkPolicy changes go through pal-e-platform with tofu plan -lock=false

Checklist

  • PR opened (pal-e-deployments)
  • PR opened (pal-e-platform, if NetworkPolicy needed)
  • No unrelated changes

Related

  • project-westside-ai-assistant
  • Supersedes: #16 (model provider switch)
  • Companion: #29 (SDK swap + tool definitions)
Author
Owner

Scope Review: APPROVED

Review note: review-748-2026-04-03
Ticket is well-scoped. All file targets verified (deployment-patch.yaml exists with ANTHROPIC_MODEL, NetworkPolicy rule already in main from #246). 5 AC, <5 min estimate. No decomposition needed.

  • [SCOPE] Create architecture note arch-A4 (deferred — does not block execution)
Reference
forgejo_admin/westside-ai-assistant#30