Bug: Harbor robot account expired — ImagePullBackOff on new deployments #33

Open
opened 2026-04-03 23:10:20 +00:00 by forgejo_admin · 1 comment

Type

Bug

Lineage

Discovered during validation of #30 (Nemo qwen infra). Blocks all future deployments of westside-ai-assistant.

Repo

forgejo_admin/pal-e-services (Harbor robot account provisioned via Terraform)

What Broke

The harbor-creds secret in the westside-ai-assistant namespace contains credentials for a robot account that no longer exists in Harbor. The only robot account in Harbor is robot$image-updater. Any new pod rollout fails with ImagePullBackOff / 401 Unauthorized pulling from harbor.tail5b443a.ts.net/westside-ai-assistant/api:latest.

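The old pod (from March 31) is still running because its image was pulled onto the node before the credentials stopped working. The kubelet only pulls an image when a container starts, so a pod that is already running is unaffected.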
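To confirm which robot account `harbor-creds` actually references, you can decode the secret's payload. The sketch below assumes `harbor-creds` is a `kubernetes.io/dockerconfigjson` secret (the usual type for `imagePullSecrets`). You would feed it the raw value from `kubectl get secret harbor-creds -n westside-ai-assistant -o jsonpath='{.data.\.dockerconfigjson}'`. The function name `robot_users` is illustrative only.

```python
import base64
import json


def robot_users(dockerconfigjson_b64: str) -> dict[str, str]:
    """Map each registry host in a base64 .dockerconfigjson value to its username."""
    # Secret .data values are base64; decoding once yields the docker config JSON.
    config = json.loads(base64.b64decode(dockerconfigjson_b64))
    users = {}
    for host, entry in config.get("auths", {}).items():
        if "username" in entry:
            users[host] = entry["username"]
        elif "auth" in entry:
            # "auth" is base64("username:password"); keep only the username part.
            decoded = base64.b64decode(entry["auth"]).decode()
            users[host] = decoded.split(":", 1)[0]
    return users
```

If the username printed for `harbor.tail5b443a.ts.net` does not appear in Harbor's robot account list, that matches the failure described above.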

Repro Steps

  1. Trigger any deployment change in westside-ai-assistant namespace
  2. New pod created, attempts to pull image
  3. kubectl get events -n westside-ai-assistant shows: "401 Unauthorized"
  4. curl -u 'robot$westside-ai-as...:...' harbor/v2/westside-ai-assistant/api/tags/list returns UNAUTHORIZED (single-quote the user so the shell does not expand $westside to an empty string)
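Step 4 can also be run without a shell, which sidesteps `$` expansion in robot names entirely. This is a minimal sketch that mirrors the curl call: HTTP Basic auth against the registry's `/v2/<repository>/tags/list` endpoint. It assumes Harbor accepts Basic auth on that endpoint, as the curl in step 4 implies. The helper names are hypothetical.

```python
import base64
import urllib.error
import urllib.request


def tags_list_request(host: str, repository: str, username: str, password: str) -> urllib.request.Request:
    """Build the same request as the curl in step 4, using HTTP Basic auth."""
    # The username is a plain Python string, so "robot$..." is sent verbatim.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    url = f"https://{host}/v2/{repository}/tags/list"
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def credentials_work(host: str, repository: str, username: str, password: str,
                     timeout: float = 10.0) -> bool:
    """Return True on HTTP 200, False on 401; re-raise any other HTTP error."""
    req = tags_list_request(host, repository, username, password)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 401:
            return False
        raise
```

For example, `credentials_work("harbor.tail5b443a.ts.net", "westside-ai-assistant/api", ROBOT_NAME, ROBOT_SECRET)`, where `ROBOT_NAME` and `ROBOT_SECRET` are placeholders for the values decoded from `harbor-creds`, should return `False` while this bug is present.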

Expected Behavior

Pod should pull the image successfully using valid Harbor credentials.
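One way to check this, along with the "No ImagePullBackOff events" criterion below, is to filter the output of `kubectl get events -n westside-ai-assistant -o json`. The sketch assumes the kubelet reports pull problems with reason `Failed` or `BackOff` and a message that mentions pulling (for example "Failed to pull image ..." or "Back-off pulling image ..."). That matching rule is a heuristic, not a documented contract.

```python
import json


def image_pull_failures(events_json: str) -> list[tuple[str, str, str]]:
    """Return (object name, reason, message) for events that look like image-pull failures."""
    failures = []
    for ev in json.loads(events_json).get("items", []):
        reason = ev.get("reason", "")
        message = ev.get("message", "")
        # Heuristic: pull errors use reason Failed/BackOff and mention "pull"
        # (this also matches "ErrImagePull" and "ImagePullBackOff").
        if reason in ("Failed", "BackOff") and "pull" in message.lower():
            name = ev.get("involvedObject", {}).get("name", "")
            failures.append((name, reason, message))
    return failures
```

An empty result after a fresh rollout is consistent with the expected behavior. A `BackOff` event caused by a crashing container does not match, because its message does not mention pulling.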

Environment

  • Cluster/namespace: prod / westside-ai-assistant
  • Harbor project: westside-ai-assistant
  • Related: ArgoCD sync succeeded but pod can't pull image

Acceptance Criteria

  • Valid Harbor robot account exists for westside-ai-assistant project
  • harbor-creds secret updated in namespace with working credentials
  • New pod can pull image successfully
  • No ImagePullBackOff events

Related

  • project-westside-ai-assistant
  • Blocks: #29 (Nemo app SDK swap; needs successful image deployment)
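For the "harbor-creds secret updated" criterion, the replacement secret has the same shape that `kubectl create secret docker-registry` produces. The sketch below builds that manifest as a dict, assuming the secret stays a `kubernetes.io/dockerconfigjson` secret named `harbor-creds`. The robot name and secret passed in would come from whichever fix path is chosen. The function name is hypothetical.

```python
import base64
import json


def docker_registry_secret(name: str, namespace: str, host: str,
                           username: str, password: str) -> dict:
    """Build a kubernetes.io/dockerconfigjson Secret manifest for one registry host."""
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    config = {"auths": {host: {"username": username, "password": password, "auth": auth}}}
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        # Secret .data values must themselves be base64-encoded.
        "data": {".dockerconfigjson": base64.b64encode(json.dumps(config).encode()).decode()},
    }
```

Printing `json.dumps(docker_registry_secret("harbor-creds", "westside-ai-assistant", "harbor.tail5b443a.ts.net", name, secret))` and piping it to `kubectl apply -f -` would update the secret in place. If path (a) from the scope review is taken, the provisioning pipeline would presumably own this secret instead.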
Author
Owner

Scope Review: NEEDS_REFINEMENT

Review note: review-761-2026-04-03

Repo placement mismatch: Issue says fix is in forgejo_admin/pal-e-services, but westside-ai-assistant is not in pal-e-services var.services at all. The service was never onboarded into the standard provisioning pipeline — harbor-creds was created through a different mechanism.

Issues found:

  • [BODY] Clarify fix path: (a) proper onboarding into pal-e-services var.services (correct long-term fix, auto-provisions duration = -1 robots), or (b) manual Harbor robot re-creation (bandaid). Update Repo field and add file targets accordingly.
  • [BODY] Add AC for onboarding verification if path (a) chosen.
  • [SCOPE] arch-A4 architecture note does not exist in pal-e-docs — needs creation.
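If path (b) is used as a stopgap, the re-created robot can at least match what the pipeline provisions: non-expiring (`duration = -1`) and scoped to this project. The sketch below builds a request body for Harbor's robot-creation endpoint, assuming the Harbor v2 `POST /api/v2.0/robots` schema (`name`, `level`, `duration`, `permissions` with `kind`/`namespace`/`access`). Verify these fields against your Harbor version's API docs. The robot name `api-pull` is hypothetical.

```python
def pull_robot_payload(project: str, robot_name: str) -> dict:
    """Request body for a non-expiring, pull-only, project-level robot (assumed Harbor v2 schema)."""
    return {
        "name": robot_name,
        "level": "project",
        "duration": -1,  # -1 means the robot never expires
        "disable": False,
        "permissions": [
            {
                "kind": "project",
                "namespace": project,
                # Pull-only: enough for imagePullSecrets, nothing more.
                "access": [{"resource": "repository", "action": "pull"}],
            }
        ],
    }
```

Harbor generates the final robot name and secret in its response. Those two values are what would go into `harbor-creds`.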

Blast radius: No other namespaces affected. All other services use pal-e-services pipeline with non-expiring robots. Only westside-ai-assistant is impacted.

Reference
forgejo_admin/westside-ai-assistant#33