Fix: pal-e-docs-app pod ImagePullBackOff from Harbor #234

Open
opened 2026-03-28 19:55:19 +00:00 by forgejo_admin · 4 comments

Type

Bug

Lineage

  • Board: board-pal-e-docs
  • Story: story:reader-browse
  • Arch: arch:k8s-deploy
  • Blocking: resume portfolio live demo links

Repo

forgejo_admin/pal-e-services (Harbor config at k3s.tfvars lines 137-144) + forgejo_admin/pal-e-deployments (kustomize overlay)

What Broke

pal-e-docs-app pod is in ImagePullBackOff. Image harbor.tail5b443a.ts.net/pal-e-docs-app/app:e23a1d8c... cannot be pulled from Harbor.

Repro Steps

  1. kubectl get pods -n pal-e-docs-app — shows ImagePullBackOff
  2. kubectl describe pod -n pal-e-docs-app — shows Back-off pulling image

Expected Behavior

Pod should pull image from Harbor and reach Running state. Site should respond at pal-e-docs-app.tail5b443a.ts.net.

Environment

  • Namespace: pal-e-docs-app (created ~3h ago)
  • Image: harbor.tail5b443a.ts.net/pal-e-docs-app/app:e23a1d8c...
  • Related: board item #510 (pal-e-app rename to pal-e-docs-app) may be root cause

Context

Likely root cause: the repo was renamed from pal-e-app to pal-e-docs-app. The Harbor project may still be under the old name, or the image may never have been built under the new name. The kustomize overlay and Woodpecker pipeline need to reference the correct Harbor project.

Harbor config is managed by pal-e-services terraform, NOT pal-e-platform.

Investigation Steps

  1. Check whether Harbor has a pal-e-docs-app project (query the Harbor API with curl)
  2. Check whether an image exists under the old name pal-e-app (same Harbor API query)
  3. Check harbor-creds secret in pal-e-docs-app namespace
  4. Check if Woodpecker has built this repo's pipeline
  5. If image doesn't exist, trigger a build
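
The Harbor checks in steps 1 and 2 could be sketched against Harbor's v2.0 REST API as below. This is a minimal sketch, not the commands actually run for this issue: HARBOR_USER, HARBOR_PASS and the helper function names are assumptions, and the image tag is truncated in this issue, so it is passed in as an argument rather than filled in.

```shell
#!/bin/sh
# Sketch only: HARBOR_USER / HARBOR_PASS and all helper names are assumptions.
HARBOR_HOST="harbor.tail5b443a.ts.net"

# URL for a project's metadata (project name as $1).
harbor_project_url() {
  printf 'https://%s/api/v2.0/projects/%s\n' "$HARBOR_HOST" "$1"
}

# URL listing the repositories inside a project.
harbor_repos_url() {
  printf 'https://%s/api/v2.0/projects/%s/repositories\n' "$HARBOR_HOST" "$1"
}

# URL for one artifact: project $1, repository $2, tag or digest $3.
harbor_artifact_url() {
  printf 'https://%s/api/v2.0/projects/%s/repositories/%s/artifacts/%s\n' \
    "$HARBOR_HOST" "$1" "$2" "$3"
}

# Print only the HTTP status code of a GET against the given URL.
harbor_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    -u "${HARBOR_USER}:${HARBOR_PASS}" "$1"
}

# Manual usage against the real registry:
#   harbor_status "$(harbor_project_url pal-e-docs-app)"   # new name
#   harbor_status "$(harbor_project_url pal-e-app)"        # old name
#   harbor_status "$(harbor_artifact_url pal-e-docs-app app "$TAG")"
```

A 200 would typically mean the project or artifact exists; the exact status returned for a missing or unauthorized resource may depend on the Harbor version and on the permissions of the account used.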

Acceptance Criteria

  • Root cause identified (missing image, wrong project name, or bad credentials)
  • Image exists in Harbor at the correct project/tag
  • harbor-creds secret exists in pal-e-docs-app namespace
  • Pod reaches Running state
  • curl -sI https://pal-e-docs-app.tail5b443a.ts.net returns 200

Test Expectations

  • kubectl get pods -n pal-e-docs-app — Running 1/1
  • Site loads in browser
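
The pod and site checks above could be combined into one script, roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and a pod phase of Running is a weaker condition than READY 1/1, so the kubectl output should still be inspected by eye.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.
NS="pal-e-docs-app"
SITE_URL="https://pal-e-docs-app.tail5b443a.ts.net"

# Succeed only if at least one phase is given and every one is "Running".
all_running() {
  [ "$#" -gt 0 ] || return 1
  for phase in "$@"; do
    [ "$phase" = "Running" ] || return 1
  done
  return 0
}

# Check pod phases in the namespace, then the HTTP status of the site.
verify_deploy() {
  # Word splitting is intentional: jsonpath prints space-separated phases.
  # shellcheck disable=SC2046
  all_running $(kubectl get pods -n "$NS" \
    -o jsonpath='{.items[*].status.phase}') || return 1
  [ "$(curl -s -o /dev/null -w '%{http_code}' "$SITE_URL")" = "200" ]
}

# Manual usage: verify_deploy && echo "deploy looks healthy"
```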

Constraints

  • Harbor config is in pal-e-services, not pal-e-platform
  • May need to create/rename Harbor project if rename wasn't propagated
  • Check if ArgoCD is managing this namespace

Checklist

  • Investigate Harbor project state
  • Fix root cause
  • Verify pod starts
  • Verify site loads

Related

  • Board item #510 — pal-e-app rename to pal-e-docs-app (potential root cause)
  • forgejo_admin/pal-e-platform#184 — Harbor connectivity incident

Scope Review: NEEDS_REFINEMENT

Review note: review-613-2026-03-28

Ticket has good operational context and testable AC, but repo placement is wrong and issue type is misclassified.

  • Repo mismatch: Issue says pal-e-platform (Harbor config) but pal-e-platform has zero references to pal-e-docs-app. Harbor config lives in pal-e-services/terraform/k3s.tfvars (lines 137-144). Correct repo is forgejo_admin/pal-e-services.
  • Type mismatch: Issue body says ### Type: Feature but board item labels say type:bug. ImagePullBackOff is broken behavior — should be Bug.
  • Missing bug-template sections: Repro Steps, Expected Behavior, and structured Environment section are absent (info is loosely embedded in Context).
  • Dependency flag: Board item #510 (pal-e-app rename to pal-e-docs-app) may be the root cause — if rename executed, old image SHA may not exist in Harbor under the new project name.

Investigation Complete

Findings

| Check | Result |
|-------|--------|
| Harbor project pal-e-docs-app | Exists (project_id 23), repo_count=0 -- no images ever pushed |
| Harbor project pal-e-app (old name) | Does not exist |
| Pull secret harbor-creds in ns | Exists |
| Woodpecker repo activated | Yes (id 18, active) |
| Pipeline history | All 108 pipelines failed at clone step |
| pal-e-services tfvars config | Correct (pal-e-docs-app/app image_repo) |

Root Cause

The .woodpecker.yaml clone step used woodpeckerci/plugin-git, which starts cloning immediately on pod start. There is a race condition in which k3s iptables rules for ClusterIP services have not yet propagated to newly scheduled pods. curl/libcurl (used by git) gets "Connection refused" while wget (a different socket implementation) succeeds from the same pod at the same time. Adding sleep 2 before the git commands resolves it -- this is the pattern already used by every other working repo.
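
The fix described here is a fixed sleep 2 before the git commands. As a related variant, not the change made in the PR, a bounded retry with a delay between attempts tolerates the same startup race without depending on one fixed wait. The retry helper and REPO_URL below are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between failed attempts. Returns 0 on first success, 1 if all attempts fail.
retry() {
  _retry_max=$1
  _retry_delay=$2
  shift 2
  _retry_n=1
  while :; do
    "$@" && return 0
    if [ "$_retry_n" -ge "$_retry_max" ]; then
      return 1
    fi
    _retry_n=$((_retry_n + 1))
    sleep "$_retry_delay"
  done
}

# Example use in a clone step (REPO_URL is a placeholder):
#   retry 5 2 git clone "$REPO_URL" .
```

The trade-off is a slightly longer clone step in exchange for not failing the whole pipeline on a single refused connection.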

Fix

PR submitted on pal-e-docs-app: forgejo_admin/pal-e-docs-app#93

After merge, the pipeline should:

  1. Clone successfully
  2. Build and push image to Harbor pal-e-docs-app/app
  3. Update kustomize tag in pal-e-deployments
  4. ArgoCD rolls out new deployment, pod exits ImagePullBackOff

Investigation Complete

Root cause: NOT an infrastructure problem. The pal-e-docs-app CI pipeline has never successfully built an image. All 110 pipelines failed at check/lint before reaching the Kaniko build step.

Findings

  • Harbor project pal-e-docs-app exists but has zero images
  • Namespace, harbor-creds secret, kustomize overlay all correctly configured
  • The image tag in kustomization.yaml (e23a1d8c...) was set manually — no image was ever pushed for it
  • Pipeline #110 failures: 2 TypeScript errors + 6 ESLint errors

Fix Path

Single repo fix in forgejo_admin/pal-e-docs-app:

  1. Fix 2 TS errors (implicit any on event params in board page)
  2. Fix 6 ESLint errors (unused vars, SvelteSet, unused svelte-ignore)
  3. Push → CI passes → image built → Harbor push → ArgoCD sync → pod starts

No changes needed in pal-e-platform, pal-e-services, or pal-e-deployments.

Dispatching fix agent.


Update: Harbor 401 resolved

Root cause: Stale Woodpecker CI secrets. Robot account robot$pal-e-docs-app+pal-e-docs-app-ci was created by terraform but Woodpecker secrets contained old credentials.

Fix applied:

  • Updated harbor_username and harbor_password secrets on pal-e-docs-app repo in Woodpecker
  • Updated event filters to ["push", "pull_request", "tag", "manual", "cron"]

Status: Pipeline triggered to validate. Waiting for build-and-push step to confirm Harbor auth works.

Systemic gap discovered: Woodpecker secrets are set manually (SERVICE_ONBOARDING.md Step 4) while robot accounts are terraform-managed. Any tofu apply that recreates the robot accounts leaves all downstream Woodpecker secrets stale. Tracked separately.
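
Before copying robot credentials into Woodpecker, they could be validated directly against the registry, along the lines of the sketch below. HARBOR_ROBOT_PASS and the function name are assumptions. The robot account name contains a `$`, so it needs single quotes to stop the shell from expanding `$pal` as a variable.

```shell
#!/bin/sh
# Sketch only: HARBOR_ROBOT_PASS and check_robot_login are assumed names.
HARBOR_HOST="harbor.tail5b443a.ts.net"

# Single quotes keep the literal "$" in the robot account name.
HARBOR_ROBOT_USER='robot$pal-e-docs-app+pal-e-docs-app-ci'

# Attempt a registry login with the robot credentials; a non-zero exit
# status suggests the stored secret no longer matches the robot account.
check_robot_login() {
  printf '%s' "$HARBOR_ROBOT_PASS" |
    docker login "$HARBOR_HOST" -u "$HARBOR_ROBOT_USER" --password-stdin
}

# Manual usage: HARBOR_ROBOT_PASS=... check_robot_login
```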

Reference
forgejo_admin/pal-e-platform#234