Fix: Kaniko HTTPS probe timeout — add insecure-registry to internal Harbor pipelines #193

Closed
opened 2026-03-27 02:23:59 +00:00 by forgejo_admin · 6 comments

Type

Bug

Lineage

Discovered during incident #184 investigation. Second root cause — agent routing was fix 1 (#191, merged).

Repo

Cross-repo: 6 repos use internal Harbor with Kaniko.
Primary: forgejo_admin/pal-e-platform (convention ownership).

What Broke

Kaniko build-and-push steps fail or suffer 90-second timeouts against Harbor's HTTP-only internal service. Two separate Kaniko code paths cause this:

  1. Push permission check — probes HTTPS on port 443 (90s TCP timeout, no listener), then HTTP on port 80. This check uses its own transport that ignores --insecure-registry. Fix: --skip-push-permission-check.
  2. Actual push — also defaults to HTTPS first without --insecure-registry. Fix: --insecure + --insecure-registry.

Both fixes are required. insecure-registry alone (shipped in the first round of 6 PRs) does NOT work because the permission check still probes HTTPS.
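The interaction between the two code paths can be written down as a tiny model. This is only a sketch of the findings reported in this ticket, not Kaniko's actual logic; the function name `https_first_paths` is hypothetical, and the rule that the push needs both `--insecure` and `--insecure-registry` is this ticket's reading.

```python
def https_first_paths(flags: list[str]) -> list[str]:
    """Model of this ticket's findings: which Kaniko paths still try HTTPS first."""
    # Compare flag names only, so "--insecure-registry=<host>" matches by name.
    names = {flag.split("=", 1)[0] for flag in flags}
    paths = []
    # The permission check ignores --insecure-registry; only skipping it avoids the probe.
    if "--skip-push-permission-check" not in names:
        paths.append("push permission check")
    # Assumption from the ticket: the push goes straight to HTTP only with both flags set.
    if not {"--insecure", "--insecure-registry"} <= names:
        paths.append("actual push")
    return paths


first_round = ["--insecure", "--insecure-registry=harbor.harbor.svc.cluster.local"]
print(https_first_paths(first_round))  # ['push permission check']
print(https_first_paths(first_round + ["--skip-push-permission-check"]))  # []
```

The first call mirrors the first round of 6 PRs: the push path is covered, but the permission check still probes 443.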

Validated Fix

Tested in a real Kaniko pod in the woodpecker namespace (2026-03-27):

  • --skip-push-permission-check — bypasses HTTPS permission probe entirely
  • --insecure — accept HTTP for push
  • --insecure-registry=harbor.harbor.svc.cluster.local — use HTTP directly for push/pull

Result: Kaniko pushed to http://harbor.harbor.svc.cluster.local/v2/... immediately. No 443 probe. Exit code 0.
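For anyone re-running the validation by hand, the three flags could be assembled into an executor command roughly as follows. This is a sketch under assumptions: the ticket does not record the exact command used in the test pod, `/kaniko/executor` is the binary path in the upstream Kaniko image, and the repository `example/app`, the tag `dev`, and the build context `.` are placeholders.

```python
import shlex


def kaniko_command(registry: str, repository: str, tag: str, context: str = ".") -> list[str]:
    """Build an executor argv that carries the three validated flags."""
    return [
        "/kaniko/executor",
        f"--context={context}",
        f"--destination={registry}/{repository}:{tag}",
        "--skip-push-permission-check",      # bypass the HTTPS permission probe
        "--insecure",                        # accept HTTP for push
        f"--insecure-registry={registry}",   # use HTTP directly for push/pull
    ]


# Placeholder repository and tag, for illustration only.
cmd = kaniko_command("harbor.harbor.svc.cluster.local", "example/app", "dev")
print(shlex.join(cmd))
```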

Repro Steps

  1. Push to any repo using registry: harbor.harbor.svc.cluster.local with insecure: true
  2. Observe: Kaniko logs show 90s delay at "checking push permissions" (HTTPS probe on 443)
  3. After timeout, falls back to HTTP on 80 — may succeed or fail depending on timing

Expected Behavior

Kaniko should connect to Harbor via HTTP immediately with no HTTPS probe or permission check delay.

Environment

  • Cluster/namespace: woodpecker (CI pipeline pods)
  • Kaniko version: woodpeckerci/plugin-kaniko:2.3.0
  • Harbor service: harbor.harbor.svc.cluster.local:80 (no port 443)
  • Related alerts: none

Acceptance Criteria

  • All 6 internal-registry repos have extra_opts: "--skip-push-permission-check" in .woodpecker.yaml
  • build-and-push step completes without HTTPS probe or permission check timeout
  • No regression on 3 external-registry repos (they don't change)
  • service-onboarding-sop CI registry section updated to include extra_opts guidance
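The first criterion can be checked mechanically. Below is a minimal sketch, assuming a simple line-based scan is good enough for these flat `settings:` blocks (it is not a real YAML parser, and it does not check which step a key belongs to); `missing_settings` is a hypothetical helper name.

```python
import re

INTERNAL_REGISTRY = "harbor.harbor.svc.cluster.local"


def missing_settings(yaml_text: str) -> list[str]:
    """Return the required build-and-push settings absent from a pipeline file."""
    found: dict[str, list[str]] = {}
    for line in yaml_text.splitlines():
        # Match flat "key: value" lines; comment lines start with "#" and are skipped.
        match = re.match(r"^\s*([\w-]+):\s*(.*?)\s*$", line)
        if match:
            key, value = match.groups()
            found.setdefault(key, []).append(value.strip("\"'"))
    checks = {
        "insecure": lambda v: v == "true",
        "insecure-registry": lambda v: v == INTERNAL_REGISTRY,
        "extra_opts": lambda v: "--skip-push-permission-check" in v,
    }
    return [key for key, ok in checks.items()
            if not any(ok(v) for v in found.get(key, []))]


BEFORE = """settings:
  registry: harbor.harbor.svc.cluster.local
  insecure: true
  insecure-registry: harbor.harbor.svc.cluster.local
"""
AFTER = BEFORE + '  extra_opts: "--skip-push-permission-check"\n'

print(missing_settings(BEFORE))  # ['extra_opts']
print(missing_settings(AFTER))   # []
```

Running this over each of the 6 `.woodpecker.yaml` files would flag any repo that still has only the first-round change.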

File Targets (cross-repo)

Repos to modify (add extra_opts setting to build-and-push step):

  • basketball-api/.woodpecker.yaml
  • pal-e-docs/.woodpecker.yaml
  • pal-e-app/.woodpecker.yaml
  • westside-app/.woodpecker.yaml
  • westside-contracts/.woodpecker.yaml
  • pal-e-mail/.woodpecker.yaml

Repos NOT to touch (use external FQDN with TLS):

  • mcd-tracker-api/.woodpecker.yaml
  • mcd-tracker-app/.woodpecker.yaml
  • minio-api/.woodpecker.yaml

Change per repo

# Before (current — insecure-registry merged but insufficient)
settings:
  registry: harbor.harbor.svc.cluster.local
  insecure: true
  insecure-registry: harbor.harbor.svc.cluster.local

# After (validated fix)
settings:
  registry: harbor.harbor.svc.cluster.local
  insecure: true
  insecure-registry: harbor.harbor.svc.cluster.local
  extra_opts: "--skip-push-permission-check"

Related

  • #184 — parent incident (in_progress, this is fix 2)
  • #191 — fix 1: agent routing (merged)
  • project-pal-e-platform
Author
Owner

Scope Review: NEEDS_REFINEMENT

Review note: review-428-2026-03-26

Ticket is well-scoped with one material gap: westside-contracts/.woodpecker.yaml is a 6th affected repo using harbor.harbor.svc.cluster.local + insecure: true (identical pattern) but is missing from File Targets and acceptance criteria.

  • Add westside-contracts/.woodpecker.yaml to File Targets; update "5 repos" to "6 repos" throughout
  • Clarify convention note AC — service-onboarding-sop already documents internal vs external Harbor patterns; reference it or specify what's additionally needed
  • (Discovered scope) SOP hostname mismatch: SOP says harbor-core.harbor.svc.cluster.local, actual repos use harbor.harbor.svc.cluster.local — track separately
Author
Owner

Refinement Update

Per review (review-428-2026-03-26):

Fix 1: Missing repo

westside-contracts/.woodpecker.yaml also uses harbor.harbor.svc.cluster.local with insecure: true. 6 affected repos, not 5:

  • basketball-api
  • pal-e-docs
  • pal-e-app
  • westside-app
  • westside-contracts ← added
  • pal-e-mail

Fix 2: Convention note AC

Removed ambiguous "convention note" AC. The service-onboarding-sop already has a CI registry URL validation check — updating that SOP's registry guidance (add insecure-registry to the template) is sufficient.

Updated Acceptance Criteria

  • All 6 internal-registry repos have insecure-registry: harbor.harbor.svc.cluster.local
  • build-and-push step completes without 90s HTTPS probe delay
  • No regression on 3 external-registry repos
  • service-onboarding-sop CI registry section updated to include insecure-registry guidance

Discovered scope (tracked separately)

SOP hostname mismatch: service-onboarding-sop says harbor-core.harbor.svc.cluster.local but all repos use harbor.harbor.svc.cluster.local.

Author
Owner

Scope Review: READY

Review note: review-428-2026-03-26 (updated)

Re-reviewed after refinement comment. Both concerns resolved:

  1. westside-contracts — added as 6th file target, repo count corrected to 6 throughout. Verified westside-contracts/.woodpecker.yaml line 31 still has registry: harbor.harbor.svc.cluster.local + insecure: true, no insecure-registry.
  2. Convention note AC — replaced with concrete "service-onboarding-sop CI registry section updated to include insecure-registry guidance." References existing SOP, testable.

Discovered scope (SOP hostname mismatch) acknowledged for separate tracking.

VERDICT: READY — ticket can move to next_up.

Author
Owner

Fix Update — insecure-registry alone is insufficient

Finding

The 6 PRs merged for insecure-registry don't fix the problem. The push permission check is a separate Kaniko code path that doesn't respect --insecure-registry. It still probes HTTPS on 443 (90s timeout) then HTTP on 80.

Validated fix

Tested in a real Kaniko pod in the woodpecker namespace:

--skip-push-permission-check    # bypasses HTTPS permission probe entirely
--insecure                      # accept HTTP for push
--insecure-registry=<host>      # use HTTP directly for push/pull

Result: Kaniko pushed to http://harbor.harbor.svc.cluster.local/v2/... immediately. No 443 probe. Exit code 0.

Updated pipeline config needed (all 6 repos)

settings:
  registry: harbor.harbor.svc.cluster.local
  insecure: true
  insecure-registry: harbor.harbor.svc.cluster.local
  extra_opts: "--skip-push-permission-check"

Updated acceptance criteria

  • All 6 internal-registry repos have extra_opts: "--skip-push-permission-check"
  • build-and-push step completes without HTTPS probe or permission check timeout
  • Validated in real Kaniko pod before shipping (done ✓)
Author
Owner

Scope Review: NEEDS_REFINEMENT

Review note: review-428-2026-03-26 (updated)

Re-reviewed after Fix Update comment. The issue body is stale — it still documents insecure-registry as the complete fix, but that approach failed. The validated fix (extra_opts: "--skip-push-permission-check") exists only in a comment, not the body. Agents read the body, not comments.

Three blockers before READY:

  1. Reopen this issue — closed prematurely; the 6 merged insecure-registry PRs do not resolve the push permission check timeout. Verified: all 6 repos have insecure-registry but zero have extra_opts.

  2. Update issue body — incorporate Fix Update comment into body sections:

    • "What Broke": note the two code paths (pull = insecure-registry, push = --skip-push-permission-check)
    • "Change per repo": update the After block to show all 4 settings (registry + insecure + insecure-registry + extra_opts)
    • "Acceptance Criteria": replace current AC with the corrected AC from the Fix Update comment
  3. Update SOP AC: service-onboarding-sop CI registry template must include extra_opts guidance alongside insecure-registry

File target verification (all 6 repos grep-confirmed):

  • basketball-api/.woodpecker.yaml line 46 — has insecure-registry, missing extra_opts
  • pal-e-docs/.woodpecker.yaml line 71 — has insecure-registry, missing extra_opts
  • pal-e-app/.woodpecker.yaml line 63 — has insecure-registry, missing extra_opts
  • westside-app/.woodpecker.yaml line 36 — has insecure-registry, missing extra_opts
  • westside-contracts/.woodpecker.yaml line 33 — has insecure-registry, missing extra_opts
  • pal-e-mail/.woodpecker.yaml line 44 — has insecure-registry, missing extra_opts

3 external-registry repos (mcd-tracker-api, mcd-tracker-app, minio-api) confirmed unaffected — use harbor.tail5b443a.ts.net with TLS.

VERDICT: NEEDS_REFINEMENT — issue body must be updated to match the validated fix before an agent can execute it.

Author
Owner

Scope Review: READY

Review note: review-428-2026-03-26 (updated)

All three prior NEEDS_REFINEMENT actions resolved: issue reopened, body rewritten with validated fix (both Kaniko code paths documented, correct before/after diff with extra_opts), SOP update AC present. All 6 internal-registry file targets verified in filesystem — each has insecure-registry merged but missing extra_opts. 3 external-registry repos confirmed safe (TLS FQDN). All 4 acceptance criteria are agent-testable.

VERDICT: READY — ticket is cleared for execution.

Note: service-onboarding-sop CI registry row uses harbor-core.harbor.svc.cluster.local but repos use harbor.harbor.svc.cluster.local — pre-existing hostname mismatch, out of scope for this ticket.
