Remove build_args regression so Kaniko pulls via Tailscale FQDN

ldraney (Owner) commented on 2026-06-04 12:03:39 +00:00

Summary

- Removes build_args: "REGISTRY=harbor.harbor.svc.cluster.local", which PR #79's conflict resolution accidentally re-introduced
- Removes the --insecure-pull flag (no longer needed: pulls now use the Tailscale FQDN with proper HTTPS)
- Unblocks 6 consecutive failed pipelines (#184–#194) and the production deploy of the 500 fix (PR #75)

Changes

- .woodpecker.yaml: removed the build_args line and --insecure-pull from extra_opts

Test Plan

- [ ] Pipeline build-and-push step succeeds
- [ ] Image appears in Harbor at landscaping-assistant/app:{sha}
- [ ] Deploy to production and verify 500 fix is live

Review Checklist

- [ ] Passed automated review-fix loop
- [ ] No secrets committed
- [ ] No unnecessary file changes
- [ ] Commit messages are descriptive

Related Notes

- Closes #82: Kaniko/Harbor intermittent connectivity
- ldraney/landscaping-assistant #77: original build-arg regression

Investigation

6 consecutive pipeline failures all fail at build-and-push:

https://harbor.harbor.svc.cluster.local:443 → i/o timeout (no HTTPS listener)
http://harbor.harbor.svc.cluster.local:80  → connection refused

Standalone connectivity from the woodpecker namespace works fine (verified with test pods using both the alpine and Kaniko images). The issue is Kaniko's HTTPS-first, HTTP-fallback behavior with --insecure-pull: the HTTPS attempt on port 443 times out after ~30s, and the HTTP attempt that follows is then refused on port 80, even though standalone pods reach the same registry without trouble.
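To make the sequence in the log concrete, here is a minimal Python sketch of an HTTPS-first probe that falls back to HTTP. It is not Kaniko's implementation: probe_with_fallback, tcp_connect, simulated_cluster_internal and simulated_tailscale are hypothetical names, and the simulated endpoints simply hard-code the two failures from the log (timeout on 443, refusal on 80) rather than explaining why port 80 refuses. What it does show is that a registry without an HTTPS listener costs a full HTTPS timeout before the HTTP attempt even starts, while an endpoint with a working HTTPS listener never reaches the fallback.

```python
import socket
from typing import Callable, List, Tuple

# Hypothetical model of an HTTPS-first, HTTP-fallback pull; not Kaniko's code.
ConnectFn = Callable[[str, int, float], None]


def tcp_connect(host: str, port: int, timeout: float) -> None:
    """Open and close a real TCP connection; raises OSError on failure."""
    with socket.create_connection((host, port), timeout=timeout):
        pass


def probe_with_fallback(
    host: str, connect: ConnectFn = tcp_connect, timeout: float = 30.0
) -> List[Tuple[str, str]]:
    """Try HTTPS on 443 first, then HTTP on 80, recording each attempt."""
    attempts: List[Tuple[str, str]] = []
    for scheme, port in (("https", 443), ("http", 80)):
        url = f"{scheme}://{host}:{port}"
        try:
            connect(host, port, timeout)
        except socket.timeout:
            # Nothing answered on this port before the deadline.
            attempts.append((url, "i/o timeout"))
        except ConnectionRefusedError:
            attempts.append((url, "connection refused"))
        except OSError as exc:
            attempts.append((url, f"error: {exc}"))
        else:
            attempts.append((url, "ok"))
            break  # first success wins, so a working HTTPS endpoint never falls back
    return attempts


def simulated_cluster_internal(host: str, port: int, timeout: float) -> None:
    """Stand-in for the failing pipeline: 443 times out, 80 is refused."""
    if port == 443:
        raise socket.timeout("timed out")
    raise ConnectionRefusedError("connection refused")


def simulated_tailscale(host: str, port: int, timeout: float) -> None:
    """Stand-in for an endpoint with a working HTTPS listener on 443."""
    if port != 443:
        raise ConnectionRefusedError("connection refused")
```

Against simulated_cluster_internal the probe returns the two failures from the log, in order; against simulated_tailscale it stops after a single successful HTTPS attempt. Swapping in tcp_connect (the default) turns the same function into a real reachability check.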

Fix: drop the build_args override so Kaniko pulls base images via the Dockerfile's default REGISTRY, the Tailscale FQDN (proper HTTPS, no fallback needed). Push remains cluster-internal with insecure: true, which forces HTTP directly.

| Path | Pull (base images) | Push (built image) |
|------|---|---|
| Before | cluster-internal HTTP (broken) | cluster-internal HTTP |
| After | Tailscale FQDN HTTPS (working) | cluster-internal HTTP (unchanged) |
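The table can also be read as a small model of how the settings touched by this PR decide where each operation goes. The Python sketch below is illustrative only: Route, pull_route, push_route, routes, BEFORE and AFTER are hypothetical names, the settings are plain dicts rather than the plugin's real schema, and the transport rules simply encode this PR's own claims (HTTPS is tried first and --insecure-pull only adds an HTTP fallback; insecure: true sends the push over HTTP directly).

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Registry hosts named in this PR.
DOCKERFILE_DEFAULT_REGISTRY = "harbor.tail5b443a.ts.net"  # ARG REGISTRY default
CLUSTER_REGISTRY = "harbor.harbor.svc.cluster.local"


@dataclass(frozen=True)
class Route:
    host: str
    transport: str


def pull_route(build_args: Dict[str, str], insecure_pull: bool) -> Route:
    # A REGISTRY build arg overrides the Dockerfile default; otherwise the default applies.
    host = build_args.get("REGISTRY", DOCKERFILE_DEFAULT_REGISTRY)
    # Per the PR: HTTPS is attempted first; --insecure-pull only permits an HTTP fallback.
    return Route(host, "https, then http fallback" if insecure_pull else "https")


def push_route(registry: str, insecure: bool) -> Route:
    # Per the PR: insecure: true makes the push go over HTTP directly.
    return Route(registry, "http" if insecure else "https")


def routes(settings: Dict) -> Tuple[Route, Route]:
    """Return (pull, push) routes for a hypothetical settings dict."""
    return (
        pull_route(settings["build_args"], settings["insecure_pull"]),
        push_route(settings["registry"], settings["insecure"]),
    )


# Hypothetical snapshots of the build-and-push step before and after this PR.
BEFORE = {"build_args": {"REGISTRY": CLUSTER_REGISTRY}, "insecure_pull": True,
          "registry": CLUSTER_REGISTRY, "insecure": True}
AFTER = {"build_args": {}, "insecure_pull": False,
         "registry": CLUSTER_REGISTRY, "insecure": True}
```

Only the pull route differs between BEFORE and AFTER; the push route is identical in both, matching the "unchanged" cell in the table.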


ldraney added 1 commit on 2026-06-04 12:03:39 +00:00

Remove build_args regression so Kaniko pulls via Tailscale FQDN

ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed
CI / scan_ruby (pull_request) Has been cancelled
CI / scan_js (pull_request) Has been cancelled
CI / lint (pull_request) Has been cancelled

7cfa57881b

PR #79's conflict resolution re-introduced the build_args override
that PR #78 explicitly removed. This forced Kaniko to pull base
images via cluster-internal Harbor (harbor.harbor.svc.cluster.local),
which fails due to Kaniko's HTTPS-first fallback behavior — port 443
times out (no HTTPS listener), then HTTP on port 80 gets connection
refused despite standalone connectivity working fine.

Without build_args, the Dockerfile default REGISTRY
(harbor.tail5b443a.ts.net) is used for pulls — the same Tailscale
FQDN that lint and test steps already use successfully. Push remains
cluster-internal via insecure: true (which forces HTTP directly,
bypassing the broken HTTPS→HTTP fallback).

Also removes --insecure-pull since pulls now go to Tailscale FQDN
with proper HTTPS — no HTTP fallback needed.

Closes #82

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

ldraney (Author, Owner) commented on 2026-06-04 12:05:04 +00:00

PR #91 Review

DOMAIN REVIEW

Domain: CI/CD -- Woodpecker pipeline config, Kaniko image build/push.

Change analysis (2 deletions, 1 addition in .woodpecker.yaml):

1. Removed build_args: "REGISTRY=harbor.harbor.svc.cluster.local" -- This was overriding the Dockerfile's default ARG REGISTRY=harbor.tail5b443a.ts.net (line 4 of Dockerfile), forcing Kaniko to pull base images via the cluster-internal address. That address has no HTTPS listener on :443, and Kaniko's HTTPS-first fallback behavior caused i/o timeouts. Removing the override lets the Dockerfile default (Tailscale FQDN with proper HTTPS) take effect. Correct fix.
2. Removed --insecure-pull from extra_opts -- No longer needed since pulls now go through Tailscale FQDN with valid HTTPS certificates. Consistent with the build_args removal.
3. Push path unchanged -- registry: harbor.harbor.svc.cluster.local, insecure: true, and insecure-registry: harbor.harbor.svc.cluster.local all remain, correctly keeping pushes cluster-internal via HTTP.

Verified against Dockerfile: Both FROM stages use ${REGISTRY}/library/ruby-rails-build:latest and ${REGISTRY}/library/ruby-rails-runtime:latest. The default REGISTRY=harbor.tail5b443a.ts.net is declared via ARG at lines 4 and 19. The fix is mechanically sound.
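The mechanics of this check can be sketched in Python. Docker expands ${VAR} in FROM lines using ARGs declared before the first FROM, and a --build-arg value replaces the ARG default (assuming the plugin passes build_args through as build args). The Dockerfile text below is a hypothetical, trimmed stand-in: only the ARG default and the two image paths come from the review, and the real file's second ARG at line 19 is omitted. resolve_from_images is a hypothetical helper that handles only the plain ${NAME} and $NAME forms, not modifiers such as ${NAME:-default}.

```python
import re
from typing import Dict, List, Optional

# Hypothetical, trimmed Dockerfile: only the ARG default and image paths are from the review.
DOCKERFILE = """\
ARG REGISTRY=harbor.tail5b443a.ts.net

FROM ${REGISTRY}/library/ruby-rails-build:latest AS build
RUN echo build

FROM ${REGISTRY}/library/ruby-rails-runtime:latest
"""

ARG_RE = re.compile(r"^ARG\s+(\w+)(?:=(\S*))?\s*$", re.IGNORECASE)
VAR_RE = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def resolve_from_images(
    dockerfile: str, build_args: Optional[Dict[str, str]] = None
) -> List[str]:
    """Return each FROM image with global ARGs (and build-arg overrides) expanded."""
    build_args = build_args or {}
    global_args: Dict[str, str] = {}
    images: List[str] = []
    seen_from = False
    for raw in dockerfile.splitlines():
        line = raw.strip()
        arg = ARG_RE.match(line)
        if arg and not seen_from:
            # Only ARGs before the first FROM are in scope for FROM lines.
            name, default = arg.group(1), arg.group(2) or ""
            global_args[name] = build_args.get(name, default)
        elif line.upper().startswith("FROM "):
            seen_from = True
            # Skip flags such as --platform=...; the first remaining token is the image.
            image = [t for t in line.split()[1:] if not t.startswith("--")][0]
            images.append(
                VAR_RE.sub(lambda m: global_args.get(m.group(1) or m.group(2), ""), image)
            )
    return images
```

Calling it with {"REGISTRY": "harbor.harbor.svc.cluster.local"} reproduces the pre-fix pull targets; calling it with no build args yields the Tailscale FQDN targets that this PR restores.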

Historical context confirmed: PR #78 originally removed this build_args line. PR #79 re-introduced it during conflict resolution. This PR reverts to the #78 state, which is the correct one. The investigation in the PR body (6 consecutive pipeline failures, all at build-and-push) corroborates the diagnosis.

BLOCKERS

None.

- No application code changed -- no test coverage blocker applies.
- No secrets committed (Harbor credentials properly use from_secret).
- No user input handling involved.
- No DRY/auth concerns.

NITS

1. Branch naming: Branch is fix/remove-build-args-regression but SOP convention is {issue-number}-{kebab-case-purpose} (e.g., 82-remove-build-args-regression). Non-blocking since the branch is already pushed.
2. No plan slug in Related: The Related section references #82 and #77 but does not reference a pal-e-docs plan slug. Minor process gap if a plan exists for this work.
3. insecure-registry setting: The insecure-registry: harbor.harbor.svc.cluster.local line remains. This is correct, but note that it now only affects push, since pulls go through a different host entirely. No action needed -- just a clarity observation.

SOP COMPLIANCE

- [ ] Branch named after issue (fix/remove-build-args-regression vs expected 82-*)
- [x] PR body follows template (Summary, Changes, Test Plan, Related, plus Investigation)
- [ ] Related references plan slug (references issues only)
- [x] No secrets committed (Harbor creds use from_secret)
- [x] No unnecessary file changes (single file, 2 lines removed, 1 modified)
- [x] Commit messages are descriptive

PROCESS OBSERVATIONS

- This is a regression fix for a regression fix (PR #76 added build_args, #78 removed it, #79 re-introduced it, now #91 removes it again). The root cause is conflict resolution during merge of #79. Consider squash-merging or at minimum reviewing conflict resolutions more carefully to prevent this oscillation pattern.
- 6 blocked pipelines is a significant deployment frequency impact. This is a P0 fix that unblocks the production deploy of the 500 fix (PR #75). Time-to-restore is the key DORA metric here.
- The test plan (pipeline succeeds, image in Harbor, production deploy) is appropriate for a CI config change. No unit/integration tests apply.

VERDICT: APPROVED

Clean, minimal, well-investigated fix. The change is mechanically correct: removing the build_args override restores the Dockerfile default (Tailscale FQDN with HTTPS), and removing --insecure-pull is the consistent follow-through. Branch naming is a nit, not a blocker. Ship it to unblock the pipeline.
