fix: replace plugin-git clone with alpine/git internal URL #134

Merged
forgejo_admin merged 18 commits from 133-fix-ci-clone-alpine-git into main 2026-03-21 23:44:18 +00:00

Summary

  • Replace default Woodpecker clone with alpine/git + raw commands: using internal Forgejo service URL — matches working westside-app pattern
  • Add sleep 2 before clone for kube-router ipset sync (NetworkPolicy race condition)
  • Add -lock=false to both apply paths (initial + retry) for consistency with plan step

Changes

  • .woodpecker.yaml: Add clone: block with alpine/git, internal URL (forgejo-http.forgejo.svc.cluster.local), and sleep 2
  • .woodpecker.yaml: Add -lock=false to initial tofu apply (line 145)
  • .woodpecker.yaml: Add -lock=false to retry tofu apply (line 154)

No Terraform changes — tofu fmt and tofu validate are unaffected.

Test Plan

  • PR triggers pipeline automatically (this PR is the test)
  • Clone step succeeds via internal URL (no TLS error)
  • Validate step runs (tofu fmt -check + tofu validate)
  • Plan step runs and posts tofu plan comment to this PR
  • After merge: push-to-main triggers apply with -lock=false
  • After merge: cross-pillar-review step fires

Review Checklist

  • No secrets committed
  • No unnecessary file changes
  • Commit messages are descriptive
  • Clone pattern matches westside-app reference exactly

Related

  • Closes #133: CI pipeline broken, external TLS URL has a 66% failure rate
  • plan-pal-e-platform — Platform Hardening, CI reliability
  • Supersedes approach from PRs #118 (plugin-git), #128 (CI_NETRC_MACHINE)
  • Related to #127 (kube-router ipset sync — sleep 2 workaround)
fix: replace plugin-git clone with alpine/git internal URL (#133)
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ae362f4317
The default clone and previous plugin-git attempts (PRs #118, #128)
fail because plugin-git injects netrc credentials that conflict with
internal service URLs. Switch to alpine/git with raw git commands,
matching the working westside-app pattern. Add sleep 2 for kube-router
ipset sync. Also add -lock=false to both apply paths for consistency
with the plan step.

Closes #133

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

PR #134 Review

DOMAIN REVIEW

Tech stack: Woodpecker CI pipeline configuration (YAML), OpenTofu/Terraform IaC, Kubernetes networking.

Clone block (lines 1-9):
The alpine/git clone pattern is identical to the proven westside-app reference -- same image, same internal service URL (forgejo-http.forgejo.svc.cluster.local:80), same sleep 2 workaround, same shallow clone via --depth 1. This replaces the default plugin-git clone that used the external TLS URL with a 66% failure rate. The internal HTTP URL avoids the TLS/DNS instability entirely.

The sleep 2 for kube-router ipset sync is a documented workaround for issue #127. It is the correct mitigation until the underlying kube-router issue is resolved.
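
The clone steps described above can be sketched in shell roughly as follows. This is a minimal sketch, not the contents of `.woodpecker.yaml`: the function names are hypothetical, the service URL is the one quoted in this review, and it assumes the server allows fetching a single commit by SHA.

```shell
# Internal Forgejo service URL as quoted in the review (plain HTTP, port 80).
FORGEJO_INTERNAL="http://forgejo-http.forgejo.svc.cluster.local:80"

internal_clone_url() {
  # $1 = repository in owner/name form (Woodpecker exposes this as CI_REPO)
  printf '%s/%s.git\n' "$FORGEJO_INTERNAL" "$1"
}

clone_commit() {
  # $1 = owner/name, $2 = commit SHA, $3 = target directory
  # Hypothetical helper: the real pipeline may order or flag these differently.
  sleep 2   # wait for kube-router ipset sync before the first connection
  git init -q "$3"
  git -C "$3" remote add origin "$(internal_clone_url "$1")"
  git -C "$3" fetch --depth 1 origin "$2"   # shallow fetch of one commit
  git -C "$3" checkout -q FETCH_HEAD
}
```

Inside a clone step this might be called as `clone_commit "$CI_REPO" "$CI_COMMIT_SHA" .`, using the repository and commit variables that Woodpecker sets.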

-lock=false additions (lines 145, 154):
Both tofu apply invocations now include -lock=false, which makes the apply step consistent with the plan step (line 76), where the flag was already set. This follows the feedback_tofu_lock_false.md behavioral rule: agent prompts with tofu plan/apply MUST include -lock=false because state locks block CI.

Observation on force-unlock logic: With -lock=false on the initial apply (line 145), OpenTofu skips lock acquisition entirely, so the initial apply should never fail with "state is already locked". That makes the force-unlock + retry block (lines 148-161) unreachable while -lock=false is set. The block is harmless, but it is dead code. See nit below.

BLOCKERS

None.

  • No new application code, so test coverage is not applicable -- the pipeline run on this PR is the test.
  • No unvalidated user input (CI variables are Woodpecker-controlled).
  • No secrets or credentials in the diff (all use from_secret: references).
  • No duplicated auth/security logic.

NITS

  1. Dead code: force-unlock block (lines 148-161). With -lock=false on the initial apply, OpenTofu will not report "the state is already locked" -- it simply ignores locks. The entire if/grep/force-unlock/retry block is now unreachable. Consider either (a) removing the dead code for clarity, or (b) removing -lock=false from the initial apply and keeping the defensive logic as the fallback mechanism. Option (b) is arguably better: try with locking first, fall back to force-unlock if stale. Either way, this is non-blocking -- the current state is safe and functional.
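
Option (b) from the nit could look roughly like the sketch below: apply with locking first, and only force-unlock when the failure output carries a lock ID. The helper names are hypothetical, and the parsing assumes the lock error prints a "Lock Info" block with an `ID:` line; the exact message format in this pipeline has not been verified here.

```shell
extract_lock_id() {
  # Print the first value that follows "ID:" in the text read from stdin.
  sed -n 's/^.*[[:space:]]ID:[[:space:]]*\([^[:space:]]*\).*$/\1/p' | head -n 1
}

apply_with_unlock_fallback() {
  # First attempt uses normal locking; output is captured so it can be parsed.
  if out=$(tofu apply -auto-approve -input=false 2>&1); then
    printf '%s\n' "$out"
    return 0
  fi
  printf '%s\n' "$out"
  lock_id=$(printf '%s\n' "$out" | extract_lock_id)
  # No lock ID in the output means the apply failed for another reason.
  [ -n "$lock_id" ] || return 1
  # Assumed stale lock: release it, then retry once with locking enabled.
  tofu force-unlock -force "$lock_id" && tofu apply -auto-approve -input=false
}
```

The trade-off is that force-unlocking assumes the lock is stale; if another apply is really running, releasing its lock can let two writers touch the same state.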

SOP COMPLIANCE

  • Branch named after issue (133-fix-ci-clone-alpine-git references #133)
  • PR body follows template (Summary, Changes, Test Plan, Review Checklist, Related)
  • Related references plan slug (plan-pal-e-platform)
  • No secrets committed (all via from_secret:)
  • No unnecessary file changes (1 file, all changes scoped to the fix)
  • Commit message is descriptive

PROCESS OBSERVATIONS

  • Deployment frequency: This unblocks CI entirely -- a 66% clone failure rate is a direct deployment frequency killer. High-impact fix.
  • Change failure risk: Low. The clone pattern is battle-tested in westside-app. The -lock=false flag is already used in the plan step.
  • Related issues: This fix addresses #133 directly and incorporates the sleep 2 workaround from #127. It also supersedes the approaches from PRs #118 and #128, consolidating CI clone reliability into a single proven pattern.
  • Documentation: The dead force-unlock code (nit #1) could be tracked as a cleanup item if the team wants to revisit.

VERDICT: APPROVED

fix: add netrc auth to clone step for private repos
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
01b54d4980
The internal Forgejo URL works (no DNS/TLS errors) but
pal-e-platform is a private repo that requires authentication.
Add .netrc file using the existing forgejo_token secret so
git fetch can authenticate against the internal service URL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
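
A minimal sketch of the `.netrc` approach described in the commit above. The function name, the login value and the variable holding the token are assumptions; the commit only states that the existing forgejo_token secret is written to a `.netrc` file.

```shell
write_netrc() {
  # $1 = output path, $2 = host without scheme or port, $3 = login, $4 = token
  # The subshell keeps the restrictive umask from leaking into the caller.
  (
    umask 077
    printf 'machine %s\nlogin %s\npassword %s\n' "$2" "$3" "$4" > "$1"
  )
}
```

A clone step might call it as `write_netrc "$HOME/.netrc" forgejo-http.forgejo.svc.cluster.local ci-user "$FORGEJO_TOKEN"` before running `git fetch`, where `ci-user` and `FORGEJO_TOKEN` are placeholder names.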
ci: trigger fresh PR event to test OAuth token refresh
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
809791df6c
Empty commit to generate a new webhook event after Woodpecker
server restart with removed WOODPECKER_EXPERT_FORGE_OAUTH_HOST.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ci: test PR event with debug logging
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
ad2274be61
ci: test PR event after trusted + clone plugin settings
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
6ff1c5e666
ci: test PR event after secret + trusted clone fix
Some checks failed
ci/woodpecker/push/woodpecker Pipeline was successful
ci/woodpecker/pr/woodpecker Pipeline failed
3dd07b71c9
fix: force IPv4 in CI steps — cluster has no IPv6 routing
Some checks failed
ci/woodpecker/push/woodpecker Pipeline was successful
ci/woodpecker/pr/woodpecker Pipeline failed
2b7acf1662
registry.opentofu.org returns AAAA records first. CI pods have no
IPv6 route, causing "network is unreachable" on tofu init. Setting
gai.conf precedence forces IPv4 address selection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix: use cgo DNS resolver + retry for CI provider downloads
Some checks failed
ci/woodpecker/push/woodpecker Pipeline was successful
ci/woodpecker/pr/woodpecker Pipeline failed
576dd5a0a0
Go's pure DNS resolver gets "server misbehaving" from CoreDNS
intermittently. GODEBUG=netdns=cgo forces Go to use musl's
getaddrinfo which handles DNS responses more gracefully. Also
adds retry with 3s delay on tofu init failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
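
The retry described in the commit above can be sketched as a small POSIX shell helper. The function name and the attempt count are assumptions; only the 3s delay comes from the commit message.

```shell
retry() {
  # Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
  # Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
  attempts=$1
  delay=$2
  shift 2
  n=1
  while :; do
    "$@" && return 0
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

In a pipeline step this might be used as `export GODEBUG=netdns=cgo` followed by `retry 3 3 tofu init -input=false`, where three attempts is a placeholder value.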
fix: add retry to apk install for DNS intermittency
Some checks failed
ci/woodpecker/push/woodpecker Pipeline was successful
ci/woodpecker/pr/woodpecker Pipeline failed
47250ffef3
fix: more aggressive retry for apk DNS failures (3 attempts, 10s delay)
Some checks failed
ci/woodpecker/push/woodpecker Pipeline was successful
ci/woodpecker/pr/woodpecker Pipeline failed
54a1c8604e
fix: eliminate apk dependency — use wget for PR comment posting
Some checks failed
ci/woodpecker/push/woodpecker Pipeline was successful
ci/woodpecker/pr/woodpecker Pipeline failed
d65f7c39d1
apk add fails consistently due to DNS/TLS issues reaching Alpine
CDN from CI pods. Replaced curl+jq with wget (built into Alpine
busybox) and shell-based JSON formatting. Zero external package
downloads needed for the plan step.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
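
The wget plus shell-based JSON formatting from the commit above might look like the following sketch. The function names and parameters are hypothetical, the endpoint is the Forgejo issue-comment API, and it assumes the BusyBox wget build supports `--header` and `--post-data`. The escaping covers only backslashes, double quotes, tabs and newlines.

```shell
json_escape() {
  # Reads text on stdin and prints it escaped for use inside a JSON string.
  # Other control characters (for example ANSI colour codes) are not handled.
  tab=$(printf '\t')
  sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e "s/$tab/\\\\t/g" |
    awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

post_pr_comment() {
  # $1 = API base URL, $2 = owner/name, $3 = PR number, $4 = API token
  # The comment text is read from stdin.
  payload=$(printf '{"body":"%s"}' "$(json_escape)")
  wget -q -O - \
    --header "Content-Type: application/json" \
    --header "Authorization: token $4" \
    --post-data "$payload" \
    "$1/api/v1/repos/$2/issues/$3/comments"
}
```

Running the plan with `tofu plan -no-color` before piping its output into `post_pr_comment` keeps colour escape sequences out of the payload, which this sketch would not escape.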
fix: disable IPv6 in CI containers via sysctl
Some checks failed
ci/woodpecker/push/woodpecker Pipeline was successful
ci/woodpecker/pr/woodpecker Pipeline failed
da36ec69f6
Go's static binary ignores GODEBUG=netdns=cgo. The only way to
prevent Go from picking IPv6 (AAAA) addresses is to disable IPv6
at the kernel level. The cluster has no IPv6 routing, so AAAA
connections to registry.opentofu.org always fail with "network is
unreachable." sysctl runs with || true in case NET_ADMIN is missing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix: base64 decode kubeconfig in CI steps
Some checks failed
ci/woodpecker/push/woodpecker Pipeline was successful
ci/woodpecker/pr/woodpecker Pipeline failed
ced774bd1e
The kubeconfig secret is now stored as base64 to prevent x509 cert
corruption when the MCP tool transmits multi-line YAML. Also fixes
stale node IP (10.0.0.217 → 10.0.0.149) from the archbox move.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix: revert to raw kubeconfig (secret set via UI, no corruption)
Some checks failed
ci/woodpecker/push/woodpecker Pipeline was successful
ci/woodpecker/pr/woodpecker Pipeline failed
e4edb8a73a
The kubeconfig_content secret was updated via the Woodpecker UI
with the correct node IP (10.0.0.149) and uncorrupted cert data.
Reverts base64 decode — raw echo is sufficient when the secret
isn't mangled by the MCP tool.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix: base64 decode kubeconfig — secret set via Playwright UI
Some checks failed
ci/woodpecker/push/woodpecker Pipeline was successful
ci/woodpecker/pr/woodpecker Pipeline failed
f10de181ae
Kubeconfig stored as base64 in Woodpecker secret (set via browser
UI to avoid MCP/API corruption of multi-line YAML with embedded
certs). Pipeline decodes with base64 -d. Also fixes stale node IP
(10.0.0.217 → 10.0.0.149).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Woodpecker's secret store corrupts multi-line YAML values containing
embedded base64 cert data (x509 unmarshal failures, key mismatches).
Bypass entirely: store kubeconfig as a Kubernetes secret in the
woodpecker namespace, mount via volumes (trusted repo setting).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix: use SA token for k8s auth — no cert corruption possible
Some checks failed
ci/woodpecker/push/woodpecker Pipeline was successful
ci/woodpecker/pr/woodpecker Pipeline failed
2cc924409f
Replace client cert kubeconfig with service account token auth.
The SA token is a simple JWT string (no multi-line YAML, no embedded
base64 certs). Kubeconfig is constructed in-step using:
- SA token from Woodpecker secret (single-line, corruption-proof)
- CA cert from pod's service account mount
- kubernetes.default.svc.cluster.local (stable, no IP dependency)

Service account ci-admin in woodpecker namespace with cluster-admin.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
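
Building the kubeconfig in-step, as the commit above describes, could be sketched like this. The function name, the cluster and context names, and the output path are assumptions; the CA path is the standard in-pod service account mount, and the server host is the one named in the commit.

```shell
write_kubeconfig() {
  # $1 = output path, $2 = service account token, $3 = CA cert path (optional)
  ca=${3:-/var/run/secrets/kubernetes.io/serviceaccount/ca.crt}
  cat > "$1" <<EOF
apiVersion: v1
kind: Config
clusters:
  - name: in-cluster
    cluster:
      server: https://kubernetes.default.svc.cluster.local
      certificate-authority: $ca
users:
  - name: ci-admin
    user:
      token: $2
contexts:
  - name: ci
    context:
      cluster: in-cluster
      user: ci-admin
current-context: ci
EOF
  chmod 600 "$1"   # the file holds a bearer token
}
```

A step might run `write_kubeconfig "$PWD/kubeconfig" "$SA_TOKEN"` and then export `KUBECONFIG="$PWD/kubeconfig"`, where `SA_TOKEN` is a placeholder for the variable that carries the Woodpecker secret.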
fix: use internal MinIO URL in CI, external for CLI
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
ci/woodpecker/pr/woodpecker Pipeline was successful
ci/woodpecker/pull_request_closed/woodpecker Pipeline was successful
716b653cc6
MinIO provider connects to minio-api.tail5b443a.ts.net which hairpins
through DERP relay IPs from inside the cluster. Made minio_server and
minio_ssl configurable variables — defaults to external (CLI), CI
overrides with internal URL via -var flags.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
forgejo_admin deleted branch 133-fix-ci-clone-alpine-git 2026-03-21 23:44:18 +00:00