fix: remove WOODPECKER_EXPERT_FORGE_OAUTH_HOST to fix token refresh #139

Merged
forgejo_admin merged 1 commit from 138-split-horizon-dns into main 2026-03-21 20:20:32 +00:00

Summary

  • Remove WOODPECKER_EXPERT_FORGE_OAUTH_HOST from Woodpecker server Helm values
  • This forces OAuth token exchange and refresh to use WOODPECKER_FORGEJO_URL (internal HTTP) instead of hairpinning through external DERP relay IPs
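
The fallback described above can be modelled as a tiny selection function. This is a hypothetical sketch of the behaviour as this PR describes it, not Woodpecker's source. The override value `https://forgejo.tail5b443a.ts.net` is assumed from the root-cause hostname, and the internal URL is the one quoted in the review.

```python
from typing import Mapping

# Hypothetical model of the host selection described in this PR (not Woodpecker code):
# a non-empty expert override wins, otherwise the forge URL is used.
OVERRIDE = "WOODPECKER_EXPERT_FORGE_OAUTH_HOST"
FORGE_URL = "WOODPECKER_FORGEJO_URL"


def oauth_host(env: Mapping[str, str]) -> str:
    """Return the base URL used for OAuth token exchange and refresh."""
    override = env.get(OVERRIDE, "").strip()
    if override:
        return override
    return env[FORGE_URL]


before = {
    FORGE_URL: "http://forgejo-http.forgejo.svc.cluster.local:80",
    OVERRIDE: "https://forgejo.tail5b443a.ts.net",  # assumed former override value
}
# Removing the env var is equivalent to dropping the key.
after = {k: v for k, v in before.items() if k != OVERRIDE}

print(oauth_host(before))  # external funnel URL, hairpins via DERP from inside the cluster
print(oauth_host(after))   # internal cluster Service URL
```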

Changes

  • terraform/main.tf: Remove WOODPECKER_EXPERT_FORGE_OAUTH_HOST env var from Woodpecker server config (1 line)
  • tofu fmt applied, tofu validate passes

tofu plan impact

  • Woodpecker server StatefulSet will be updated (env var removed)
  • Server pod will restart with new config
  • After restart: admin must re-authenticate Woodpecker via browser (one-time)
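
To confirm the plan is scoped as expected, the JSON form of a saved plan (`tofu plan -out=plan.tfplan`, then `tofu show -json plan.tfplan > plan.json`) can be filtered for resources with real actions. A minimal sketch, assuming the chart is managed through a `helm_release` resource (the review's mention of `set_sensitive` blocks suggests this). In that case the plan shows the Helm release changing, and the StatefulSet update happens when Helm applies the new values. The address `helm_release.woodpecker` is hypothetical.

```python
import json


def changed_resources(plan: dict) -> list[tuple[str, list[str]]]:
    """Return (address, actions) for every resource the plan would actually touch."""
    changed = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # Skip unchanged resources and data-source reads.
        if actions not in (["no-op"], ["read"]):
            changed.append((rc["address"], actions))
    return changed


# Illustrative plan excerpt; real input would be json.load(open("plan.json")).
sample = json.loads("""
{"resource_changes": [
  {"address": "helm_release.woodpecker", "change": {"actions": ["update"]}},
  {"address": "helm_release.forgejo",    "change": {"actions": ["no-op"]}}
]}
""")
print(changed_resources(sample))  # [('helm_release.woodpecker', ['update'])]
```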

Test Plan

  • tofu validate passes
  • tofu plan shows only Woodpecker server StatefulSet change
  • After apply: Woodpecker server logs show no "refresh oauth token failed" errors
  • After re-auth: PR events create steps (not "no steps" error)
  • External browser access to Woodpecker UI still works
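
The log check in the test plan amounts to counting occurrences of the failure message. A minimal sketch, assuming the server logs are piped in as lines (for example from `kubectl logs -n <namespace> <server-pod>`, where both names are placeholders). The JSON log lines below are made-up samples, not real Woodpecker output.

```python
from typing import Iterable

NEEDLE = "refresh oauth token failed"  # message string quoted in the test plan


def count_refresh_failures(lines: Iterable[str]) -> int:
    """Count log lines containing the token-refresh failure message (case-insensitive)."""
    return sum(1 for line in lines if NEEDLE in line.lower())


# Made-up sample lines for illustration only.
sample_logs = [
    '{"level":"error","message":"refresh oauth token failed"}',
    '{"level":"info","message":"hook received"}',
]
print(count_refresh_failures(sample_logs))  # 1
```

After the apply, the expectation from the test plan is a count of zero.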

Review Checklist

  • No secrets committed
  • No unnecessary file changes
  • Commit messages are descriptive
  • tofu fmt and tofu validate pass

Related

  • Closes #138: Split-horizon DNS / intra-cluster TLS hairpin
  • Unblocks PR #134 (CI clone fix) — PR events will work after this lands
  • Root cause: forgejo.tail5b443a.ts.net resolves to public DERP IPs from inside cluster, causing 66% TLS failure rate on OAuth token refresh
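
The root-cause observation can be reproduced by resolving the hostname from a pod and classifying each address. A minimal sketch, assuming Tailscale's standard address ranges (100.64.0.0/10 and fd7a:115c:a1e0::/48). `resolve()` performs a live DNS lookup and is meant to be run from inside the cluster. The helper names are hypothetical.

```python
import ipaddress
import socket

# Tailscale node address ranges (IPv4 CGNAT block and the Tailscale IPv6 prefix).
TAILNET_RANGES = (
    ipaddress.ip_network("100.64.0.0/10"),
    ipaddress.ip_network("fd7a:115c:a1e0::/48"),
)


def classify(addr: str) -> str:
    """Label an address as 'tailnet', 'private', or 'public'."""
    ip = ipaddress.ip_address(addr)
    # Membership across IP versions is simply False, so mixed ranges are safe here.
    if any(ip in net for net in TAILNET_RANGES):
        return "tailnet"
    if ip.is_private or ip.is_loopback:
        return "private"
    return "public"


def resolve(host: str) -> set[str]:
    """Return the set of addresses the local resolver gives for host."""
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


# Usage from a pod (live lookup, so not executed here):
# {a: classify(a) for a in resolve("forgejo.tail5b443a.ts.net")}
```

Any "public" result for the funnel hostname is the hairpin path this PR avoids.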
fix: remove WOODPECKER_EXPERT_FORGE_OAUTH_HOST to fix token refresh (#138)
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pull_request_closed/woodpecker Pipeline was successful
a78f0cdd82
WOODPECKER_EXPERT_FORGE_OAUTH_HOST forces all OAuth operations
(including server-to-server token refresh) through the external
Tailscale funnel URL. From inside the cluster, this hairpins through
public DERP relay IPs, causing ~66% TLS EOF failures. The dead
OAuth token prevents Woodpecker from reading .woodpecker.yaml on
private repos, causing "no steps" errors on all PR events.

Removing the override makes Woodpecker use WOODPECKER_FORGEJO_URL
(internal HTTP) for token exchange and refresh. Browser OAuth login
requires kubectl port-forward to Forgejo when needed (rare admin op).

Closes #138

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

PR #139 Review

DOMAIN REVIEW

Tech stack: Terraform (OpenTofu) / Helm / Kubernetes

Change analysis: Single env var removal from Woodpecker server Helm values in terraform/main.tf. The WOODPECKER_EXPERT_FORGE_OAUTH_HOST override was forcing OAuth token refresh through the external Tailscale funnel URL (https://forgejo.{tailscale_domain}), which resolves to public DERP relay IPs from inside the cluster -- causing a 66% TLS handshake failure rate. Removing it allows Woodpecker to fall back to WOODPECKER_FORGEJO_URL (http://forgejo-http.forgejo.svc.cluster.local:80) for all Forgejo API communication including OAuth token refresh.
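
For illustration, a refresh against the internal URL is a plain HTTP POST to Forgejo's OAuth2 token endpoint (`/login/oauth/access_token`). This means there is no TLS handshake and no DERP hop. The sketch below only builds the request and does not send it. It follows the generic OAuth 2.0 refresh grant (RFC 6749, section 6) and is not necessarily byte-identical to what Woodpecker sends. The token and client credentials are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_refresh_request(base_url: str, refresh_token: str,
                          client_id: str, client_secret: str) -> Request:
    """Build (but do not send) an OAuth 2.0 refresh_token grant request."""
    body = urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode()
    return Request(
        base_url.rstrip("/") + "/login/oauth/access_token",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded",
                 "Accept": "application/json"},
        method="POST",
    )


req = build_refresh_request(
    "http://forgejo-http.forgejo.svc.cluster.local:80",
    refresh_token="example-refresh-token",  # placeholder
    client_id="example-client-id",          # placeholder
    client_secret="example-client-secret",  # placeholder
)
print(req.get_method(), req.full_url)
# POST http://forgejo-http.forgejo.svc.cluster.local:80/login/oauth/access_token
```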

Terraform style: Alignment padding was adjusted after the line removal, matching tofu fmt output. The remaining env vars are properly aligned.

Secrets handling: All sensitive values continue to use var.* references with set_sensitive blocks (lines 760-781). No plaintext secrets in the diff or the surrounding context. The database connection string uses var.woodpecker_db_password interpolation, which is the existing pattern.

k8s impact: StatefulSet pod restart is expected. The PR body correctly documents the post-apply requirement for admin re-authentication (one-time browser OAuth flow). This is an inherent consequence of changing Woodpecker's OAuth configuration.

No stale references: Confirmed WOODPECKER_EXPERT_FORGE_OAUTH_HOST does not appear anywhere else in the repo (excluding .claude/worktrees/ artifacts from previous agent sessions).
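
The stale-reference check can be expressed as a repo walk that separates hits inside `.claude/worktrees/` from everything else. This is roughly what a recursive grep with an exclusion does. It is a hypothetical helper, not part of the repo, and it skips `.git` and unreadable files.

```python
import os
from pathlib import Path


def find_references(root: str, needle: str,
                    excluded: str = ".claude/worktrees") -> tuple[list[str], list[str]]:
    """Return (hits outside `excluded`, hits inside it) as sorted repo-relative paths."""
    outside, inside = [], []
    root_path = Path(root)
    for dirpath, dirnames, filenames in os.walk(root_path):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # don't descend into .git
        for name in filenames:
            path = Path(dirpath) / name
            try:
                text = path.read_text(encoding="utf-8", errors="ignore")
            except OSError:
                continue
            if needle in text:
                rel = path.relative_to(root_path).as_posix()
                (inside if rel.startswith(excluded + "/") else outside).append(rel)
    return sorted(outside), sorted(inside)


# Usage: find_references(".", "WOODPECKER_EXPERT_FORGE_OAUTH_HOST")
# An empty first list matches the review's "no stale references" finding.
```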

BLOCKERS

None.

This is an infrastructure config fix (1 env var removal). No new functionality is introduced, so no new test coverage is required. No user input paths are affected. No secrets are exposed. No auth logic is duplicated.

NITS

  1. PR body template: The "Related" section references Closes #138 and mentions unblocking PR #134, but does not reference the plan slug (plan-pal-e-platform). Minor SOP gap.

  2. Stale worktrees: There are 5+ .claude/worktrees/agent-* directories containing old copies of main.tf with the WOODPECKER_EXPERT_FORGE_OAUTH_HOST line still present. These are not part of this PR, but the worktree accumulation itself is a housekeeping concern for the repo.

SOP COMPLIANCE

  • [x] Branch named after issue (138-split-horizon-dns matches issue #138)
  • [x] PR body has Summary, Changes, Test Plan, Related sections
  • [ ] Related section references plan slug -- missing plan-pal-e-platform reference
  • [x] No secrets committed
  • [x] No unnecessary file changes (single file, tightly scoped)
  • [x] Commit message is descriptive (fix: remove WOODPECKER_EXPERT_FORGE_OAUTH_HOST to fix token refresh)
  • [x] tofu fmt and tofu validate stated as passing in PR body
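
The mechanical parts of this checklist can be sketched as a small validator. It is hypothetical and only encodes the rules as summarized in this review. It assumes the branch starts with the issue number and that plan slugs begin with `plan-` (inferred from the single example `plan-pal-e-platform`).

```python
import re

REQUIRED_SECTIONS = ("Summary", "Changes", "Test Plan", "Related")


def check_pr(branch: str, issue: int, body: str) -> dict[str, bool]:
    """Check branch naming, required markdown sections, and a plan slug under Related."""
    m = re.match(r"(\d+)-", branch)
    headings = set(re.findall(r"^#+\s*(.+?)\s*$", body, flags=re.MULTILINE))
    related = body.split("## Related", 1)[1] if "## Related" in body else ""
    return {
        "branch_matches_issue": bool(m) and int(m.group(1)) == issue,
        "has_sections": all(s in headings for s in REQUIRED_SECTIONS),
        "related_has_plan_slug": re.search(r"\bplan-[a-z0-9-]+", related) is not None,
    }


body = "## Summary\nx\n## Changes\ny\n## Test Plan\nz\n## Related\n- Closes #138\n"
print(check_pr("138-split-horizon-dns", 138, body))
# {'branch_matches_issue': True, 'has_sections': True, 'related_has_plan_slug': False}
```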

PROCESS OBSERVATIONS

  • DORA impact: This fix directly improves Deployment Frequency and Change Failure Rate by eliminating the 66% OAuth token refresh failure that blocks CI pipeline execution. Every failed token refresh means a pipeline that cannot process PR events ("no steps" error).
  • Cascading unblock: This PR unblocks PR #134 (CI clone fix); together, the two should restore reliable CI pipeline execution. Good sequencing.
  • Operational note: The post-apply re-authentication step is a one-time manual action. The PR body documents this clearly, which is appreciated.

VERDICT: APPROVED

forgejo_admin deleted branch 138-split-horizon-dns 2026-03-21 20:20:32 +00:00