fix: broken exit code capture in apply step #155

Merged
forgejo_admin merged 1 commit from 151-fix-apply-exit-code into main 2026-03-24 20:38:18 +00:00

Summary

The apply step's exit code capture used a subshell piped to tee, which never worked. When tofu apply exited (success or failure), the pipe killed the subshell before echo $? could execute. /tmp/apply-exit was never created, making the entire lock detection block dead code.

Changes

  • .woodpecker.yaml: Replaced broken (tofu apply ...; echo $?) | tee pattern with sequential redirect (> file 2>&1 ; echo $? > exit-file) followed by cat for CI log visibility

Test Plan

  • Merge to main triggers the apply step — verify CI log shows tofu apply output AND the step correctly exits non-zero on apply failure
  • To test lock detection path: manually lock terraform state, push a commit, confirm the auto-unlock logic activates
  • No regressions in validate, plan, or cross-pillar-review steps (unchanged)

Review Checklist

  • Passed automated review-fix loop
  • No secrets committed
  • No unnecessary file changes
  • Commit messages are descriptive
  • Closes #151
  • Subsumes #136 (dead force-unlock block — it was dead because exit code capture was broken)
## Summary The apply step's exit code capture used a subshell piped to `tee`, which never worked. When `tofu apply` exited (success or failure), the pipe killed the subshell before `echo $?` could execute. `/tmp/apply-exit` was never created, making the entire lock detection block dead code. ## Changes - `.woodpecker.yaml`: Replaced broken `(tofu apply ...; echo $?) | tee` pattern with sequential redirect (`> file 2>&1 ; echo $? > exit-file`) followed by `cat` for CI log visibility ## Test Plan - [ ] Merge to main triggers the apply step — verify CI log shows tofu apply output AND the step correctly exits non-zero on apply failure - [ ] To test lock detection path: manually lock terraform state, push a commit, confirm the auto-unlock logic activates - [ ] No regressions in validate, plan, or cross-pillar-review steps (unchanged) ## Review Checklist - [ ] Passed automated review-fix loop - [ ] No secrets committed - [ ] No unnecessary file changes - [ ] Commit messages are descriptive ## Related - Closes #151 - Subsumes #136 (dead force-unlock block — it was dead because exit code capture was broken)
fix: replace broken subshell+pipe exit code capture in apply step
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
ci/woodpecker/pr/woodpecker Pipeline was successful
ci/woodpecker/pull_request_closed/woodpecker Pipeline was successful
f4dd209239
The subshell piped to tee pattern `(tofu apply ...; echo $? > file) | tee`
never executed the echo because the pipe killed the subshell on tofu exit.
This made /tmp/apply-exit never get created, turning the lock detection
block (lines 196-209) into dead code.

Replace with sequential redirect: write output to file, capture exit code
with semicolon chain, then cat the file for CI log visibility.

Closes #151
Subsumes #136

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author
Owner

PR Review: #155

Diff Analysis

Scope: 1 file, +10/-2 lines. Only the apply step's command block was modified. No other steps touched.

Findings

Correctness -- PASS
The broken (tofu apply ... 2>&1; echo $? > /tmp/apply-exit) | tee /tmp/apply-output.txt pattern is correctly replaced. In POSIX sh, the pipe creates a subshell that gets killed on tofu apply exit, so echo $? never runs and /tmp/apply-exit is never created. The fix uses sequential redirect (> file 2>&1 ; echo $? > exit-file) which guarantees the exit code is captured after the command completes.

CI log visibility -- PASS
The cat /tmp/apply-output.txt line ensures apply output still appears in CI logs, replacing the role tee previously served.

Lock detection path -- PASS
The downstream if [ "$APPLY_EXIT" -ne 0 ] block is now reachable with a valid integer in APPLY_EXIT. Previously this was dead code.

No scope creep -- PASS
validate, plan, clone, and cross-pillar-review steps are untouched.

No secrets -- PASS

YAML validity -- PASS (validated with Python yaml.safe_load)

Nits

None.


VERDICT: APPROVE

## PR Review: #155 ### Diff Analysis **Scope:** 1 file, +10/-2 lines. Only the apply step's command block was modified. No other steps touched. ### Findings **Correctness -- PASS** The broken `(tofu apply ... 2>&1; echo $? > /tmp/apply-exit) | tee /tmp/apply-output.txt` pattern is correctly replaced. In POSIX sh, the pipe creates a subshell that gets killed on `tofu apply` exit, so `echo $?` never runs and `/tmp/apply-exit` is never created. The fix uses sequential redirect (`> file 2>&1 ; echo $? > exit-file`) which guarantees the exit code is captured after the command completes. **CI log visibility -- PASS** The `cat /tmp/apply-output.txt` line ensures apply output still appears in CI logs, replacing the role `tee` previously served. **Lock detection path -- PASS** The downstream `if [ "$APPLY_EXIT" -ne 0 ]` block is now reachable with a valid integer in `APPLY_EXIT`. Previously this was dead code. **No scope creep -- PASS** validate, plan, clone, and cross-pillar-review steps are untouched. **No secrets -- PASS** **YAML validity -- PASS** (validated with Python yaml.safe_load) ### Nits None. --- **VERDICT: APPROVE**
Author
Owner

PR #155 Review

DOMAIN REVIEW

Tech stack: Woodpecker CI YAML, POSIX shell (Alpine/ash). This is a CI/IaC domain review.

The bug: Line 193 on main uses (tofu apply ... 2>&1; echo $? > /tmp/apply-exit) | tee /tmp/apply-output.txt. The pipe creates two processes. When tee receives EOF (after tofu apply closes stdout), the subshell may be killed by SIGPIPE before echo $? executes. Even if not killed, /tmp/apply-exit creation is a race condition. The file was never reliably written, making APPLY_EXIT=$(cat /tmp/apply-exit) fail and the entire lock detection block (lines 196-209) dead code. Alpine/ash has no PIPESTATUS.

The fix: Replaces the broken pipe+subshell pattern with sequential redirect:

tofu apply ... > /tmp/apply-output.txt 2>&1 ; echo $? > /tmp/apply-exit
cat /tmp/apply-output.txt
APPLY_EXIT=$(cat /tmp/apply-exit)

Correctness verification:

  1. Exit code capture: The semicolon ; ensures echo $? runs sequentially after tofu apply in the same shell context. Shell redirections (> file 2>&1) are not commands and do not modify $?. So $? correctly reflects the tofu apply exit code. This is reliable POSIX sh.

  2. Lock detection reachable: The if [ "$APPLY_EXIT" -ne 0 ] && grep -q "the state is already locked" block is unchanged and now actually reachable because APPLY_EXIT will contain a valid numeric exit code.

  3. CI log visibility: cat /tmp/apply-output.txt shows the full output in the CI log. Tradeoff: output is no longer streamed in real time (if tofu apply hangs, nothing appears until timeout). This is acceptable -- the only correct approach in POSIX sh without PIPESTATUS, and CI step timeouts provide the safety net.

  4. YAML syntax: The | block scalar is maintained. Backslash line continuations within the block are shell continuations, not YAML syntax. The joined logical line is valid shell.

  5. Scope: 1 file changed, +10/-2. Only the capture mechanism is replaced. Lock detection logic, retry logic, and all other steps are untouched.

BLOCKERS

None.

This is a CI pipeline fix, not application code. No new functionality requiring test coverage. The test plan (manual CI verification on merge) is appropriate for this type of change.

NITS

  1. Non-streaming output: The old tee pattern (despite being broken for exit code capture) had the intent of streaming output. If a future enhancement wanted real-time streaming with correct exit code capture, the pattern would be:

    tofu apply ... > /tmp/apply-output.txt 2>&1 &
    APPLY_PID=$!
    tail -f /tmp/apply-output.txt &
    TAIL_PID=$!
    wait $APPLY_PID
    echo $? > /tmp/apply-exit
    kill $TAIL_PID 2>/dev/null
    

    Not worth doing now -- the current fix is simpler and correct. Noting for future reference only.

  2. Comment wording: The comment # Attempt apply -- redirect to file, capture exit code sequentially is clear and accurate. No issue, just noting it is well-documented.

SOP COMPLIANCE

  • Branch named after issue (151-fix-apply-exit-code references #151)
  • PR body has Summary, Changes, Test Plan, Related
  • Related references issue #151 (bug fix, no plan slug expected) and subsumes #136
  • No secrets committed
  • No unnecessary file changes (1 file, surgical fix)
  • Commit messages are descriptive

PROCESS OBSERVATIONS

  • DORA impact: This fix restores the lock detection and auto-unlock path, which directly improves Mean Time to Recovery (MTTR). Previously, a locked state during CI apply would cause a hard failure with no auto-recovery. Now the auto-unlock retry actually fires.
  • Change failure risk: Low. The fix is minimal and the new pattern is simpler than what it replaces. The only behavioral change is output appearing after completion rather than streaming.
  • Issue #136 subsumption: The PR correctly identifies that #136 ("remove dead force-unlock block") is subsumed -- the block was only dead because the exit code capture was broken. Fixing the root cause makes the force-unlock block functional rather than dead.

VERDICT: APPROVED

## PR #155 Review ### DOMAIN REVIEW **Tech stack**: Woodpecker CI YAML, POSIX shell (Alpine/ash). This is a CI/IaC domain review. **The bug**: Line 193 on main uses `(tofu apply ... 2>&1; echo $? > /tmp/apply-exit) | tee /tmp/apply-output.txt`. The pipe creates two processes. When `tee` receives EOF (after `tofu apply` closes stdout), the subshell may be killed by SIGPIPE before `echo $?` executes. Even if not killed, `/tmp/apply-exit` creation is a race condition. The file was never reliably written, making `APPLY_EXIT=$(cat /tmp/apply-exit)` fail and the entire lock detection block (lines 196-209) dead code. Alpine/ash has no `PIPESTATUS`. **The fix**: Replaces the broken pipe+subshell pattern with sequential redirect: ``` tofu apply ... > /tmp/apply-output.txt 2>&1 ; echo $? > /tmp/apply-exit cat /tmp/apply-output.txt APPLY_EXIT=$(cat /tmp/apply-exit) ``` **Correctness verification**: 1. **Exit code capture**: The semicolon `;` ensures `echo $?` runs sequentially after `tofu apply` in the same shell context. Shell redirections (`> file 2>&1`) are not commands and do not modify `$?`. So `$?` correctly reflects the `tofu apply` exit code. This is reliable POSIX sh. 2. **Lock detection reachable**: The `if [ "$APPLY_EXIT" -ne 0 ] && grep -q "the state is already locked"` block is unchanged and now actually reachable because `APPLY_EXIT` will contain a valid numeric exit code. 3. **CI log visibility**: `cat /tmp/apply-output.txt` shows the full output in the CI log. Tradeoff: output is no longer streamed in real time (if `tofu apply` hangs, nothing appears until timeout). This is acceptable -- the only correct approach in POSIX sh without `PIPESTATUS`, and CI step timeouts provide the safety net. 4. **YAML syntax**: The `|` block scalar is maintained. Backslash line continuations within the block are shell continuations, not YAML syntax. The joined logical line is valid shell. 5. **Scope**: 1 file changed, +10/-2. Only the capture mechanism is replaced. Lock detection logic, retry logic, and all other steps are untouched. ### BLOCKERS None. This is a CI pipeline fix, not application code. No new functionality requiring test coverage. The test plan (manual CI verification on merge) is appropriate for this type of change. ### NITS 1. **Non-streaming output**: The old `tee` pattern (despite being broken for exit code capture) had the _intent_ of streaming output. If a future enhancement wanted real-time streaming with correct exit code capture, the pattern would be: ``` tofu apply ... > /tmp/apply-output.txt 2>&1 & APPLY_PID=$! tail -f /tmp/apply-output.txt & TAIL_PID=$! wait $APPLY_PID echo $? > /tmp/apply-exit kill $TAIL_PID 2>/dev/null ``` Not worth doing now -- the current fix is simpler and correct. Noting for future reference only. 2. **Comment wording**: The comment `# Attempt apply -- redirect to file, capture exit code sequentially` is clear and accurate. No issue, just noting it is well-documented. ### SOP COMPLIANCE - [x] Branch named after issue (`151-fix-apply-exit-code` references #151) - [x] PR body has Summary, Changes, Test Plan, Related - [x] Related references issue #151 (bug fix, no plan slug expected) and subsumes #136 - [x] No secrets committed - [x] No unnecessary file changes (1 file, surgical fix) - [x] Commit messages are descriptive ### PROCESS OBSERVATIONS - **DORA impact**: This fix restores the lock detection and auto-unlock path, which directly improves Mean Time to Recovery (MTTR). Previously, a locked state during CI apply would cause a hard failure with no auto-recovery. Now the auto-unlock retry actually fires. - **Change failure risk**: Low. The fix is minimal and the new pattern is simpler than what it replaces. The only behavioral change is output appearing after completion rather than streaming. - **Issue #136 subsumption**: The PR correctly identifies that #136 ("remove dead force-unlock block") is subsumed -- the block was only dead because the exit code capture was broken. Fixing the root cause makes the force-unlock block functional rather than dead. ### VERDICT: APPROVED
Author
Owner

PR #155 Review

DOMAIN REVIEW

Tech stack: Woodpecker CI pipeline configuration (.woodpecker.yaml), shell scripting, OpenTofu/Terraform IaC.

Root cause analysis is correct. The old pattern (tofu apply ... 2>&1; echo $? > /tmp/apply-exit) | tee /tmp/apply-output.txt had a well-known shell antipattern: piping a subshell to tee means the subshell's SIGPIPE handling (or shell short-circuit on pipe close) can terminate the subshell before echo $? executes. The /tmp/apply-exit file was never created, making APPLY_EXIT empty and the entire lock detection conditional (lines 196-209) unreachable dead code.

Fix is correct. The new pattern:

tofu apply ... > /tmp/apply-output.txt 2>&1 ; echo $? > /tmp/apply-exit
cat /tmp/apply-output.txt
APPLY_EXIT=$(cat /tmp/apply-exit)

This works because:

  1. Redirect (>) does not create a pipe -- no SIGPIPE risk, no subshell.
  2. Sequential ; guarantees echo $? runs after tofu apply exits. $? correctly holds the tofu exit code.
  3. cat /tmp/apply-output.txt restores CI log visibility that tee was supposed to provide.
  4. APPLY_EXIT now reliably contains a valid integer, so the lock detection conditional becomes reachable.

Consistency with plan step. The plan step (line 103) already uses the same redirect-to-file pattern: > /tmp/plan-output.txt 2>&1 || echo "plan-failed". The apply step fix makes the two steps consistent. No DRY concern since the plan step uses a different failure-capture mechanism (sentinel file vs exit code file), which is appropriate given the plan step needs to continue for PR comment posting while the apply step needs the actual exit code for lock detection branching.

Lock detection block (lines 196-209) is revived. This correctly subsumes #136 -- the force-unlock block was never dead by design, only dead because the exit code capture was broken. Fixing the root cause is the right approach versus removing the block.

No other | tee patterns remain in the pipeline that could have the same bug.

BLOCKERS

None.

This is a CI pipeline shell fix. There is no application code, no user input handling, no auth logic, and no secrets in the diff. The BLOCKER criteria (test coverage for new functionality, input validation, secrets in code, DRY in auth paths) do not apply to this change. The fix is a 10-line shell correction with clear before/after semantics.

Regarding test coverage: CI pipeline shell logic is not unit-testable in the traditional sense. The test plan correctly specifies integration validation (merge to main triggers apply, verify CI log output, test lock detection path manually). This is the appropriate test strategy for this domain.

NITS

  1. Comment on #136 disposition. The PR body says "Subsumes #136" but #136 is still open. After this PR merges, #136 should be closed with a reference to this PR (since the force-unlock block is no longer dead code).

SOP COMPLIANCE

  • Branch named after issue (151-fix-apply-exit-code references issue #151)
  • PR body follows template (Summary, Changes, Test Plan, Review Checklist, Related sections present)
  • Related section references issue #151 and #136
  • No secrets committed (diff contains no credentials; all secrets remain from_secret references)
  • No unnecessary file changes (1 file changed, all changes are on-topic)
  • Commit messages are descriptive (PR title: "fix: broken exit code capture in apply step")
  • Related section does not reference a plan slug -- acceptable for a bug fix (no plan needed per feedback_tickets_are_stories.md)

PROCESS OBSERVATIONS

DORA impact (MTTR): This fix directly improves Mean Time to Recovery. With the apply exit code now captured correctly, state lock situations will be auto-detected and auto-unlocked instead of requiring manual intervention. This is a meaningful MTTR improvement for the CI pipeline.

Change failure risk: Low. The change is minimal (10 additions, 2 deletions), touches only the apply step, and the shell semantics are well-understood. The plan and validate steps are untouched. CI pipeline #242 passed validation.

Deployment frequency impact: Positive. With lock detection working, stuck-state-lock situations no longer block the entire deployment pipeline, reducing manual intervention delays.

VERDICT: APPROVED

## PR #155 Review ### DOMAIN REVIEW **Tech stack:** Woodpecker CI pipeline configuration (`.woodpecker.yaml`), shell scripting, OpenTofu/Terraform IaC. **Root cause analysis is correct.** The old pattern `(tofu apply ... 2>&1; echo $? > /tmp/apply-exit) | tee /tmp/apply-output.txt` had a well-known shell antipattern: piping a subshell to `tee` means the subshell's SIGPIPE handling (or shell short-circuit on pipe close) can terminate the subshell before `echo $?` executes. The `/tmp/apply-exit` file was never created, making `APPLY_EXIT` empty and the entire lock detection conditional (lines 196-209) unreachable dead code. **Fix is correct.** The new pattern: ```sh tofu apply ... > /tmp/apply-output.txt 2>&1 ; echo $? > /tmp/apply-exit cat /tmp/apply-output.txt APPLY_EXIT=$(cat /tmp/apply-exit) ``` This works because: 1. Redirect (`>`) does not create a pipe -- no SIGPIPE risk, no subshell. 2. Sequential `;` guarantees `echo $?` runs after `tofu apply` exits. `$?` correctly holds the tofu exit code. 3. `cat /tmp/apply-output.txt` restores CI log visibility that `tee` was supposed to provide. 4. `APPLY_EXIT` now reliably contains a valid integer, so the lock detection conditional becomes reachable. **Consistency with plan step.** The plan step (line 103) already uses the same redirect-to-file pattern: `> /tmp/plan-output.txt 2>&1 || echo "plan-failed"`. The apply step fix makes the two steps consistent. No DRY concern since the plan step uses a different failure-capture mechanism (sentinel file vs exit code file), which is appropriate given the plan step needs to continue for PR comment posting while the apply step needs the actual exit code for lock detection branching. **Lock detection block (lines 196-209) is revived.** This correctly subsumes #136 -- the force-unlock block was never dead by design, only dead because the exit code capture was broken. Fixing the root cause is the right approach versus removing the block. **No other `| tee` patterns remain** in the pipeline that could have the same bug. ### BLOCKERS None. This is a CI pipeline shell fix. There is no application code, no user input handling, no auth logic, and no secrets in the diff. The BLOCKER criteria (test coverage for new functionality, input validation, secrets in code, DRY in auth paths) do not apply to this change. The fix is a 10-line shell correction with clear before/after semantics. Regarding test coverage: CI pipeline shell logic is not unit-testable in the traditional sense. The test plan correctly specifies integration validation (merge to main triggers apply, verify CI log output, test lock detection path manually). This is the appropriate test strategy for this domain. ### NITS 1. **Comment on #136 disposition.** The PR body says "Subsumes #136" but #136 is still open. After this PR merges, #136 should be closed with a reference to this PR (since the force-unlock block is no longer dead code). ### SOP COMPLIANCE - [x] Branch named after issue (`151-fix-apply-exit-code` references issue #151) - [x] PR body follows template (Summary, Changes, Test Plan, Review Checklist, Related sections present) - [x] Related section references issue #151 and #136 - [x] No secrets committed (diff contains no credentials; all secrets remain `from_secret` references) - [x] No unnecessary file changes (1 file changed, all changes are on-topic) - [x] Commit messages are descriptive (PR title: "fix: broken exit code capture in apply step") - [ ] Related section does not reference a plan slug -- acceptable for a bug fix (no plan needed per `feedback_tickets_are_stories.md`) ### PROCESS OBSERVATIONS **DORA impact (MTTR):** This fix directly improves Mean Time to Recovery. With the apply exit code now captured correctly, state lock situations will be auto-detected and auto-unlocked instead of requiring manual intervention. This is a meaningful MTTR improvement for the CI pipeline. **Change failure risk:** Low. The change is minimal (10 additions, 2 deletions), touches only the apply step, and the shell semantics are well-understood. The plan and validate steps are untouched. CI pipeline #242 passed validation. **Deployment frequency impact:** Positive. With lock detection working, stuck-state-lock situations no longer block the entire deployment pipeline, reducing manual intervention delays. ### VERDICT: APPROVED
forgejo_admin deleted branch 151-fix-apply-exit-code 2026-03-24 20:38:18 +00:00
Sign in to join this conversation.
No description provided.