fix: broken exit code capture in apply step #155
No reviewers
Labels
No labels
domain:backend
domain:devops
domain:frontend
status:approved
status:in-progress
status:needs-fix
status:qa
type:bug
type:devops
type:feature
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
forgejo_admin/pal-e-platform!155
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "151-fix-apply-exit-code"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
The apply step's exit code capture used a subshell piped to
tee, which never worked. Whentofu applyexited (success or failure), the pipe killed the subshell beforeecho $?could execute./tmp/apply-exitwas never created, making the entire lock detection block dead code.Changes
.woodpecker.yaml: Replaced broken(tofu apply ...; echo $?) | teepattern with sequential redirect (> file 2>&1 ; echo $? > exit-file) followed bycatfor CI log visibilityTest Plan
Review Checklist
Related
PR Review: #155
Diff Analysis
Scope: 1 file, +10/-2 lines. Only the apply step's command block was modified. No other steps touched.
Findings
Correctness -- PASS
The broken
(tofu apply ... 2>&1; echo $? > /tmp/apply-exit) | tee /tmp/apply-output.txtpattern is correctly replaced. In POSIX sh, the pipe creates a subshell that gets killed ontofu applyexit, soecho $?never runs and/tmp/apply-exitis never created. The fix uses sequential redirect (> file 2>&1 ; echo $? > exit-file) which guarantees the exit code is captured after the command completes.CI log visibility -- PASS
The
cat /tmp/apply-output.txtline ensures apply output still appears in CI logs, replacing the roleteepreviously served.Lock detection path -- PASS
The downstream
if [ "$APPLY_EXIT" -ne 0 ]block is now reachable with a valid integer inAPPLY_EXIT. Previously this was dead code.No scope creep -- PASS
validate, plan, clone, and cross-pillar-review steps are untouched.
No secrets -- PASS
YAML validity -- PASS (validated with Python yaml.safe_load)
Nits
None.
VERDICT: APPROVE
PR #155 Review
DOMAIN REVIEW
Tech stack: Woodpecker CI YAML, POSIX shell (Alpine/ash). This is a CI/IaC domain review.
The bug: Line 193 on main uses
(tofu apply ... 2>&1; echo $? > /tmp/apply-exit) | tee /tmp/apply-output.txt. The pipe creates two processes. Whenteereceives EOF (aftertofu applycloses stdout), the subshell may be killed by SIGPIPE beforeecho $?executes. Even if not killed,/tmp/apply-exitcreation is a race condition. The file was never reliably written, makingAPPLY_EXIT=$(cat /tmp/apply-exit)fail and the entire lock detection block (lines 196-209) dead code. Alpine/ash has noPIPESTATUS.The fix: Replaces the broken pipe+subshell pattern with sequential redirect:
Correctness verification:
Exit code capture: The semicolon
;ensuresecho $?runs sequentially aftertofu applyin the same shell context. Shell redirections (> file 2>&1) are not commands and do not modify$?. So$?correctly reflects thetofu applyexit code. This is reliable POSIX sh.Lock detection reachable: The
if [ "$APPLY_EXIT" -ne 0 ] && grep -q "the state is already locked"block is unchanged and now actually reachable becauseAPPLY_EXITwill contain a valid numeric exit code.CI log visibility:
cat /tmp/apply-output.txtshows the full output in the CI log. Tradeoff: output is no longer streamed in real time (iftofu applyhangs, nothing appears until timeout). This is acceptable -- the only correct approach in POSIX sh withoutPIPESTATUS, and CI step timeouts provide the safety net.YAML syntax: The
|block scalar is maintained. Backslash line continuations within the block are shell continuations, not YAML syntax. The joined logical line is valid shell.Scope: 1 file changed, +10/-2. Only the capture mechanism is replaced. Lock detection logic, retry logic, and all other steps are untouched.
BLOCKERS
None.
This is a CI pipeline fix, not application code. No new functionality requiring test coverage. The test plan (manual CI verification on merge) is appropriate for this type of change.
NITS
Non-streaming output: The old
teepattern (despite being broken for exit code capture) had the intent of streaming output. If a future enhancement wanted real-time streaming with correct exit code capture, the pattern would be:Not worth doing now -- the current fix is simpler and correct. Noting for future reference only.
Comment wording: The comment
# Attempt apply -- redirect to file, capture exit code sequentiallyis clear and accurate. No issue, just noting it is well-documented.SOP COMPLIANCE
151-fix-apply-exit-codereferences #151)PROCESS OBSERVATIONS
VERDICT: APPROVED
PR #155 Review
DOMAIN REVIEW
Tech stack: Woodpecker CI pipeline configuration (
.woodpecker.yaml), shell scripting, OpenTofu/Terraform IaC.Root cause analysis is correct. The old pattern
(tofu apply ... 2>&1; echo $? > /tmp/apply-exit) | tee /tmp/apply-output.txthad a well-known shell antipattern: piping a subshell toteemeans the subshell's SIGPIPE handling (or shell short-circuit on pipe close) can terminate the subshell beforeecho $?executes. The/tmp/apply-exitfile was never created, makingAPPLY_EXITempty and the entire lock detection conditional (lines 196-209) unreachable dead code.Fix is correct. The new pattern:
This works because:
>) does not create a pipe -- no SIGPIPE risk, no subshell.;guaranteesecho $?runs aftertofu applyexits.$?correctly holds the tofu exit code.cat /tmp/apply-output.txtrestores CI log visibility thatteewas supposed to provide.APPLY_EXITnow reliably contains a valid integer, so the lock detection conditional becomes reachable.Consistency with plan step. The plan step (line 103) already uses the same redirect-to-file pattern:
> /tmp/plan-output.txt 2>&1 || echo "plan-failed". The apply step fix makes the two steps consistent. No DRY concern since the plan step uses a different failure-capture mechanism (sentinel file vs exit code file), which is appropriate given the plan step needs to continue for PR comment posting while the apply step needs the actual exit code for lock detection branching.Lock detection block (lines 196-209) is revived. This correctly subsumes #136 -- the force-unlock block was never dead by design, only dead because the exit code capture was broken. Fixing the root cause is the right approach versus removing the block.
No other
| teepatterns remain in the pipeline that could have the same bug.BLOCKERS
None.
This is a CI pipeline shell fix. There is no application code, no user input handling, no auth logic, and no secrets in the diff. The BLOCKER criteria (test coverage for new functionality, input validation, secrets in code, DRY in auth paths) do not apply to this change. The fix is a 10-line shell correction with clear before/after semantics.
Regarding test coverage: CI pipeline shell logic is not unit-testable in the traditional sense. The test plan correctly specifies integration validation (merge to main triggers apply, verify CI log output, test lock detection path manually). This is the appropriate test strategy for this domain.
NITS
SOP COMPLIANCE
151-fix-apply-exit-codereferences issue #151)from_secretreferences)feedback_tickets_are_stories.md)PROCESS OBSERVATIONS
DORA impact (MTTR): This fix directly improves Mean Time to Recovery. With the apply exit code now captured correctly, state lock situations will be auto-detected and auto-unlocked instead of requiring manual intervention. This is a meaningful MTTR improvement for the CI pipeline.
Change failure risk: Low. The change is minimal (10 additions, 2 deletions), touches only the apply step, and the shell semantics are well-understood. The plan and validate steps are untouched. CI pipeline #242 passed validation.
Deployment frequency impact: Positive. With lock detection working, stuck-state-lock situations no longer block the entire deployment pipeline, reducing manual intervention delays.
VERDICT: APPROVED