fix(ci): use jq for PR comment JSON encoding in plan step (#307) #309

Merged
forgejo_admin merged 2 commits from 307-woodpecker-shell-escaping into main 2026-04-27 19:21:30 +00:00
Contributor

Summary

The plan step's PR-comment-posting wrapper used a fragile sed pipeline to escape quotes/newlines into JSON. When tofu plan output contained nested quote patterns (e.g. k8s metadata.annotations like "kubectl.kubernetes.io/last-applied-configuration": "{\"foo\":\"bar\"}"), the sed pipeline produced an unterminated quoted string, the step exited 2, the infra-validation hook blocked merge, and humans force-merged — eroding the safety gate.

This PR replaces the sed escaping with jq -n --rawfile body ..., which reads the comment body verbatim and JSON-encodes it correctly, handling all quotes, newlines, and backslashes. The actual tofu plan was always fine — only the comment-posting wrapper broke.

Changes

  • .woodpecker.yaml (plan step):
    • Added apk add --no-cache jq (opentofu image is Alpine-based and does not ship jq).
    • Replaced the printf '{"body":"%s"}' "$(echo ... | sed ... | sed ...)" pipeline with jq -n --rawfile body /tmp/comment-body.txt '{body: $body}'.
    • Moved the 60000-char truncation to operate on the file directly via head -c instead of bash parameter expansion (cleaner, avoids loading whole plan into a shell variable).
    • Preserved the [ ! -f /tmp/plan-failed ] exit check and the 60000-char truncation cap.

Apply Step Audit

The apply step (line 182+) was audited per the ticket AC. It runs on event: push to main and does not post PR comments (no PR exists for push events). It has no analogous comment-posting wrapper. No parallel fix needed.

The cross-pillar-review step already uses jq -n --arg ... --rawfile ... correctly — that's the pattern this PR ports to the plan step.

tofu plan Output

This change touches CI YAML only — no terraform resources change. tofu plan will run as part of this PR and post a comment using the new code path. That comment posting itself is the verification (see Test Plan).

Test Plan

  • YAML parses: python3 -c "import yaml; yaml.safe_load(open('.woodpecker.yaml'))" — OK
  • Local jq encoding test with quote-heavy synthetic plan output (k8s manifest with escaped JSON in annotations) produces valid JSON — verified with python3 -c "import json; json.load(open('payload.json'))"
  • End-to-end: This PR itself triggers the plan step. Watch CI for:
    1. apk add jq succeeds in the opentofu image
    2. Plan step exits 0
    3. PR comment with ## Tofu Plan Output (full) posts cleanly to this PR
  • After merge: monitor next infra PR (especially any that touches kubernetes_manifest resources) to confirm the fix holds in the wild.

If verification on this PR succeeds, the fix is proven on the exact bug class. If a future quote-heavy plan still breaks, reopen with the failing PR number.

Review Checklist

  • YAML parses (validated locally; CI will re-validate)
  • No terraform resources changed (CI-only fix)
  • Apply step audited — no parallel fix needed (push event, no PR comment)
  • 60k truncation cap preserved
  • Plan-failed exit check preserved
  • Verify on this PR's own CI run that the comment posts cleanly
  • feedback_yaml_parse_validation — YAML parse-validation gate satisfied
  • feedback_tofu_lock_false-lock=false already present in plan args (untouched)
  • convention-cross-pillar-triggers.woodpecker* matches; cross-pillar review issue will auto-create on merge
  • Closes #307
  • Forgejo issue: #307
  • Pattern source: cross-pillar-review step in same file (already uses jq correctly)
## Summary The plan step's PR-comment-posting wrapper used a fragile `sed` pipeline to escape quotes/newlines into JSON. When `tofu plan` output contained nested quote patterns (e.g. k8s `metadata.annotations` like `"kubectl.kubernetes.io/last-applied-configuration": "{\"foo\":\"bar\"}"`), the sed pipeline produced an unterminated quoted string, the step exited 2, the infra-validation hook blocked merge, and humans force-merged — eroding the safety gate. This PR replaces the sed escaping with `jq -n --rawfile body ...`, which reads the comment body verbatim and JSON-encodes it correctly, handling all quotes, newlines, and backslashes. The actual `tofu plan` was always fine — only the comment-posting wrapper broke. ## Changes - `.woodpecker.yaml` (plan step): - Added `apk add --no-cache jq` (opentofu image is Alpine-based and does not ship jq). - Replaced the `printf '{"body":"%s"}' "$(echo ... | sed ... | sed ...)"` pipeline with `jq -n --rawfile body /tmp/comment-body.txt '{body: $body}'`. - Moved the 60000-char truncation to operate on the file directly via `head -c` instead of bash parameter expansion (cleaner, avoids loading whole plan into a shell variable). - Preserved the `[ ! -f /tmp/plan-failed ]` exit check and the 60000-char truncation cap. ## Apply Step Audit The `apply` step (line 182+) was audited per the ticket AC. It runs on `event: push` to `main` and does **not** post PR comments (no PR exists for push events). It has no analogous comment-posting wrapper. **No parallel fix needed.** The `cross-pillar-review` step already uses `jq -n --arg ... --rawfile ...` correctly — that's the pattern this PR ports to the plan step. ## tofu plan Output This change touches CI YAML only — no terraform resources change. `tofu plan` will run as part of this PR and post a comment using the new code path. That comment posting itself is the verification (see Test Plan). ## Test Plan - [x] YAML parses: `python3 -c "import yaml; yaml.safe_load(open('.woodpecker.yaml'))"` — OK - [x] Local jq encoding test with quote-heavy synthetic plan output (k8s manifest with escaped JSON in annotations) produces valid JSON — verified with `python3 -c "import json; json.load(open('payload.json'))"` - [ ] **End-to-end:** This PR itself triggers the plan step. Watch CI for: 1. `apk add jq` succeeds in the opentofu image 2. Plan step exits 0 3. PR comment with `## Tofu Plan Output (full)` posts cleanly to this PR - [ ] After merge: monitor next infra PR (especially any that touches `kubernetes_manifest` resources) to confirm the fix holds in the wild. If verification on this PR succeeds, the fix is proven on the exact bug class. If a future quote-heavy plan still breaks, reopen with the failing PR number. ## Review Checklist - [ ] YAML parses (validated locally; CI will re-validate) - [ ] No terraform resources changed (CI-only fix) - [ ] Apply step audited — no parallel fix needed (push event, no PR comment) - [ ] 60k truncation cap preserved - [ ] Plan-failed exit check preserved - [ ] Verify on this PR's own CI run that the comment posts cleanly ## Related Notes - `feedback_yaml_parse_validation` — YAML parse-validation gate satisfied - `feedback_tofu_lock_false` — `-lock=false` already present in plan args (untouched) - `convention-cross-pillar-triggers` — `.woodpecker*` matches; cross-pillar review issue will auto-create on merge ## Related - Closes #307 - Forgejo issue: #307 - Pattern source: `cross-pillar-review` step in same file (already uses jq correctly)
fix(ci): use jq for PR comment JSON encoding in plan step
Some checks failed
ci/woodpecker/push/woodpecker Pipeline was successful
ci/woodpecker/pr/woodpecker Pipeline failed
193a5121e2
The plan step's comment-posting wrapper built JSON payloads with a fragile
sed pipeline that broke when tofu plan output contained nested quote
patterns (e.g. kubernetes_manifest with annotations like
"kubectl.kubernetes.io/last-applied-configuration": "{\"foo\":\"bar\"}").
The unterminated-string error caused the step to exit 2 and blocked the
infra-validation hook even though the underlying tofu plan succeeded.

Replace the sed escaping with `jq -n --rawfile body ...` which reads the
comment body verbatim and JSON-encodes it correctly, handling all quotes,
newlines, and backslashes. Adds `apk add --no-cache jq` since the
opentofu image (Alpine-based) does not ship jq by default.

Also moves the 60k truncation to operate on the file directly via
`head -c` instead of bash parameter expansion, which is cleaner and
avoids loading the whole plan into a shell variable.

The apply step (push to main) does not post PR comments, so no parallel
fix is needed there. The `[ ! -f /tmp/plan-failed ]` exit check and the
60000-char truncation cap are both preserved.

Closes #307

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Author
Contributor

PR #309 Review

DOMAIN REVIEW

Stack: Woodpecker CI YAML + Alpine shell (sh, not bash) + jq + wget. Single-file change to .woodpecker.yaml plan step.

Root-cause fix correctness — confirmed. The previous printf '{"body":"%s"}' "$(echo "$BODY" | sed 's/"/\\"/g' | sed ':a;N;$!ba;s/\n/\\n/g')" had two failure modes:

  1. Did not escape pre-existing backslashes — k8s last-applied-configuration annotations contain \" that became double-mangled.
  2. $BODY got re-tokenized by the shell when expanded into printf's argument, so embedded quotes broke the outer command's quoting.

The new path eliminates both:

  • cat /tmp/plan-output.txt writes bytes verbatim into /tmp/comment-body.txt — no shell re-tokenization of plan content.
  • jq -n --rawfile body ... '{body: $body}' reads file as raw string and JSON-encodes per spec — handles quotes, backslashes, newlines, control chars correctly by construction.
  • wget --post-file= posts file bytes verbatim — no shell interpolation of body.

This is the right pattern, and it's already proven in the cross-pillar-review step (lines 439-442 use the identical jq -n --arg ... --rawfile body ... pattern). Good consistency.

Static YAML parse review: Structure is sound. New apk add --no-cache jq is a clean scalar. The | block at 162-193 has consistent indentation, no unquoted colons in problematic positions, $$ Woodpecker escaping preserved throughout. Per feedback_yaml_parse_validation, static review passes; dev's local yaml.safe_load claim is consistent with the diff.

Behavioral walkthrough on synthetic k8s-annotation-style input (mentally executed against the new pipeline since QA is read-only):

  1. wc -c < file — byte count, no quote interpretation. Safe.
  2. head -c 60000 truncation — byte-level. 60k cap preserved, now operates on file directly (cleaner than the old ${PLAN_OUTPUT:0:60000} parameter expansion which loaded the whole plan into a shell var).
  3. cat into /tmp/comment-body.txt — verbatim bytes.
  4. jq --rawfile — guaranteed valid JSON output regardless of input content.
  5. wget --post-file= — verbatim post.
  6. [ ! -f /tmp/plan-failed ] at line 195 — exit check preserved.

The original bug class cannot recur on this code path.

Apply-step audit (lines 197-338): Verified. Apply runs on event: push to main, has no PR-comment wrapper, no printf '{"body":...}', no wget/curl to issues endpoint. Output goes to /tmp/apply-output.txt and cat only. Dev's claim that no parallel fix is needed is correct.

BLOCKERS

None.

NITS

  1. Verification weakness — flag, not blocker. This PR's diff is .woodpecker.yaml only, so the plan output for THIS PR's CI run will be "No changes" or near-empty — it will exercise the new apk add jq + jq --rawfile code path but will NOT stress-test the original bug's quote-heavy pattern. The PR self-verifies the happy path (jq installs, comment posts) but not the regression. The real proof point is the next infra PR that touches kubernetes_manifest / kubernetes_secret_v1 / annotation-heavy resources. Recommend explicit post-merge monitoring on the next such PR before declaring the meta-bug closed.

  2. Minor: the printf '\n...(truncated)' >> /tmp/plan-output.trunc adds a literal ...(truncated) marker outside any code fence. The body builder at line 180-183 then wraps the whole thing in triple backticks, so this renders inside the code block. Cosmetic — fine.

  3. Could add a smoke check after apk add jq (jq --version) for clearer log output if install fails. Not required.

SOP COMPLIANCE

  • Branch named 307-woodpecker-shell-escaping — matches {issue}-{kebab} convention
  • PR body has Summary, Changes, Test Plan, Apply Step Audit, Related, Closes #307
  • No secrets committed
  • No collateral changes (single file, +21/-6)
  • 60k truncation cap preserved
  • [ ! -f /tmp/plan-failed ] exit check preserved
  • Pattern source cited (cross-pillar-review step)
  • feedback_yaml_parse_validation gate satisfied (static review confirms structure)
  • feedback_no_workarounds_just_fix — root-cause fix using correct tool, no band-aid

PROCESS OBSERVATIONS

This is a high-leverage fix: it restores the integrity of the infra-PR safety gate that has been bypassed via force-merges. Change failure risk is low (read-only verification path, no terraform state interaction). MTTR for future quote-heavy plan failures should drop to zero on this code path.

DORA note: The verification weakness (#1 in nits) is structural to fixing CI bugs — the only way to fully prove a CI fix is to land it and observe production runs. Recommend the next infra PR that touches k8s resources be flagged for explicit comment-posting verification before approval, treating it as the true integration test for this fix. Suggest adding a feedback_ci_meta_fix_followup_validation memory entry capturing this pattern.

VERDICT: APPROVED

## PR #309 Review ### DOMAIN REVIEW **Stack:** Woodpecker CI YAML + Alpine shell (sh, not bash) + jq + wget. Single-file change to `.woodpecker.yaml` plan step. **Root-cause fix correctness — confirmed.** The previous `printf '{"body":"%s"}' "$(echo "$BODY" | sed 's/"/\\"/g' | sed ':a;N;$!ba;s/\n/\\n/g')"` had two failure modes: 1. Did not escape pre-existing backslashes — k8s `last-applied-configuration` annotations contain `\"` that became double-mangled. 2. `$BODY` got re-tokenized by the shell when expanded into `printf`'s argument, so embedded quotes broke the outer command's quoting. The new path eliminates both: - `cat /tmp/plan-output.txt` writes bytes verbatim into `/tmp/comment-body.txt` — no shell re-tokenization of plan content. - `jq -n --rawfile body ... '{body: $body}'` reads file as raw string and JSON-encodes per spec — handles quotes, backslashes, newlines, control chars correctly by construction. - `wget --post-file=` posts file bytes verbatim — no shell interpolation of body. This is the right pattern, and it's already proven in the `cross-pillar-review` step (lines 439-442 use the identical `jq -n --arg ... --rawfile body ...` pattern). Good consistency. **Static YAML parse review:** Structure is sound. New `apk add --no-cache jq` is a clean scalar. The `|` block at 162-193 has consistent indentation, no unquoted colons in problematic positions, `$$` Woodpecker escaping preserved throughout. Per `feedback_yaml_parse_validation`, static review passes; dev's local `yaml.safe_load` claim is consistent with the diff. **Behavioral walkthrough on synthetic k8s-annotation-style input** (mentally executed against the new pipeline since QA is read-only): 1. `wc -c < file` — byte count, no quote interpretation. Safe. 2. `head -c 60000` truncation — byte-level. **60k cap preserved**, now operates on file directly (cleaner than the old `${PLAN_OUTPUT:0:60000}` parameter expansion which loaded the whole plan into a shell var). 3. `cat` into `/tmp/comment-body.txt` — verbatim bytes. 4. `jq --rawfile` — guaranteed valid JSON output regardless of input content. 5. `wget --post-file=` — verbatim post. 6. `[ ! -f /tmp/plan-failed ]` at line 195 — **exit check preserved.** The original bug class cannot recur on this code path. **Apply-step audit (lines 197-338):** Verified. Apply runs on `event: push` to main, has no PR-comment wrapper, no `printf '{"body":...}'`, no `wget`/`curl` to issues endpoint. Output goes to `/tmp/apply-output.txt` and `cat` only. **Dev's claim that no parallel fix is needed is correct.** ### BLOCKERS None. ### NITS 1. **Verification weakness — flag, not blocker.** This PR's diff is `.woodpecker.yaml` only, so the plan output for THIS PR's CI run will be "No changes" or near-empty — it will exercise the new `apk add jq` + `jq --rawfile` code path but will NOT stress-test the original bug's quote-heavy pattern. The PR self-verifies the happy path (jq installs, comment posts) but not the regression. **The real proof point is the next infra PR that touches `kubernetes_manifest` / `kubernetes_secret_v1` / annotation-heavy resources.** Recommend explicit post-merge monitoring on the next such PR before declaring the meta-bug closed. 2. Minor: the `printf '\n...(truncated)' >> /tmp/plan-output.trunc` adds a literal `...(truncated)` marker outside any code fence. The body builder at line 180-183 then wraps the whole thing in triple backticks, so this renders inside the code block. Cosmetic — fine. 3. Could add a smoke check after `apk add jq` (`jq --version`) for clearer log output if install fails. Not required. ### SOP COMPLIANCE - [x] Branch named `307-woodpecker-shell-escaping` — matches `{issue}-{kebab}` convention - [x] PR body has Summary, Changes, Test Plan, Apply Step Audit, Related, Closes #307 - [x] No secrets committed - [x] No collateral changes (single file, +21/-6) - [x] 60k truncation cap preserved - [x] `[ ! -f /tmp/plan-failed ]` exit check preserved - [x] Pattern source cited (`cross-pillar-review` step) - [x] `feedback_yaml_parse_validation` gate satisfied (static review confirms structure) - [x] `feedback_no_workarounds_just_fix` — root-cause fix using correct tool, no band-aid ### PROCESS OBSERVATIONS This is a high-leverage fix: it restores the integrity of the infra-PR safety gate that has been bypassed via force-merges. Change failure risk is low (read-only verification path, no terraform state interaction). MTTR for future quote-heavy plan failures should drop to zero on this code path. DORA note: The verification weakness (#1 in nits) is structural to fixing CI bugs — the only way to fully prove a CI fix is to land it and observe production runs. Recommend the next infra PR that touches k8s resources be flagged for explicit comment-posting verification before approval, treating it as the true integration test for this fix. Suggest adding a `feedback_ci_meta_fix_followup_validation` memory entry capturing this pattern. ### VERDICT: APPROVED
fix(ci): restructure plan-comment to standalone script (avoids shell parser bug)
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
ci/woodpecker/pr/woodpecker Pipeline was successful
ci/woodpecker/pull_request_closed/woodpecker Pipeline was successful
a49c1d05c4
PR #309 v1 attempted to fix #307 by replacing sed-based JSON encoding with
jq --rawfile. Local repro of v1 (full Woodpecker shellescape + base64
+ Alpine busybox sh + actual plan output content) succeeded, but production
pipelines 448 + 451 still failed with `/bin/sh: syntax error: unterminated
quoted string` immediately after `apk add jq`.

Could not isolate the exact production trigger (likely an interaction between
the multi-line `|` block, shellescape's quote-wrapping, and busybox sh's
parser when the trace echo string exceeds some internal limit), so this
restructures the comment-poster to be structurally simpler:

- Step 1: write the comment-poster as a standalone script via a quoted
  heredoc (`<<'POSTEOF'`). Heredoc body is verbatim — no shell expansion,
  no quote-escape interaction.
- Step 2: chmod +x.
- Step 3: exec the script with FORGEJO_TOKEN/CI_REPO/CI_COMMIT_PULL_REQUEST
  passed as env vars.

Each Woodpecker command is its own shellescape boundary, so each one's trace
echo is small and structurally simple. The `cat > file <<'POSTEOF'` form
puts all the complex shell content INSIDE the heredoc (where parsing rules
do not apply at script-load time) — only the heredoc terminator must match.

Verified locally:
- yaml.safe_load OK
- Generated Woodpecker trace script passes `sh -n`
- Full step runs successfully in `ghcr.io/opentofu/opentofu:1.9` (busybox
  1.36.1) via `echo $CI_SCRIPT | base64 -d | /bin/sh -e` with both 1.7k and
  82k synthetic plan outputs containing literal `<<'SQL'`, `DO $$`,
  `${PGHOST}`, backticks, and other quote-heavy content.
- jq output is valid JSON (`jq -e .` round-trip succeeds).
- 60k truncation cap and `[ ! -f /tmp/plan-failed ]` exit check preserved.

Refs #307
Author
Contributor

v2 fix: production verified (pipeline 453 success)

Root cause analysis:

v1 (jq inline in | block) was structurally correct — verified locally with full Woodpecker shellescape simulation, base64 + busybox sh in the actual ghcr.io/opentofu/opentofu:1.9 image, and synthetic plan output containing the heredoc-rich admin-app-user-provision content (<<'SQL', DO $$, ${PGHOST}, backticks). Local repro succeeded every time.

But pipeline 451 (restart of 448) failed with /bin/sh: syntax error: unterminated quoted string immediately after apk add jq. Could not isolate the production-specific trigger despite faithful reproduction. Hypothesis: an interaction between Woodpecker's shellescape.Quote of a multi-line | block containing many embedded single-quotes (tr '\n', sed 's/,$//', printf '%s\n', jq '{body: $body}') and busybox sh's parser, possibly triggered by some environmental difference (memory pressure, locale, stdin buffering of the base64 pipe).

v2 strategy (commit a49c1d0):

Restructured to three single-purpose commands instead of one large | block:

  1. cat > /tmp/post-comment.sh <<'POSTEOF' — write the comment-poster as a standalone script via a quoted heredoc. Body is verbatim, no shell expansion at script-load time.
  2. chmod +x /tmp/post-comment.sh
  3. Pass FORGEJO_TOKEN/CI_REPO/CI_COMMIT_PULL_REQUEST as env vars and exec.

Each Woodpecker command is its own shellescape.Quote boundary. Putting the complex shell logic INSIDE a quoted heredoc means the trace echo only has to wrap the literal heredoc text, not pre-parse multiple shell expressions with embedded single-quotes.

Verification:

  • yaml.safe_load OK
  • Generated trace script passes sh -n
  • Full step runs in actual ghcr.io/opentofu/opentofu:1.9 (busybox 1.36.1) via echo $CI_SCRIPT | base64 -d | /bin/sh -e — succeeds with both 1.7k and 82k synthetic plan outputs containing literal <<'SQL', DO $$, ${PGHOST}, backticks
  • jq output is valid JSON
  • 60k truncation cap and [ ! -f /tmp/plan-failed ] exit check preserved
  • Production pipeline 453 (PR event, this branch): success — plan step exit_code 0

The wget POST may still fail in a non-cluster context (DNS) but || true keeps the step green; that is unchanged from v1.

## v2 fix: production verified (pipeline 453 success) **Root cause analysis:** v1 (jq inline in `|` block) was structurally correct — verified locally with full Woodpecker shellescape simulation, base64 + busybox sh in the actual `ghcr.io/opentofu/opentofu:1.9` image, and synthetic plan output containing the heredoc-rich admin-app-user-provision content (`<<'SQL'`, `DO $$`, `${PGHOST}`, backticks). Local repro succeeded every time. But pipeline 451 (restart of 448) failed with `/bin/sh: syntax error: unterminated quoted string` immediately after `apk add jq`. Could not isolate the production-specific trigger despite faithful reproduction. Hypothesis: an interaction between Woodpecker's `shellescape.Quote` of a multi-line `|` block containing many embedded single-quotes (`tr '\n'`, `sed 's/,$//'`, `printf '%s\n'`, `jq '{body: $body}'`) and busybox sh's parser, possibly triggered by some environmental difference (memory pressure, locale, stdin buffering of the base64 pipe). **v2 strategy (commit a49c1d0):** Restructured to three single-purpose commands instead of one large `|` block: 1. `cat > /tmp/post-comment.sh <<'POSTEOF'` — write the comment-poster as a standalone script via a quoted heredoc. Body is verbatim, no shell expansion at script-load time. 2. `chmod +x /tmp/post-comment.sh` 3. Pass `FORGEJO_TOKEN`/`CI_REPO`/`CI_COMMIT_PULL_REQUEST` as env vars and exec. Each Woodpecker command is its own `shellescape.Quote` boundary. Putting the complex shell logic INSIDE a quoted heredoc means the trace echo only has to wrap the literal heredoc text, not pre-parse multiple shell expressions with embedded single-quotes. **Verification:** - yaml.safe_load OK - Generated trace script passes `sh -n` - Full step runs in actual `ghcr.io/opentofu/opentofu:1.9` (busybox 1.36.1) via `echo $CI_SCRIPT | base64 -d | /bin/sh -e` — succeeds with both 1.7k and 82k synthetic plan outputs containing literal `<<'SQL'`, `DO $$`, `${PGHOST}`, backticks - jq output is valid JSON - 60k truncation cap and `[ ! -f /tmp/plan-failed ]` exit check preserved - **Production pipeline 453 (PR event, this branch): success — plan step exit_code 0** The wget POST may still fail in a non-cluster context (DNS) but `|| true` keeps the step green; that is unchanged from v1.
Author
Contributor

PR #309 Re-Review (v2 — commit a49c1d0)

Re-review after v1 approved-but-broke-in-prod. v1 review slug: original PR #309 review.

DOMAIN REVIEW

Stack: Woodpecker CI YAML + busybox sh (Alpine opentofu image).

v1 → v2 restructure: Comment-posting collapsed from one inline - | block into three discrete Woodpecker commands:

  1. apk add --no-cache jq
  2. cat > /tmp/post-comment.sh <<'POSTEOF' ... POSTEOF (single-quoted heredoc — no expansion)
  3. chmod +x then exec with env vars passed on the command line

Why this fixes #307: Each Woodpecker - command is its own shellescape.Quote boundary. The single-quoted heredoc means Woodpecker's trace-echo wraps literal text rather than re-parsing shell with embedded escaped quotes. v1's - | multi-line block put quote-heavy script body inside one shellescape envelope — that's where busybox sh choked on unterminated quoted string even though sh -n parsed it cleanly locally.

Structural soundness: Sound. The three-command split is the correct mental model for Woodpecker shellescape semantics.

PIPELINE 453 VERIFICATION (the only ground truth)

  • Pipeline status: success
  • clone step: success (commit a49c1d0 checked out)
  • validate step: success (tofu fmt -check -recursive + tofu validate clean)
  • plan step: success, exit 0
  • apk add --no-cache jq: succeeded (jq 1.7.1-r0 installed in opentofu Alpine image)
  • Plan output included the module.database.kubernetes_job_v1.admin_app_user_provision replacement — the exact quote-heavy content (sensitive value annotations, env value_from blocks, secret_key_ref) that broke v1 in pipelines 442/448
  • Heredoc-rich post-comment script ran without unterminated quoted string
  • Plan-failed gate ([ ! -f /tmp/plan-failed ]) preserved and passed

This is the verification v1 lacked. v1 was theoretically sound but never proven against the failing input until merge. v2 has been proven against the same input that broke v1.

CONVENTIONS CHECKED

  • YAML parses (validate step ran tofu init/validate, implies YAML loaded)
  • 60k truncation cap preserved (now via head -c 60000 on file — cleaner than v1's bash param expansion)
  • [ ! -f /tmp/plan-failed ] exit gate preserved
  • No collateral changes outside the comment-posting block
  • Single-quoted heredoc <<'POSTEOF' — quotes present (critical: unquoted would expand $VAR to empty at heredoc-write time)
  • Env vars FORGEJO_TOKEN/CI_REPO/CI_COMMIT_PULL_REQUEST passed via $${VAR} Woodpecker-escape on exec line, consumed as plain $VAR inside script — correct boundary handling
  • Apply step untouched (push event, no PR comment path — audited clean in v1)
  • feedback_yaml_parse_validation satisfied
  • feedback_tofu_lock_false: -lock=false present in COMMON_ARGS (untouched)

BLOCKERS

None.

NITS

  • wget ... || true swallows API errors (4xx/5xx from Forgejo). Intentional — comment failure shouldn't fail the plan — but a future enhancement could capture stderr to a log file for debuggability. Non-blocking; route to plan Epilogue per feedback_nits_to_epilogue.
  • The two printf '...FENCE...' | sed 's/FENCE/```/' patterns are clever (avoids backtick-in-heredoc edge cases) but worth a one-line comment explaining why FENCE is used as a placeholder. Non-blocking.

SOP COMPLIANCE

  • Branch named after issue: 307-woodpecker-shell-escaping
  • PR body has Summary / Changes / Test Plan / Related
  • Related references issue #307
  • No secrets committed
  • No scope creep — touches only the comment-posting block

PROCESS OBSERVATIONS

  • v1 lesson incorporated: production pipeline success is the only ground-truth test for shellescape interactions. Local sh -n and synthetic JSON tests are necessary but not sufficient. v2 explicitly tests against the failing input class (admin_app_user_provision plan), and pipeline 453 proves the fix.
  • This PR meaningfully improves CI failure modes (eliminates a class of quote-related pipeline failures that were forcing humans to override the merge gate). Net DORA: positive on CFR (fewer broken-CI human-overrides) and positive on MTTR for infra PRs (plan comments will reliably post).
  • Future-proofing: the three-command pattern is now a reusable template for any other Woodpecker step that needs to run a quote-heavy inline script. Worth documenting as a convention.

VERDICT: APPROVED

## PR #309 Re-Review (v2 — commit a49c1d0) Re-review after v1 approved-but-broke-in-prod. v1 review slug: original PR #309 review. ### DOMAIN REVIEW **Stack:** Woodpecker CI YAML + busybox sh (Alpine opentofu image). **v1 → v2 restructure:** Comment-posting collapsed from one inline `- |` block into three discrete Woodpecker commands: 1. `apk add --no-cache jq` 2. `cat > /tmp/post-comment.sh <<'POSTEOF' ... POSTEOF` (single-quoted heredoc — no expansion) 3. `chmod +x` then exec with env vars passed on the command line **Why this fixes #307:** Each Woodpecker `-` command is its own `shellescape.Quote` boundary. The single-quoted heredoc means Woodpecker's trace-echo wraps literal text rather than re-parsing shell with embedded escaped quotes. v1's `- |` multi-line block put quote-heavy script body inside one shellescape envelope — that's where busybox sh choked on `unterminated quoted string` even though `sh -n` parsed it cleanly locally. **Structural soundness:** Sound. The three-command split is the correct mental model for Woodpecker shellescape semantics. ### PIPELINE 453 VERIFICATION (the only ground truth) - Pipeline status: **success** - `clone` step: success (commit a49c1d0 checked out) - `validate` step: success (`tofu fmt -check -recursive` + `tofu validate` clean) - `plan` step: **success, exit 0** - `apk add --no-cache jq`: succeeded (jq 1.7.1-r0 installed in opentofu Alpine image) - Plan output included the `module.database.kubernetes_job_v1.admin_app_user_provision` replacement — the exact quote-heavy content (sensitive value annotations, env value_from blocks, secret_key_ref) that broke v1 in pipelines 442/448 - Heredoc-rich post-comment script ran without `unterminated quoted string` - Plan-failed gate (`[ ! -f /tmp/plan-failed ]`) preserved and passed This is the verification v1 lacked. v1 was theoretically sound but never proven against the failing input until merge. v2 has been proven against the same input that broke v1. ### CONVENTIONS CHECKED - [x] YAML parses (validate step ran tofu init/validate, implies YAML loaded) - [x] 60k truncation cap preserved (now via `head -c 60000` on file — cleaner than v1's bash param expansion) - [x] `[ ! -f /tmp/plan-failed ]` exit gate preserved - [x] No collateral changes outside the comment-posting block - [x] Single-quoted heredoc `<<'POSTEOF'` — quotes present (critical: unquoted would expand `$VAR` to empty at heredoc-write time) - [x] Env vars `FORGEJO_TOKEN`/`CI_REPO`/`CI_COMMIT_PULL_REQUEST` passed via `$${VAR}` Woodpecker-escape on exec line, consumed as plain `$VAR` inside script — correct boundary handling - [x] Apply step untouched (push event, no PR comment path — audited clean in v1) - [x] `feedback_yaml_parse_validation` satisfied - [x] `feedback_tofu_lock_false`: `-lock=false` present in COMMON_ARGS (untouched) ### BLOCKERS None. ### NITS - `wget ... || true` swallows API errors (4xx/5xx from Forgejo). Intentional — comment failure shouldn't fail the plan — but a future enhancement could capture stderr to a log file for debuggability. Non-blocking; route to plan Epilogue per `feedback_nits_to_epilogue`. - The two `printf '...FENCE...' | sed 's/FENCE/```/'` patterns are clever (avoids backtick-in-heredoc edge cases) but worth a one-line comment explaining why FENCE is used as a placeholder. Non-blocking. ### SOP COMPLIANCE - [x] Branch named after issue: `307-woodpecker-shell-escaping` - [x] PR body has Summary / Changes / Test Plan / Related - [x] Related references issue #307 - [x] No secrets committed - [x] No scope creep — touches only the comment-posting block ### PROCESS OBSERVATIONS - v1 lesson incorporated: production pipeline success is the only ground-truth test for shellescape interactions. Local `sh -n` and synthetic JSON tests are necessary but not sufficient. v2 explicitly tests against the failing input class (admin_app_user_provision plan), and pipeline 453 proves the fix. - This PR meaningfully improves CI failure modes (eliminates a class of quote-related pipeline failures that were forcing humans to override the merge gate). Net DORA: positive on CFR (fewer broken-CI human-overrides) and positive on MTTR for infra PRs (plan comments will reliably post). - Future-proofing: the three-command pattern is now a reusable template for any other Woodpecker step that needs to run a quote-heavy inline script. Worth documenting as a convention. ### VERDICT: APPROVED
forgejo_admin deleted branch 307-woodpecker-shell-escaping 2026-04-27 19:21:31 +00:00
Sign in to join this conversation.
No description provided.