fix(ci): use jq for PR comment JSON encoding in plan step (#307) #309
No reviewers
Labels
No labels
domain:backend
domain:devops
domain:frontend
status:approved
status:in-progress
status:needs-fix
status:qa
type:bug
type:devops
type:feature
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
ldraney/pal-e-platform!309
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "307-woodpecker-shell-escaping"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
The plan step's PR-comment-posting wrapper used a fragile
sedpipeline to escape quotes/newlines into JSON. Whentofu planoutput contained nested quote patterns (e.g. k8smetadata.annotationslike"kubectl.kubernetes.io/last-applied-configuration": "{\"foo\":\"bar\"}"), the sed pipeline produced an unterminated quoted string, the step exited 2, the infra-validation hook blocked merge, and humans force-merged — eroding the safety gate.This PR replaces the sed escaping with
jq -n --rawfile body ..., which reads the comment body verbatim and JSON-encodes it correctly, handling all quotes, newlines, and backslashes. The actualtofu planwas always fine — only the comment-posting wrapper broke.Changes
.woodpecker.yaml(plan step):apk add --no-cache jq(opentofu image is Alpine-based and does not ship jq).printf '{"body":"%s"}' "$(echo ... | sed ... | sed ...)"pipeline withjq -n --rawfile body /tmp/comment-body.txt '{body: $body}'.head -cinstead of bash parameter expansion (cleaner, avoids loading whole plan into a shell variable).[ ! -f /tmp/plan-failed ]exit check and the 60000-char truncation cap.Apply Step Audit
The
applystep (line 182+) was audited per the ticket AC. It runs onevent: pushtomainand does not post PR comments (no PR exists for push events). It has no analogous comment-posting wrapper. No parallel fix needed.The
cross-pillar-reviewstep already usesjq -n --arg ... --rawfile ...correctly — that's the pattern this PR ports to the plan step.tofu plan Output
This change touches CI YAML only — no terraform resources change.
tofu planwill run as part of this PR and post a comment using the new code path. That comment posting itself is the verification (see Test Plan).Test Plan
python3 -c "import yaml; yaml.safe_load(open('.woodpecker.yaml'))"— OKpython3 -c "import json; json.load(open('payload.json'))"apk add jqsucceeds in the opentofu image## Tofu Plan Output (full)posts cleanly to this PRkubernetes_manifestresources) to confirm the fix holds in the wild.If verification on this PR succeeds, the fix is proven on the exact bug class. If a future quote-heavy plan still breaks, reopen with the failing PR number.
Review Checklist
Related Notes
feedback_yaml_parse_validation— YAML parse-validation gate satisfiedfeedback_tofu_lock_false—-lock=falsealready present in plan args (untouched)convention-cross-pillar-triggers—.woodpecker*matches; cross-pillar review issue will auto-create on mergeRelated
cross-pillar-reviewstep in same file (already uses jq correctly)The plan step's comment-posting wrapper built JSON payloads with a fragile sed pipeline that broke when tofu plan output contained nested quote patterns (e.g. kubernetes_manifest with annotations like "kubectl.kubernetes.io/last-applied-configuration": "{\"foo\":\"bar\"}"). The unterminated-string error caused the step to exit 2 and blocked the infra-validation hook even though the underlying tofu plan succeeded. Replace the sed escaping with `jq -n --rawfile body ...` which reads the comment body verbatim and JSON-encodes it correctly, handling all quotes, newlines, and backslashes. Adds `apk add --no-cache jq` since the opentofu image (Alpine-based) does not ship jq by default. Also moves the 60k truncation to operate on the file directly via `head -c` instead of bash parameter expansion, which is cleaner and avoids loading the whole plan into a shell variable. The apply step (push to main) does not post PR comments, so no parallel fix is needed there. The `[ ! -f /tmp/plan-failed ]` exit check and the 60000-char truncation cap are both preserved. Closes #307 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>PR #309 Review
DOMAIN REVIEW
Stack: Woodpecker CI YAML + Alpine shell (sh, not bash) + jq + wget. Single-file change to
.woodpecker.yamlplan step.Root-cause fix correctness — confirmed. The previous
printf '{"body":"%s"}' "$(echo "$BODY" | sed 's/"/\\"/g' | sed ':a;N;$!ba;s/\n/\\n/g')"had two failure modes:last-applied-configurationannotations contain\"that became double-mangled.$BODYgot re-tokenized by the shell when expanded intoprintf's argument, so embedded quotes broke the outer command's quoting.The new path eliminates both:
cat /tmp/plan-output.txtwrites bytes verbatim into/tmp/comment-body.txt— no shell re-tokenization of plan content.jq -n --rawfile body ... '{body: $body}'reads file as raw string and JSON-encodes per spec — handles quotes, backslashes, newlines, control chars correctly by construction.wget --post-file=posts file bytes verbatim — no shell interpolation of body.This is the right pattern, and it's already proven in the
cross-pillar-reviewstep (lines 439-442 use the identicaljq -n --arg ... --rawfile body ...pattern). Good consistency.Static YAML parse review: Structure is sound. New
apk add --no-cache jqis a clean scalar. The|block at 162-193 has consistent indentation, no unquoted colons in problematic positions,$$Woodpecker escaping preserved throughout. Perfeedback_yaml_parse_validation, static review passes; dev's localyaml.safe_loadclaim is consistent with the diff.Behavioral walkthrough on synthetic k8s-annotation-style input (mentally executed against the new pipeline since QA is read-only):
wc -c < file— byte count, no quote interpretation. Safe.head -c 60000truncation — byte-level. 60k cap preserved, now operates on file directly (cleaner than the old${PLAN_OUTPUT:0:60000}parameter expansion which loaded the whole plan into a shell var).catinto/tmp/comment-body.txt— verbatim bytes.jq --rawfile— guaranteed valid JSON output regardless of input content.wget --post-file=— verbatim post.[ ! -f /tmp/plan-failed ]at line 195 — exit check preserved.The original bug class cannot recur on this code path.
Apply-step audit (lines 197-338): Verified. Apply runs on
event: pushto main, has no PR-comment wrapper, noprintf '{"body":...}', nowget/curlto issues endpoint. Output goes to/tmp/apply-output.txtandcatonly. Dev's claim that no parallel fix is needed is correct.BLOCKERS
None.
NITS
Verification weakness — flag, not blocker. This PR's diff is
.woodpecker.yamlonly, so the plan output for THIS PR's CI run will be "No changes" or near-empty — it will exercise the newapk add jq+jq --rawfilecode path but will NOT stress-test the original bug's quote-heavy pattern. The PR self-verifies the happy path (jq installs, comment posts) but not the regression. The real proof point is the next infra PR that toucheskubernetes_manifest/kubernetes_secret_v1/ annotation-heavy resources. Recommend explicit post-merge monitoring on the next such PR before declaring the meta-bug closed.Minor: the
printf '\n...(truncated)' >> /tmp/plan-output.truncadds a literal...(truncated)marker outside any code fence. The body builder at line 180-183 then wraps the whole thing in triple backticks, so this renders inside the code block. Cosmetic — fine.Could add a smoke check after
apk add jq(jq --version) for clearer log output if install fails. Not required.SOP COMPLIANCE
307-woodpecker-shell-escaping— matches{issue}-{kebab}convention[ ! -f /tmp/plan-failed ]exit check preservedcross-pillar-reviewstep)feedback_yaml_parse_validationgate satisfied (static review confirms structure)feedback_no_workarounds_just_fix— root-cause fix using correct tool, no band-aidPROCESS OBSERVATIONS
This is a high-leverage fix: it restores the integrity of the infra-PR safety gate that has been bypassed via force-merges. Change failure risk is low (read-only verification path, no terraform state interaction). MTTR for future quote-heavy plan failures should drop to zero on this code path.
DORA note: The verification weakness (#1 in nits) is structural to fixing CI bugs — the only way to fully prove a CI fix is to land it and observe production runs. Recommend the next infra PR that touches k8s resources be flagged for explicit comment-posting verification before approval, treating it as the true integration test for this fix. Suggest adding a
feedback_ci_meta_fix_followup_validationmemory entry capturing this pattern.VERDICT: APPROVED
v2 fix: production verified (pipeline 453 success)
Root cause analysis:
v1 (jq inline in
|block) was structurally correct — verified locally with full Woodpecker shellescape simulation, base64 + busybox sh in the actualghcr.io/opentofu/opentofu:1.9image, and synthetic plan output containing the heredoc-rich admin-app-user-provision content (<<'SQL',DO $$,${PGHOST}, backticks). Local repro succeeded every time.But pipeline 451 (restart of 448) failed with
/bin/sh: syntax error: unterminated quoted stringimmediately afterapk add jq. Could not isolate the production-specific trigger despite faithful reproduction. Hypothesis: an interaction between Woodpecker'sshellescape.Quoteof a multi-line|block containing many embedded single-quotes (tr '\n',sed 's/,$//',printf '%s\n',jq '{body: $body}') and busybox sh's parser, possibly triggered by some environmental difference (memory pressure, locale, stdin buffering of the base64 pipe).v2 strategy (commit
a49c1d0):Restructured to three single-purpose commands instead of one large
|block:cat > /tmp/post-comment.sh <<'POSTEOF'— write the comment-poster as a standalone script via a quoted heredoc. Body is verbatim, no shell expansion at script-load time.chmod +x /tmp/post-comment.shFORGEJO_TOKEN/CI_REPO/CI_COMMIT_PULL_REQUESTas env vars and exec.Each Woodpecker command is its own
shellescape.Quoteboundary. Putting the complex shell logic INSIDE a quoted heredoc means the trace echo only has to wrap the literal heredoc text, not pre-parse multiple shell expressions with embedded single-quotes.Verification:
sh -nghcr.io/opentofu/opentofu:1.9(busybox 1.36.1) viaecho $CI_SCRIPT | base64 -d | /bin/sh -e— succeeds with both 1.7k and 82k synthetic plan outputs containing literal<<'SQL',DO $$,${PGHOST}, backticks[ ! -f /tmp/plan-failed ]exit check preservedThe wget POST may still fail in a non-cluster context (DNS) but
|| truekeeps the step green; that is unchanged from v1.PR #309 Re-Review (v2 — commit
a49c1d0)Re-review after v1 approved-but-broke-in-prod. v1 review slug: original PR #309 review.
DOMAIN REVIEW
Stack: Woodpecker CI YAML + busybox sh (Alpine opentofu image).
v1 → v2 restructure: Comment-posting collapsed from one inline
- |block into three discrete Woodpecker commands:apk add --no-cache jqcat > /tmp/post-comment.sh <<'POSTEOF' ... POSTEOF(single-quoted heredoc — no expansion)chmod +xthen exec with env vars passed on the command lineWhy this fixes #307: Each Woodpecker
-command is its ownshellescape.Quoteboundary. The single-quoted heredoc means Woodpecker's trace-echo wraps literal text rather than re-parsing shell with embedded escaped quotes. v1's- |multi-line block put quote-heavy script body inside one shellescape envelope — that's where busybox sh choked onunterminated quoted stringeven thoughsh -nparsed it cleanly locally.Structural soundness: Sound. The three-command split is the correct mental model for Woodpecker shellescape semantics.
PIPELINE 453 VERIFICATION (the only ground truth)
clonestep: success (commita49c1d0checked out)validatestep: success (tofu fmt -check -recursive+tofu validateclean)planstep: success, exit 0apk add --no-cache jq: succeeded (jq 1.7.1-r0 installed in opentofu Alpine image)module.database.kubernetes_job_v1.admin_app_user_provisionreplacement — the exact quote-heavy content (sensitive value annotations, env value_from blocks, secret_key_ref) that broke v1 in pipelines 442/448unterminated quoted string[ ! -f /tmp/plan-failed ]) preserved and passedThis is the verification v1 lacked. v1 was theoretically sound but never proven against the failing input until merge. v2 has been proven against the same input that broke v1.
CONVENTIONS CHECKED
head -c 60000on file — cleaner than v1's bash param expansion)[ ! -f /tmp/plan-failed ]exit gate preserved<<'POSTEOF'— quotes present (critical: unquoted would expand$VARto empty at heredoc-write time)FORGEJO_TOKEN/CI_REPO/CI_COMMIT_PULL_REQUESTpassed via$${VAR}Woodpecker-escape on exec line, consumed as plain$VARinside script — correct boundary handlingfeedback_yaml_parse_validationsatisfiedfeedback_tofu_lock_false:-lock=falsepresent in COMMON_ARGS (untouched)BLOCKERS
None.
NITS
wget ... || trueswallows API errors (4xx/5xx from Forgejo). Intentional — comment failure shouldn't fail the plan — but a future enhancement could capture stderr to a log file for debuggability. Non-blocking; route to plan Epilogue perfeedback_nits_to_epilogue.printf '...FENCE...' | sed 's/FENCE/```/'patterns are clever (avoids backtick-in-heredoc edge cases) but worth a one-line comment explaining why FENCE is used as a placeholder. Non-blocking.SOP COMPLIANCE
307-woodpecker-shell-escapingPROCESS OBSERVATIONS
sh -nand synthetic JSON tests are necessary but not sufficient. v2 explicitly tests against the failing input class (admin_app_user_provision plan), and pipeline 453 proves the fix.VERDICT: APPROVED