fix(database): replace DO block with psql conditional for var substitution (#319) #320

Merged
forgejo_admin merged 1 commit from 319-fix-psql-var-substitution into main 2026-04-30 11:55:07 +00:00
Contributor

Summary

Replace the SQL Job's DO $body$ ... $body$; block with psql client-side \gset + \if conditional. Same idempotent CREATE-or-ALTER role semantics, but actually works at runtime.

Why

After PR #318 fixed the DO $$ HCL escape issue, the next bug surfaced: psql's :'admin_pw' variable substitution does NOT work inside dollar-quoted blocks. psql treats $body$ ... $body$ as a single literal token and skips substitution. Postgres receives literal :'admin_pw' token and rejects with syntax error at or near ":" at character 164.

Discovered when Ava ran make tofu-apply post-#318. Verified manually by running the new psql \if/\gset SQL directly against basketball-api postgres — admin_app role created successfully (kubectl exec ... psql ... \du admin_app).

The two approaches (procedural DO blocks, psql client-side conditionals) each work alone. PR #304's design tried to combine them and hit this fundamental conflict.

Changes

  • terraform/modules/database/main.tf lines 192-202 — replace the DO $body$ BEGIN IF NOT EXISTS ... ELSE ... END $body$; block with:
SELECT EXISTS(SELECT 1 FROM pg_roles WHERE rolname = 'admin_app') AS role_exists \gset
\if :role_exists
  ALTER ROLE admin_app WITH LOGIN PASSWORD :'admin_pw';
\else
  CREATE ROLE admin_app WITH LOGIN PASSWORD :'admin_pw';
\endif

Same idempotent semantics. No dollar-quoting. psql variable substitution works natively in regular SQL.

Test Plan

  • Manually executed the new SQL via kubectl exec ... psql -i against basketball-api postgres on 2026-04-30 — admin_app role + grants applied successfully
  • Post-merge: tofu apply -target=module.database Job pod completes Succeeded (idempotent — finds existing role, ALTERs to same password = no-op)
  • Pod logs show ==> admin_app role provisioned successfully
  • Re-apply path verified (rotation simulation)

Review Checklist

  • DO block removed; psql conditional preserves CREATE-or-ALTER idempotency
  • No change to GRANT statements, ALTER DEFAULT PRIVILEGES, or env var handling
  • No new variables or pillar entries
  • Closes #319
  • Story+arch trace (story:admin-row-crud, arch:postgres)
  • Caused by: PR #304 (design conflict between psql vars and DO blocks)
  • Sibling fixes during this bootstrap apply: #316 (label /), #318 (DO HCL escape)
  • Lesson: 4th post-merge runtime gap of this apply attempt (after salt master crash, label /, DO HCL escape). All would have been caught by pre-merge tofu apply to a real cluster. Reinforces feedback_tofu_validate_not_k8s_api.
  • Manual SQL was run on basketball-api postgres BEFORE this PR merged to unblock the deploy chain. Job re-run post-merge will be idempotent (role exists, ALTER same password = no-op).

Closes #319

## Summary Replace the SQL Job's `DO $body$ ... $body$;` block with psql client-side `\gset` + `\if` conditional. Same idempotent CREATE-or-ALTER role semantics, but actually works at runtime. ## Why After PR #318 fixed the `DO $$` HCL escape issue, the next bug surfaced: psql's `:'admin_pw'` variable substitution does NOT work inside dollar-quoted blocks. psql treats `$body$ ... $body$` as a single literal token and skips substitution. Postgres receives literal `:'admin_pw'` token and rejects with `syntax error at or near ":" at character 164`. Discovered when Ava ran `make tofu-apply` post-#318. Verified manually by running the new psql `\if`/`\gset` SQL directly against basketball-api postgres — `admin_app` role created successfully (`kubectl exec ... psql ... \du admin_app`). The two approaches (procedural DO blocks, psql client-side conditionals) each work alone. PR #304's design tried to combine them and hit this fundamental conflict. ## Changes - `terraform/modules/database/main.tf` lines 192-202 — replace the `DO $body$ BEGIN IF NOT EXISTS ... ELSE ... END $body$;` block with: ```sql SELECT EXISTS(SELECT 1 FROM pg_roles WHERE rolname = 'admin_app') AS role_exists \gset \if :role_exists ALTER ROLE admin_app WITH LOGIN PASSWORD :'admin_pw'; \else CREATE ROLE admin_app WITH LOGIN PASSWORD :'admin_pw'; \endif ``` Same idempotent semantics. No dollar-quoting. psql variable substitution works natively in regular SQL. ## Test Plan - [x] Manually executed the new SQL via `kubectl exec ... psql -i` against basketball-api postgres on 2026-04-30 — `admin_app` role + grants applied successfully - [ ] Post-merge: `tofu apply -target=module.database` Job pod completes Succeeded (idempotent — finds existing role, ALTERs to same password = no-op) - [ ] Pod logs show `==> admin_app role provisioned successfully` - [ ] Re-apply path verified (rotation simulation) ## Review Checklist - [x] DO block removed; psql conditional preserves CREATE-or-ALTER idempotency - [x] No change to GRANT statements, ALTER DEFAULT PRIVILEGES, or env var handling - [x] No new variables or pillar entries - [x] Closes #319 - [x] Story+arch trace (`story:admin-row-crud`, `arch:postgres`) ## Related Notes - Caused by: PR #304 (design conflict between psql vars and DO blocks) - Sibling fixes during this bootstrap apply: #316 (label `/`), #318 (DO $$ HCL escape) - Lesson: 4th post-merge runtime gap of this apply attempt (after salt master crash, label `/`, DO HCL escape). All would have been caught by pre-merge `tofu apply` to a real cluster. Reinforces `feedback_tofu_validate_not_k8s_api`. - Manual SQL was run on basketball-api postgres BEFORE this PR merged to unblock the deploy chain. Job re-run post-merge will be idempotent (role exists, ALTER same password = no-op). Closes #319
fix(database): use psql conditional instead of DO block (#319)
Some checks failed
ci/woodpecker/push/woodpecker Pipeline was successful
ci/woodpecker/pr/woodpecker Pipeline failed
ci/woodpecker/pull_request_closed/woodpecker Pipeline was successful
fe701ad6c5
psql's :'var' substitution does NOT work inside DO $$...$$ blocks
— psql treats dollar-quoted bodies as single literal tokens and
skips substitution. Postgres receives literal :'admin_pw' and
errors with syntax error.

Replace the DO BEGIN IF NOT EXISTS ELSE END procedural block with
psql's client-side \gset + \if conditional. Same idempotent
semantics, no dollar-quoting needed, :'var' substitution works
natively in regular SQL.

Closes #319

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Author
Contributor

PR #320 Review

DOMAIN REVIEW

Stack: Terraform + kubernetes_job_v1 + psql heredoc (Bash + SQL).

The fix correctly identifies the root cause: psql's :'var' substitution is suppressed inside dollar-quoted bodies because psql treats $body$...$body$ as a single literal token. Switching to client-side \gset + \if/\else/\endif moves the conditional out of the server-side procedural block and back into the psql client where variable substitution is performed pre-send. Same idempotent CREATE-or-ALTER semantics, no behavior change to GRANTs.

Specifics verified:

  • DO block fully removed (lines 192-202); no $body$, BEGIN, END IF, or EXECUTE format(...) remnants.
  • \gset correctly captures role_exists from the SELECT EXISTS expression.
  • \if :role_exists / \else / \endif matches psql conditional syntax. Leading whitespace before backslash meta-commands is tolerated in modern psql.
  • :'admin_pw' now appears only in ordinary SQL (ALTER/CREATE ROLE) where psql substitutes pre-flight — the original bug path.
  • Heredoc remains <<'SQL' (single-quoted) — shell variables are still passed via -v admin_pw="$${ADMIN_APP_PASSWORD}" rather than expanded inline. Correct.
  • GRANT USAGE / GRANT SELECT,INSERT,UPDATE,DELETE / ALTER DEFAULT PRIVILEGES untouched.
  • No new env vars, pillar entries, or variable additions (+9/-9, single file).
  • ON_ERROR_STOP=1 preserved — failed conditional or grant still aborts the Job.

BLOCKERS

None.

NITS

  • The PR body's Test Plan still has 3 unchecked boxes (post-merge tofu apply, pod log assertion, rotation re-apply). These are expected post-merge validation steps — fine to leave as the validation gate, but worth tracking on the board to ensure they get checked once Ava re-runs apply.
  • Manual SQL was run on the cluster pre-merge to unblock the deploy chain. Idempotency claim ("role exists, ALTER same password = no-op") is correct, but the post-merge Job run is the real proof. Validation must confirm pod Succeeded + logs show admin_app role provisioned successfully.

SOP COMPLIANCE

  • Branch named 319-fix-psql-var-substitution (issue-prefixed, kebab-case)
  • PR body has Summary, Why, Changes, Test Plan, Review Checklist, Related
  • Closes #319 present
  • Story + arch trace present (story:admin-row-crud, arch:postgres)
  • No secrets, no .env files, no scope creep
  • Single-file surgical change (terraform/modules/database/main.tf only)
  • Conventional commit prefix (fix(database):)

PROCESS OBSERVATIONS

This is the 4th sequential runtime fix in one bootstrap apply chain (salt master crash → label / → DO $$ HCL escape → psql var substitution). Each one is a real bug, but the pattern reinforces feedback_tofu_validate_not_k8s_api: tofu validate + tofu plan cannot catch runtime SQL/psql/k8s-API behavior. A pre-merge tofu apply against a real cluster (or ephemeral test cluster) would have caught all four. Worth surfacing as a discovered scope item: pre-merge apply gate for database-module changes, since the failure mode is "merge → 5min discovery → revert/forward-fix loop." Change-failure-rate signal for DORA.

Cycle-time impact is small (single-line conceptual fix, +9/-9), but the cumulative tax across 4 PRs in one chain is non-trivial. Not blocking this PR — flagging for the Epilogue/discovered-scope ledger.

VERDICT: APPROVED

## PR #320 Review ### DOMAIN REVIEW **Stack:** Terraform + kubernetes_job_v1 + psql heredoc (Bash + SQL). The fix correctly identifies the root cause: psql's `:'var'` substitution is suppressed inside dollar-quoted bodies because psql treats `$body$...$body$` as a single literal token. Switching to client-side `\gset` + `\if`/`\else`/`\endif` moves the conditional out of the server-side procedural block and back into the psql client where variable substitution is performed pre-send. Same idempotent CREATE-or-ALTER semantics, no behavior change to GRANTs. Specifics verified: - DO block fully removed (lines 192-202); no `$body$`, `BEGIN`, `END IF`, or `EXECUTE format(...)` remnants. - `\gset` correctly captures `role_exists` from the SELECT EXISTS expression. - `\if :role_exists` / `\else` / `\endif` matches psql conditional syntax. Leading whitespace before backslash meta-commands is tolerated in modern psql. - `:'admin_pw'` now appears only in ordinary SQL (ALTER/CREATE ROLE) where psql substitutes pre-flight — the original bug path. - Heredoc remains `<<'SQL'` (single-quoted) — shell variables are still passed via `-v admin_pw="$${ADMIN_APP_PASSWORD}"` rather than expanded inline. Correct. - GRANT USAGE / GRANT SELECT,INSERT,UPDATE,DELETE / ALTER DEFAULT PRIVILEGES untouched. - No new env vars, pillar entries, or variable additions (+9/-9, single file). - `ON_ERROR_STOP=1` preserved — failed conditional or grant still aborts the Job. ### BLOCKERS None. ### NITS - The PR body's Test Plan still has 3 unchecked boxes (post-merge tofu apply, pod log assertion, rotation re-apply). These are expected post-merge validation steps — fine to leave as the validation gate, but worth tracking on the board to ensure they get checked once Ava re-runs apply. - Manual SQL was run on the cluster pre-merge to unblock the deploy chain. Idempotency claim ("role exists, ALTER same password = no-op") is correct, but the post-merge Job run is the real proof. Validation must confirm pod Succeeded + logs show `admin_app role provisioned successfully`. ### SOP COMPLIANCE - [x] Branch named `319-fix-psql-var-substitution` (issue-prefixed, kebab-case) - [x] PR body has Summary, Why, Changes, Test Plan, Review Checklist, Related - [x] `Closes #319` present - [x] Story + arch trace present (`story:admin-row-crud`, `arch:postgres`) - [x] No secrets, no .env files, no scope creep - [x] Single-file surgical change (terraform/modules/database/main.tf only) - [x] Conventional commit prefix (`fix(database):`) ### PROCESS OBSERVATIONS This is the 4th sequential runtime fix in one bootstrap apply chain (salt master crash → label `/` → DO `$$` HCL escape → psql var substitution). Each one is a real bug, but the pattern reinforces `feedback_tofu_validate_not_k8s_api`: `tofu validate` + `tofu plan` cannot catch runtime SQL/psql/k8s-API behavior. A pre-merge `tofu apply` against a real cluster (or ephemeral test cluster) would have caught all four. Worth surfacing as a discovered scope item: pre-merge apply gate for database-module changes, since the failure mode is "merge → 5min discovery → revert/forward-fix loop." Change-failure-rate signal for DORA. Cycle-time impact is small (single-line conceptual fix, +9/-9), but the cumulative tax across 4 PRs in one chain is non-trivial. Not blocking this PR — flagging for the Epilogue/discovered-scope ledger. ### VERDICT: APPROVED
forgejo_admin deleted branch 319-fix-psql-var-substitution 2026-04-30 11:55:07 +00:00
Sign in to join this conversation.
No description provided.