chore: improve Caddy Salt state with pillarization and graceful reload (#440) #441

Merged
ldraney merged 1 commit from 440-caddy-salt-improvements into main 2026-06-16 00:45:24 +00:00
Owner

Summary

Three quality improvements to the Caddy Salt state introduced in PR #439, preparing the template for multi-domain support (#434). Hardcoded Tailscale hostnames are replaced with pillar-driven variables, service restart is replaced with graceful reload, and the Caddyfile template now loops over pillar data.

Stacked PR: depends on PR #439 being merged first.

Changes

  • salt/states/caddy/Caddyfile.j2 -- replaced hardcoded palinks.app site blocks with Jinja loop over pillar['caddy']['sites']; each site defines domain, proxy_target, and optional www_redirect
  • salt/states/caddy/init.sls -- replaced service.running with watch with a caddy-reload state using cmd.run / onchanges for zero-downtime config reloads; added expected pillar structure documentation in header comments; kept service.running for boot-time enablement only

Expected Pillar Structure

caddy:
  sites:
    palinks:
      domain: palinks.app
      proxy_target: palinks.tail5b443a.ts.net
      www_redirect: true
    landscaping:
      domain: landscaping-assistant.app
      proxy_target: landscaping.tail5b443a.ts.net
      www_redirect: true

Test Plan

  • Review rendered Caddyfile output against expected format for palinks.app
  • Verify Salt state syntax is valid SLS
  • After PR #439 merges and pillar data is added, run salt '*' state.apply caddy test=True to validate
  • No regressions in Caddy service lifecycle

Review Checklist

  • Passed automated review-fix loop
  • No secrets committed
  • No unnecessary file changes
  • Commit messages are descriptive
  • Feature flag needed? No -- infrastructure refactor, no user-visible change
  • ldraney/pal-e-platform #440 -- the Forgejo issue this PR implements
  • pal-e-platform -- infrastructure project
  • Depends on: PR #439 (#425)
  • Enables: #434 (landscaping-assistant.app domain)

Closes #440

## Summary Three quality improvements to the Caddy Salt state introduced in PR #439, preparing the template for multi-domain support (#434). Hardcoded Tailscale hostnames are replaced with pillar-driven variables, service restart is replaced with graceful reload, and the Caddyfile template now loops over pillar data. **Stacked PR: depends on PR #439 being merged first.** ## Changes - `salt/states/caddy/Caddyfile.j2` -- replaced hardcoded palinks.app site blocks with Jinja loop over `pillar['caddy']['sites']`; each site defines `domain`, `proxy_target`, and optional `www_redirect` - `salt/states/caddy/init.sls` -- replaced `service.running` with `watch` with a `caddy-reload` state using `cmd.run` / `onchanges` for zero-downtime config reloads; added expected pillar structure documentation in header comments; kept `service.running` for boot-time enablement only ### Expected Pillar Structure ```yaml caddy: sites: palinks: domain: palinks.app proxy_target: palinks.tail5b443a.ts.net www_redirect: true landscaping: domain: landscaping-assistant.app proxy_target: landscaping.tail5b443a.ts.net www_redirect: true ``` ## Test Plan - [ ] Review rendered Caddyfile output against expected format for palinks.app - [ ] Verify Salt state syntax is valid SLS - [ ] After PR #439 merges and pillar data is added, run `salt '*' state.apply caddy test=True` to validate - [ ] No regressions in Caddy service lifecycle ## Review Checklist - [ ] Passed automated review-fix loop - [ ] No secrets committed - [ ] No unnecessary file changes - [ ] Commit messages are descriptive - [ ] Feature flag needed? No -- infrastructure refactor, no user-visible change ## Related Notes - `ldraney/pal-e-platform #440` -- the Forgejo issue this PR implements - `pal-e-platform` -- infrastructure project - Depends on: PR #439 (#425) - Enables: #434 (landscaping-assistant.app domain) Closes #440
feat: add Caddy reverse proxy Salt state for palinks.app
All checks were successful
ci/woodpecker/push/terraform Pipeline was successful
ci/woodpecker/pr/terraform Pipeline was successful
ci/woodpecker/pull_request_closed/terraform Pipeline was successful
6c83061ed6
Establishes salt/states/caddy/ with init.sls and Caddyfile.j2 to manage
the Caddy reverse proxy on the Hetzner edge-proxy node. The Caddyfile
routes palinks.app to its Tailscale backend with TLS SNI and redirects
www.palinks.app to the apex domain.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
chore: pillarize Caddy template, use graceful reload, add Jinja loops (#440)
All checks were successful
ci/woodpecker/push/terraform Pipeline was successful
ci/woodpecker/pr/terraform Pipeline was successful
58338d6d77
Three improvements to the Caddy Salt state from PR #439:
- Extract hardcoded Tailscale hostnames into pillar-driven variables
- Replace service restart (watch) with caddy reload (onchanges) for zero-downtime
- Refactor Caddyfile.j2 to loop over pillar['caddy']['sites'] so adding
  domains requires only pillar data, not template edits

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author
Owner

PR #441 Review

DOMAIN REVIEW

Tech stack identified: Salt (SLS states) + Jinja2 templates + Caddy web server configuration.

Salt compliance (against repo conventions):

  • State ID naming uses kebab-case-descriptor pattern (caddy-caddyfile, caddy-service, caddy-reload) -- consistent with existing states like tailscaled-service, sshd-config, k3s-unit-file.
  • file.managed block includes all required fields (name, source, template, user, group, mode) -- matches sshd-config and k3s-unit-file patterns.
  • Header comments with purpose and template reference -- matches existing .sls documentation conventions.
  • Expected pillar structure documented in header -- good practice for data-driven states.

Caddy configuration quality:

  • reverse_proxy with tls_server_name for upstream TLS validation -- correct pattern for Tailscale backends.
  • www_redirect uses redir ... permanent (301) -- appropriate for SEO and cache behavior.
  • ACME/Let's Encrypt handled implicitly by Caddy's automatic HTTPS -- no manual cert config needed, which is correct.

Salt state architecture (graceful reload):

  • PR #439 used service.running with watch: on the Caddyfile, which triggers a full service restart on config change. PR #441 replaces this with cmd.run / onchanges using caddy reload, which is a zero-downtime reload. This is a meaningful improvement -- Caddy reload validates the new config before swapping, so a bad config does not take down the service.
  • service.running is retained for boot-time enablement only (enable: True) -- correct separation of concerns.

Jinja template quality:

  • salt['pillar.get']('caddy:sites', {}) with empty dict default -- correct; prevents template render failure if pillar is missing (renders an empty Caddyfile instead of crashing).
  • site.get('www_redirect', false) -- uses .get() with default, correct for optional fields.
  • Loop over sites.items() with site_id variable -- clean iteration pattern.

BLOCKERS

1. caddy-reload runs as root with no user constraint (medium risk)

The cmd.run state for caddy reload --config /etc/caddy/Caddyfile will execute as root by default. Caddy itself runs as the caddy user on most installations. While this works (root can reload Caddy), best practice is to be explicit about the execution context. This is not a hard blocker since the edge-proxy is a single-purpose node and Salt runs as root, but worth noting.

Downgraded to NIT -- Salt states inherently run as root via the minion. This is consistent with the existing k3s and ssh states in the repo.

No blockers found. All BLOCKER criteria pass:

  • No new functionality requiring test coverage (infrastructure config, not application code)
  • No user input to validate (pillar data is operator-controlled)
  • No secrets or credentials in code
  • No DRY violations in auth/security paths

NITS

1. caddy-service has no dependency on caddy-caddyfile (minor)

In init.sls, the caddy-service state (service.running with enable: True) has no require: or watch: relationship to caddy-caddyfile. Salt evaluates states in declaration order by default, but an explicit require would make the dependency chain clearer and protect against reordering:

caddy-service:
  service.running:
    - name: caddy
    - enable: True
    - require:
      - file: caddy-caddyfile

This ensures the Caddyfile is deployed before the service is enabled, which matters on first-run (fresh node).

2. caddy-reload does not validate config before reload (minor)

caddy reload itself validates the config before applying, so this is inherently safe. However, if validation fails, the cmd.run will return a non-zero exit code and Salt will report it as a failed state. Consider adding caddy validate --config /etc/caddy/Caddyfile as a separate state with prereq: to make validation failures more visible in Salt output. This is a nice-to-have, not a requirement.

3. Jinja false vs False (cosmetic)

Line {%- if site.get('www_redirect', false) %} -- Jinja2 accepts both false and False, but Python convention (and what Salt pillar YAML parses booleans as) is False. Using False would be more consistent with the Python/Salt ecosystem. Functionally identical.

4. Missing trailing newline consideration (cosmetic)

The Caddyfile.j2 template ends with {%- endfor %} using the whitespace-stripping tag (-%). This means the rendered file may not end with a trailing newline. Most tools expect a trailing newline. Consider using {% endfor %} (without -) for the final tag, or adding an explicit newline after the block.

SOP COMPLIANCE

  • PR body has: Summary, Changes, Test Plan, Related -- all present and well-structured
  • PR body includes Review Checklist
  • Expected pillar structure documented in PR body
  • No secrets, .env files, or credentials committed
  • No unnecessary file changes (2 files, both directly related to #440)
  • Commit messages: single commit with descriptive title matching PR title
  • Stacked PR dependency on #439 clearly documented
  • Closes #440 present in PR body

PROCESS OBSERVATIONS

  • Stacked PR pattern: PR #441 stacks on #439 cleanly. Both modify the same two files (Caddyfile.j2 and init.sls), so #439 must merge first or this PR will conflict. The dependency is correctly documented.
  • Deployment frequency: This is wave 2 of the DNS custom domain chain (terraform -> caddy -> app config). Pillarization in this PR directly enables #434 (landscaping-assistant.app) without further template changes -- good forward planning.
  • Change failure risk: Low. The caddy reload pattern validates config before applying, and the pillar-driven approach means future domain additions are data-only changes (no template edits). The onchanges guard ensures reload only fires when the Caddyfile actually changes.
  • Test plan gap: Test plan item 3 ("run salt '*' state.apply caddy test=True") depends on pillar data existing. Since this PR does not add pillar data (that comes separately), the test plan should note that dry-run validation requires pillar data to be added first. This is documented implicitly ("After PR #439 merges and pillar data is added") but could be more explicit.

VERDICT: APPROVED

## PR #441 Review ### DOMAIN REVIEW **Tech stack identified:** Salt (SLS states) + Jinja2 templates + Caddy web server configuration. **Salt compliance (against repo conventions):** - State ID naming uses `kebab-case-descriptor` pattern (`caddy-caddyfile`, `caddy-service`, `caddy-reload`) -- consistent with existing states like `tailscaled-service`, `sshd-config`, `k3s-unit-file`. - `file.managed` block includes all required fields (`name`, `source`, `template`, `user`, `group`, `mode`) -- matches `sshd-config` and `k3s-unit-file` patterns. - Header comments with purpose and template reference -- matches existing `.sls` documentation conventions. - Expected pillar structure documented in header -- good practice for data-driven states. **Caddy configuration quality:** - `reverse_proxy` with `tls_server_name` for upstream TLS validation -- correct pattern for Tailscale backends. - `www_redirect` uses `redir ... permanent` (301) -- appropriate for SEO and cache behavior. - ACME/Let's Encrypt handled implicitly by Caddy's automatic HTTPS -- no manual cert config needed, which is correct. **Salt state architecture (graceful reload):** - PR #439 used `service.running` with `watch:` on the Caddyfile, which triggers a full service restart on config change. PR #441 replaces this with `cmd.run` / `onchanges` using `caddy reload`, which is a zero-downtime reload. This is a meaningful improvement -- Caddy reload validates the new config before swapping, so a bad config does not take down the service. - `service.running` is retained for boot-time enablement only (`enable: True`) -- correct separation of concerns. **Jinja template quality:** - `salt['pillar.get']('caddy:sites', {})` with empty dict default -- correct; prevents template render failure if pillar is missing (renders an empty Caddyfile instead of crashing). - `site.get('www_redirect', false)` -- uses `.get()` with default, correct for optional fields. - Loop over `sites.items()` with `site_id` variable -- clean iteration pattern. ### BLOCKERS **1. `caddy-reload` runs as root with no user constraint (medium risk)** The `cmd.run` state for `caddy reload --config /etc/caddy/Caddyfile` will execute as root by default. Caddy itself runs as the `caddy` user on most installations. While this works (root can reload Caddy), best practice is to be explicit about the execution context. This is not a hard blocker since the edge-proxy is a single-purpose node and Salt runs as root, but worth noting. **Downgraded to NIT** -- Salt states inherently run as root via the minion. This is consistent with the existing `k3s` and `ssh` states in the repo. **No blockers found.** All BLOCKER criteria pass: - No new functionality requiring test coverage (infrastructure config, not application code) - No user input to validate (pillar data is operator-controlled) - No secrets or credentials in code - No DRY violations in auth/security paths ### NITS **1. `caddy-service` has no dependency on `caddy-caddyfile` (minor)** In `init.sls`, the `caddy-service` state (`service.running` with `enable: True`) has no `require:` or `watch:` relationship to `caddy-caddyfile`. Salt evaluates states in declaration order by default, but an explicit `require` would make the dependency chain clearer and protect against reordering: ```yaml caddy-service: service.running: - name: caddy - enable: True - require: - file: caddy-caddyfile ``` This ensures the Caddyfile is deployed before the service is enabled, which matters on first-run (fresh node). **2. `caddy-reload` does not validate config before reload (minor)** `caddy reload` itself validates the config before applying, so this is inherently safe. However, if validation fails, the `cmd.run` will return a non-zero exit code and Salt will report it as a failed state. Consider adding `caddy validate --config /etc/caddy/Caddyfile` as a separate state with `prereq:` to make validation failures more visible in Salt output. This is a nice-to-have, not a requirement. **3. Jinja `false` vs `False` (cosmetic)** Line `{%- if site.get('www_redirect', false) %}` -- Jinja2 accepts both `false` and `False`, but Python convention (and what Salt pillar YAML parses booleans as) is `False`. Using `False` would be more consistent with the Python/Salt ecosystem. Functionally identical. **4. Missing trailing newline consideration (cosmetic)** The Caddyfile.j2 template ends with `{%- endfor %}` using the whitespace-stripping tag (`-%`). This means the rendered file may not end with a trailing newline. Most tools expect a trailing newline. Consider using `{% endfor %}` (without `-`) for the final tag, or adding an explicit newline after the block. ### SOP COMPLIANCE - [x] PR body has: Summary, Changes, Test Plan, Related -- all present and well-structured - [x] PR body includes Review Checklist - [x] Expected pillar structure documented in PR body - [x] No secrets, .env files, or credentials committed - [x] No unnecessary file changes (2 files, both directly related to #440) - [x] Commit messages: single commit with descriptive title matching PR title - [x] Stacked PR dependency on #439 clearly documented - [x] `Closes #440` present in PR body ### PROCESS OBSERVATIONS - **Stacked PR pattern**: PR #441 stacks on #439 cleanly. Both modify the same two files (`Caddyfile.j2` and `init.sls`), so #439 must merge first or this PR will conflict. The dependency is correctly documented. - **Deployment frequency**: This is wave 2 of the DNS custom domain chain (terraform -> caddy -> app config). Pillarization in this PR directly enables #434 (landscaping-assistant.app) without further template changes -- good forward planning. - **Change failure risk**: Low. The `caddy reload` pattern validates config before applying, and the pillar-driven approach means future domain additions are data-only changes (no template edits). The `onchanges` guard ensures reload only fires when the Caddyfile actually changes. - **Test plan gap**: Test plan item 3 ("run `salt '*' state.apply caddy test=True`") depends on pillar data existing. Since this PR does not add pillar data (that comes separately), the test plan should note that dry-run validation requires pillar data to be added first. This is documented implicitly ("After PR #439 merges and pillar data is added") but could be more explicit. ### VERDICT: APPROVED
ldraney force-pushed 440-caddy-salt-improvements from 58338d6d77
All checks were successful
ci/woodpecker/push/terraform Pipeline was successful
ci/woodpecker/pr/terraform Pipeline was successful
to 3615ffe682
All checks were successful
ci/woodpecker/push/terraform Pipeline was successful
ci/woodpecker/pr/terraform Pipeline was successful
ci/woodpecker/pull_request_closed/terraform Pipeline was successful
2026-06-16 00:41:37 +00:00
Compare
ldraney deleted branch 440-caddy-salt-improvements 2026-06-16 00:45:24 +00:00
Sign in to join this conversation.
No description provided.