Hetzner edge infrastructure design for custom domains (#28) #39

Closed
ldraney wants to merge 2 commits from 28-custom-domain-reverse-proxy into main
Owner

Summary

  • Replace Fly.io + Caddy reverse proxy proposal with Hetzner edge node architecture that extends the k3s cluster to the public internet
  • Platform-level doc covers all three custom domains: palinks.app, landscaping-assistant.app, westsidekingsandqueens.com
  • Palinks-specific doc rewritten from Cloudflare Tunnel spike to cover DNS records, IngressRoute, Rails config, and Keycloak changes

Changes

  • docs/reverse-proxy.md: Deleted (Fly.io + Caddy approach superseded)
  • docs/edge-infrastructure.md: New platform-level design doc — Hetzner VPSes as k3s agent nodes, Traefik ingress, Let's Encrypt TLS, Salt-managed, Terraform-provisioned
  • docs/custom-domain.md: Rewritten — palinks.app-specific DNS, IngressRoute manifest, Rails config.hosts, Keycloak redirect URIs
  • README.md: Updated docs index with new entries

Test Plan

  • [ ] Review edge-infrastructure.md for architectural accuracy against pal-e-platform Salt/Terraform
  • [ ] Review custom-domain.md for completeness of palinks-specific changes
  • [ ] Verify README docs index matches docs/ directory contents
  • [ ] Confirm open questions are captured for follow-up planning

Review Checklist

  • [ ] Passed automated review-fix loop
  • [ ] No secrets committed
  • [ ] No unnecessary file changes
  • [ ] Commit messages are descriptive
  • [ ] Feature flag needed? No; documentation only, no code changes

Related Notes

  • Closes #28
  • palinks: the project this work belongs to
Documents the Fly.io + Caddy reverse proxy approach for serving
palinks.app as the canonical URL. Caddy holds the Let's Encrypt cert,
forwards to the cluster over Tailscale. Cluster stays outbound-only.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Fly.io + Caddy approach from the original doc is replaced by extending
the k3s cluster with Hetzner VPS edge nodes. This uses the platform's
existing Salt and Terraform tooling rather than introducing external
infrastructure outside the cluster.

- edge-infrastructure.md: platform-level doc covering all three custom
  domains (palinks.app, landscaping-assistant.app, westsidekingsandqueens.com)
- custom-domain.md: rewritten as palinks-specific DNS/TLS/routing details
- README: updated docs index with new entries

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author
Owner

PR #39 Review

DOMAIN REVIEW

Tech stack: Documentation only (Markdown). No application code changed. Domain expertise applied: Kubernetes/Traefik ingress architecture, Terraform/Salt IaC patterns, DNS configuration, TLS/ACME workflows.

edge-infrastructure.md -- Well-structured platform-level design doc. The architecture is sound: Hetzner VPSes as k3s agent nodes joining the homelab cluster over Tailscale, with Traefik handling public ingress and Let's Encrypt TLS. Key observations:

  1. The mermaid diagrams clearly show both the topology and the traffic flow sequence. Good use of two complementary diagram types (topology + sequence).
  2. The "What Changes Where" table correctly scopes changes across pal-e-platform, pal-e-deployments, and each app repo. The "No changes to" list is valuable for confirming blast radius.
  3. The "Why Not Other Approaches" table provides clear rationale for rejecting Cloudflare Tunnel, Fly.io + Caddy, GoDaddy redirect, and separate cluster alternatives.
  4. Open questions section captures genuine unknowns (node count, region, .ts.net redirect policy, monitoring strategy, archbox Traefik cleanup).

custom-domain.md -- Clean rewrite from spike format to actionable design doc. The current-state/target-state tables, DNS change matrix, IngressRoute manifest, Rails config snippet, and Keycloak changes are all concrete and implementable.

Technical accuracy notes:

  • The IngressRoute www-redirect middleware regex looks correct for stripping www. prefix.
  • ACME HTTP-01 challenge flow description is accurate -- works because A record points to Hetzner LB which routes to Traefik.
  • The claim that "agents don't run etcd" is correct -- only server nodes run etcd in k3s, so 2 agent nodes suffice for HA without etcd quorum concerns.
  • The Tailscale ACL mention (tag:edge grant for Hetzner nodes to reach tag:k8s) is a good callout for the implementation phase.
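The www-redirect check in the first note can be exercised offline. This is a minimal sketch with assumed values, because the manifest itself is not quoted in this thread:

- The middleware is assumed to match ^https?://www\.palinks\.app/(.*).
- It is assumed to redirect to https://palinks.app/ plus the captured path.

The sketch uses Python's re. Traefik evaluates redirectRegex with Go's regexp package, which treats these constructs (anchors, escaped dots, one capture group) the same way.

```python
import re
from typing import Optional

# Hypothetical values: the real pattern in custom-domain.md is not reproduced here.
WWW_PATTERN = re.compile(r"^https?://www\.palinks\.app/(.*)")
# Python spells the capture reference \g<1>; Traefik's replacement field uses ${1}.
APEX_REPLACEMENT = r"https://palinks.app/\g<1>"


def redirect_target(url: str) -> Optional[str]:
    """Return the apex URL for a www request, or None if no redirect applies."""
    if WWW_PATTERN.match(url) is None:
        return None
    return WWW_PATTERN.sub(APEX_REPLACEMENT, url, count=1)


if __name__ == "__main__":
    for url in (
        "https://www.palinks.app/links/42?ref=x",
        "http://www.palinks.app/",
        "https://palinks.app/links/42",  # already on the apex: no redirect
    ):
        print(url, "->", redirect_target(url))
```

The apex case matters as much as the www case: a pattern that also matched palinks.app would redirect to itself in a loop.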

BLOCKERS

None. This is a documentation-only PR with no application code, no secrets, no credentials, and no security-sensitive changes.

NITS

  1. PR body claims deletion of reverse-proxy.md but it does not exist. The PR body states "docs/reverse-proxy.md: Deleted (Fly.io + Caddy approach superseded)" but the diff contains no such file deletion, and the file does not exist on main. Either it was never committed, or it was removed in a prior PR. The PR body should be corrected to avoid confusion.

  2. Spike History references. custom-domain.md states the Cloudflare Tunnel approach was "original spike, issue #25" -- but #25 is a closed PR ("Spike: Custom domain routing for palinks.app"), not an issue. And the Fly.io + Caddy approach is attributed to "reverse-proxy.md, issue #28" -- but #28 is "Configure GoDaddy 301 redirect for palinks.app", not about Fly.io + Caddy. These cross-references are misleading and should be corrected.

  3. README link description updated. The Custom Domain entry changes from "routing palinks.app to production" to "palinks.app DNS, TLS, and routing" which accurately reflects the new content. However, the new Edge Infrastructure entry is placed above all other docs. Consider whether alphabetical or logical ordering is preferred -- currently the two new entries sit at the top followed by the rest in their original order.

  4. Inter-node traffic encryption. The sequence diagram note says "Inter-node traffic encrypted by Flannel/WireGuard CNI." k3s uses Flannel by default, and WireGuard encryption for Flannel is an opt-in feature (--flannel-backend=wireguard-native). If this is not currently enabled, the claim may be inaccurate. Worth verifying during implementation or noting as an assumption.

  5. Traefik cert storage. The TLS section does not mention where Traefik stores ACME certificates. With multiple edge nodes, each running Traefik independently, there is a risk of each node requesting its own cert (hitting Let's Encrypt rate limits) or certs not being shared. This is an implementation detail but worth noting in the Open Questions section.
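The HTTP-01 side of the cert-storage risk can be modeled in a few lines. This toy sketch makes three assumptions:

- Each edge node's Traefik keeps its own ACME state.
- The Hetzner LB spreads requests round-robin.
- Node names, tokens and the attempt count are made up.

Under these assumptions, validation succeeds only when the LB routes the CA's request to the node that started the order.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EdgeNode:
    """Hypothetical edge node running its own Traefik with node-local ACME state."""
    name: str
    challenge_tokens: Dict[str, str] = field(default_factory=dict)

    def serve_challenge(self, token: str) -> Optional[str]:
        # Only the node that started the ACME order knows the key authorization.
        return self.challenge_tokens.get(token)


def validation_results(
    nodes: List[EdgeNode],
    issuing_node: EdgeNode,
    token: str,
    key_authorization: str,
    attempts: int,
) -> List[bool]:
    """Send `attempts` HTTP-01 validation requests through a round-robin LB."""
    issuing_node.challenge_tokens[token] = key_authorization
    rotation = itertools.cycle(nodes)
    return [
        next(rotation).serve_challenge(token) == key_authorization
        for _ in range(attempts)
    ]


if __name__ == "__main__":
    edge_1, edge_2 = EdgeNode("edge-1"), EdgeNode("edge-2")
    print(validation_results([edge_1, edge_2], edge_1, "tok", "tok.thumbprint", 4))
```

With two nodes, half the validation requests land on the wrong node. Issuing certificates once and sharing them avoids both this and the duplicate-order rate-limit concern. One example is cert-manager storing the certificate in a Kubernetes Secret that every Traefik replica reads. Which option fits is the open question this nit asks to record.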

SOP COMPLIANCE

  • [x] Branch named after issue: 28-custom-domain-reverse-proxy references issue #28
  • [x] PR body follows template: Summary, Changes, Test Plan, Review Checklist, Related Notes all present
  • [x] Related section references parent issue: "Closes #28"
  • [ ] Related section references plan slug: No plan slug provided (caller confirmed "No plan slug")
  • [x] No secrets committed: Confirmed -- no credentials, tokens, or .env files in diff
  • [x] No unnecessary file changes: All 3 changed files are relevant to the design doc update
  • [x] Commit messages: Not visible in diff, cannot verify

Scope observation: Issue #28 is titled "Configure GoDaddy 301 redirect for palinks.app" but this PR replaces that entire approach with a Hetzner edge architecture. The PR is a legitimate strategic pivot, but the parent issue title no longer matches the work being done. Consider updating issue #28's title to reflect the new direction, or closing #28 as superseded and creating a new issue for the Hetzner edge work.

PROCESS OBSERVATIONS

  • Documentation maturity: This PR demonstrates good documentation-driven development. The design docs are written before implementation, which reduces change failure risk.
  • Architecture evolution: The progression from GoDaddy redirect -> Cloudflare Tunnel -> Fly.io + Caddy -> Hetzner edge nodes shows healthy iteration. The final approach (extending the existing cluster) is the most operationally aligned choice -- same Salt, same Terraform, same CI/CD.
  • Follow-up planning: The edge-infrastructure.md Open Questions section creates natural follow-up work items. The custom-domain.md captures concrete implementation steps (DNS changes, IngressRoute manifest, Rails config, Keycloak changes) that translate directly into tickets.

VERDICT: APPROVED

Documentation-only PR with no blockers. The architecture design is sound and well-documented. The nits above (PR body inaccuracy about reverse-proxy.md deletion, misleading issue cross-references in Spike History, Traefik cert storage gap) are worth addressing but are not blocking.

Author
Owner

PR #39 Review

DOMAIN REVIEW

Tech stack: Documentation-only PR (Markdown design docs). Domain expertise applied: Terraform/k8s/Salt/Helm infrastructure design review, Traefik ingress patterns, Hetzner cloud architecture, DNS/TLS design.

edge-infrastructure.md -- Technically sound design. The Hetzner VPS-as-k3s-agent-node pattern is well-established and the doc explains it clearly. The Mermaid diagrams are accurate. The traffic flow (Browser -> Hetzner LB -> Traefik on edge node -> k8s Service -> pod on archbox) is correct and leverages standard k8s networking since the edge nodes join the cluster. The "What Changes Where" table is specific and actionable. The "Why Not Other Approaches" section provides good rationale.

Observations on the design itself:

  • The doc correctly identifies that 2 agent nodes suffice for HA since agents don't run etcd.
  • ACME HTTP-01 challenge will work because the A record points to the LB, which routes to Traefik -- the validation path is sound.
  • Inter-node traffic encryption via Flannel/WireGuard CNI is mentioned but should be verified -- k3s uses Flannel by default, and WireGuard backend must be explicitly enabled. If the cluster currently uses the default VXLAN backend, inter-node pod traffic between archbox and Hetzner would traverse the Tailscale tunnel (encrypted) but not be encrypted at the CNI layer. This is worth clarifying in the open questions.
  • The doc says "Traefik resolves palinks.palinks.svc.cluster.local" -- the FQDN would be palinks.palinks.svc.cluster.local only if the Service is named palinks in namespace palinks. This is likely correct but should match the actual Service name in pal-e-deployments.
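The HA claim in the first bullet comes down to quorum arithmetic that counts only datastore members. This sketch rests on two assumptions:

- The server nodes run embedded etcd. On a single server, k3s defaults to SQLite, in which case there is no quorum at all.
- There is one Traefik per edge agent behind LB health checks. The ingress-tolerance figure depends on this.

```python
from typing import Dict


def etcd_quorum(members: int) -> int:
    """Votes needed by an etcd cluster with `members` voting members (strict majority)."""
    if members < 1:
        raise ValueError("etcd needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Members that can fail while the rest still form a quorum."""
    return members - etcd_quorum(members)


def edge_ha_summary(server_nodes: int, agent_nodes: int) -> Dict[str, int]:
    # Agents are not etcd members, so they never enter the quorum math;
    # they only add redundancy for workloads such as Traefik (assumed one per agent).
    return {
        "quorum": etcd_quorum(server_nodes),
        "server_failures_tolerated": tolerated_failures(server_nodes),
        "agent_failures_tolerated_for_ingress": agent_nodes - 1,
    }


if __name__ == "__main__":
    print(edge_ha_summary(server_nodes=1, agent_nodes=2))
```

Adding a second server instead would raise quorum to 2 and still tolerate zero server failures. Growing the edge with agents therefore leaves the control plane's failure budget unchanged.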

custom-domain.md -- Clean rewrite from the spike format to a concrete implementation plan. The DNS change table, IngressRoute manifest, Rails config, and Keycloak changes are all actionable. The "Spike History" section provides good lineage.

One concern with the IngressRoute: the www-redirect middleware uses redirectRegex. That is the right tool for a www-to-apex redirect, because the alternatives cannot drop the www. host prefix. redirectScheme only changes the scheme and port, and StripPrefix only rewrites the path. However, the regex escaping (\\\.) should be double-checked against how YAML and Traefik's CRD parsing handle backslashes.
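The escaping question can be checked without a cluster by comparing the two regex strings Traefik could end up with. The sketch uses Python's re; Go's regexp treats \\ and \. the same way. The manifest lines in the comments are hypothetical, since the actual custom-domain.md text is not quoted here.

```python
import re

URL = "https://www.palinks.app/links/42"

# Regex text Traefik receives if the manifest says (hypothetically):
#   regex: "^https?://www\\.palinks\\.app/(.*)"   <- double-quoted: YAML turns \\ into \
#   regex: ^https?://www\.palinks\.app/(.*)       <- plain scalar: kept verbatim
INTENDED = r"^https?://www\.palinks\.app/(.*)"

# Regex text Traefik receives if \\\. survives verbatim (plain or single-quoted YAML):
TRIPLE_BACKSLASH = r"^https?://www\\\.palinks\\\.app/(.*)"


def matches(pattern: str, url: str = URL) -> bool:
    """True if `pattern` matches at the start of `url`."""
    return re.match(pattern, url) is not None


if __name__ == "__main__":
    print("intended:", matches(INTENDED))
    print("triple backslash:", matches(TRIPLE_BACKSLASH))
```

The triple-backslash form only matches a URL containing a literal backslash before each dot, so the www host would never be redirected. Writing the pattern double-quoted with \\. (or unquoted with \.) avoids the ambiguity.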

README.md -- The new entries are added at the top of the docs list, which makes sense for visibility but breaks the implicit alphabetical ordering the list previously followed. The old Custom Domain entry description ("routing palinks.app to production") is replaced, and the new entry properly reflects the updated scope. The deletion of reverse-proxy.md is referenced in the PR body but NOT present in the diff -- see BLOCKERS.

BLOCKERS

  1. Issue #28 scope mismatch (process): Issue #28 is titled "Configure GoDaddy 301 redirect for palinks.app" -- that is a concrete Task created as the Phase 1 follow-up from the original spike (PR #25, issue #15). This PR delivers a new spike (Hetzner edge infrastructure design) and claims Closes #28. The deliverable (design documents) does not match the issue scope (configure a redirect). Either:

    • This PR should reference a different issue (a new spike issue for the Hetzner design), OR
    • Issue #28 should have been re-scoped/retitled before this PR was opened, OR
    • The "Closes #28" should be removed and replaced with a "Related to #28" since this design supersedes the Phase 1 approach but doesn't fulfill the Phase 1 task.
  2. PR body claims deletion of docs/reverse-proxy.md but the diff does not contain this deletion. The PR body says: "docs/reverse-proxy.md: Deleted (Fly.io + Caddy approach superseded)". However, the diff only shows 3 changed files (README.md, custom-domain.md, edge-infrastructure.md). Either reverse-proxy.md never existed on the base branch, or the deletion was missed. The current main branch has no reverse-proxy.md in docs/. If the file doesn't exist, the PR body is misleading. If it should have been deleted, the deletion is missing from the diff.

  3. No follow-up tickets listed: The spike template requires deliverables including "Follow-up tickets created or existing tickets updated with refined scope." The old custom-domain.md had a detailed "Follow-Up Tickets Needed" section with 4 concrete tickets. The new version removes this entirely. The edge-infrastructure.md has "Open Questions" but no follow-up tickets. A spike that produces only docs without scoped follow-up work is incomplete per template-issue-spike.

NITS

  1. edge-infrastructure.md line about Flannel/WireGuard: "Inter-node traffic encrypted by Flannel/WireGuard CNI" -- this should specify whether the cluster actually has WireGuard backend enabled for Flannel, or if encryption relies solely on the Tailscale tunnel. Misleading if WireGuard is not configured.

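  2. custom-domain.md IngressRoute regex: The redirectRegex escaping \\\. looks over-escaped, and both YAML quoting styles cause a problem:
     • Double-quoted string: \\ decodes to one backslash, but \. is not a valid YAML escape, so the manifest may be rejected.
     • Plain or single-quoted scalar: the text reaches Traefik verbatim. There \\\. matches a literal backslash followed by a dot, so the www host never matches.
     Test this specific regex before rollout. RedirectScheme is not an alternative here, since it changes only the scheme and port, not the host.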

  3. custom-domain.md: The "Current State" table says GoDaddy SSL is "Included, irrelevant (only works on their servers)" -- this parenthetical is helpful context. No issue, just noting the good callout.

  4. README.md ordering: The new entries break alphabetical ordering. Consider maintaining consistent ordering (alphabetical or by priority) across the docs list.

  5. Sources removed: The old custom-domain.md had a "Sources" section with 6 linked references (Tailscale docs, GitHub issues, Cloudflare guides). The new version removes all external references. The edge-infrastructure.md also has no sources. For a spike/design doc, external references add credibility and aid future readers. Consider preserving relevant sources.

SOP COMPLIANCE

  • [x] PR body has Summary, Changes, Test Plan, Related sections
  • [x] No secrets committed
  • [x] Commit messages -- cannot verify from diff alone
  • [ ] PR body accuracy -- PR body claims reverse-proxy.md deletion that is not in the diff (see BLOCKER #2)
  • [ ] Scope alignment -- PR closes #28 but deliverables don't match issue #28's scope (see BLOCKER #1)
  • [x] No unnecessary file changes beyond stated scope
  • [x] Documentation-only PR (appropriate for spike)

PROCESS OBSERVATIONS

  • This PR represents a significant strategic pivot: from Cloudflare Tunnel (recommended in the original spike PR #25) to Hetzner edge nodes. The rationale ("owning the edge infrastructure") is sound, but the pivot should be traceable through the issue system. Currently, the lineage is muddled because the PR claims to close a task ticket (#28) that was scoped for something entirely different (GoDaddy redirect configuration).
  • The original spike (PR #25) created a clean set of phased follow-up tickets. This PR silently supersedes all of Phase 2 without creating replacement tickets. The new approach needs its own set of implementation tickets (Terraform module, Salt states, LB provisioning, DNS cutover, etc.).
  • The design quality is high -- the docs are well-structured, technically accurate, and provide clear implementation guidance. The issue is process, not content.

VERDICT: NOT APPROVED

Reason: Three blockers prevent approval:

  1. Issue #28 scope mismatch -- the PR closes a task ticket with spike deliverables
  2. PR body inaccuracy -- claims a file deletion that isn't in the diff
  3. Missing follow-up tickets -- spike template requires scoped follow-up work

To resolve: (a) Create a proper spike issue for the Hetzner edge design, or re-scope #28; (b) fix the PR body to remove the phantom reverse-proxy.md deletion claim; (c) add a "Follow-Up Tickets" section to edge-infrastructure.md listing the concrete implementation work needed.

ldraney closed this pull request 2026-06-12 12:16:52 +00:00

Pull request closed

No reviewers
No labels
No milestone
No project
No assignees
1 participant

Reference
ldraney/palinks!39