Spike: Auth smoke test, /healthz endpoint, post-deploy verification #149

Open · opened 2026-06-07 03:54:02 +00:00 by ldraney (Owner) · 0 comments

Type

Spike

Lineage

Standalone: emerged from the root cause analysis of a production outage. Since PR #134, authenticate_user! has redirected unauthenticated users to GET /auth/keycloak, but OmniAuth 2.x only accepts POST on that path by default, so the redirect ended in a 404. CI tests passed because OmniAuth test mode accepts both methods, and no alerting detected the 404.
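A minimal sketch of how this failure mode could be caught outside the test suite, assuming a plain-Ruby check (stdlib only) rather than a Rails spec. `RedirectSmokeCheck`, `run`, and the injectable `fetch` lambda are hypothetical names. The check starts at a page that requires login and follows every redirect with a GET, which is what a browser does after a 302, so a redirect into a POST-only OmniAuth request path ends in the same 404. It only exercises the real restriction if the target server is not running with OmniAuth test mode enabled.

```ruby
require "uri"
require "net/http"

# Browser-like redirect check: every hop is a plain GET, as a browser issues
# after a 302, whatever method the target route actually expects.
module RedirectSmokeCheck
  Result = Struct.new(:ok, :chain, :final_status, :error, keyword_init: true)

  # Default fetcher: one real GET, returning [status, Location header or nil].
  # Injectable so the redirect logic can be exercised against a fake.
  DEFAULT_FETCH = lambda do |uri|
    response = Net::HTTP.get_response(uri)
    [response.code.to_i, response["location"]]
  end

  # Starts at start_url (e.g. a page behind authenticate_user!) and follows
  # up to max_hops redirects. Succeeds only if the chain ends in a 2xx.
  def self.run(start_url, fetch: DEFAULT_FETCH, max_hops: 5)
    uri = URI(start_url)
    chain = []

    (max_hops + 1).times do
      status, location = fetch.call(uri)
      chain << [uri.to_s, status]

      if (300..399).cover?(status) && location
        uri += location # URI#+ also resolves relative Location headers
        next
      end

      ok = (200..299).cover?(status)
      return Result.new(ok: ok, chain: chain, final_status: status,
                        error: ok ? nil : "#{uri} returned #{status}")
    end

    Result.new(ok: false, chain: chain, final_status: nil,
               error: "more than #{max_hops} redirects")
  end
end
```

Pointed at a protected page on the deployed host, the same call could double as a post-deploy synthetic check: a pipeline step that aborts when `result.ok` is false. Whether that runs in Woodpecker or in an external monitor is the open question below.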

Repo

ldraney/landscaping-assistant

Question

What testing, health checks, and post-deploy verification do we need so a broken auth redirect (or similar routing failure) never reaches production undetected again?

  • Auth redirect smoke test: can we write a spec that follows the unauthenticated redirect as a real browser would and catches GET-vs-POST mismatches?
  • Health endpoint: what should /healthz check (DB, cache, external deps)? How should it map onto k8s liveness and readiness probes?
  • Post-deploy verification: should Woodpecker run a synthetic check after the image push, or is this a separate monitoring concern (UptimeRobot, Prometheus Blackbox Exporter)?
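A minimal sketch for the health endpoint question, assuming the common Kubernetes split between a cheap liveness check and a dependency-aware readiness check. `HealthEndpoint` and the check names are hypothetical. The class is a bare Rack app (any object whose `call(env)` returns `[status, headers, body]`), so it needs no gems.

```ruby
require "json"

# Minimal Rack-compatible health endpoint. `checks` maps a name to a callable
# that returns truthy when the dependency is healthy; a raised StandardError
# also counts as unhealthy. With no checks it always answers 200 (liveness).
class HealthEndpoint
  def initialize(checks = {})
    @checks = checks
  end

  def call(_env)
    results = @checks.to_h { |name, check| [name.to_s, healthy?(check)] }

    status = results.values.all? ? 200 : 503
    body = JSON.generate(status: status == 200 ? "ok" : "fail", checks: results)
    headers = { "content-type" => "application/json", "cache-control" => "no-store" }
    [status, headers, [body]]
  end

  private

  def healthy?(check)
    check.call ? true : false
  rescue StandardError
    false
  end
end
```

In the Rails app, two instances could be routed from `config/routes.rb` with `to:`: one with no checks at /healthz for the livenessProbe, and one with a database check (for example `-> { ActiveRecord::Base.connection.execute("SELECT 1") }`) at a hypothetical /readyz for the readinessProbe. Keeping dependencies out of liveness avoids Kubernetes restarting every pod during a database outage, and each check should stay well under the probe's timeoutSeconds. Neither endpoint would have caught the #134 regression, since neither touches an auth route; that gap belongs to the auth redirect smoke test.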

Deliverables

  • docs/observability-and-smoke-tests.md — documents the three gaps, chosen approach for each, and rationale
  • Follow-up tickets created for implementation (likely 1-2 feature tickets covering the smoke test, healthz endpoint, and probe wiring)

Time-box

1 session

Related

  • project-landscaping-assistant — project this affects
  • ldraney/landscaping-assistant #134 — PR that introduced the broken redirect
  • ldraney/landscaping-assistant #135 — PR that fixed tests but not the controller
  • Hotfix branch: fix/login-page-redirect
Reference
ldraney/landscaping-assistant#149