Add CORS preflight synthetic monitor for cross-origin frontend→API health #293

Open
opened 2026-04-17 05:46:51 +00:00 by forgejo_admin · 0 comments
Contributor

Type

Feature

Lineage

Discovered during QA review of pal-e-deployments#125. The CORS misconfig surfaced as user-visible "Failed to fetch" — observability didn't catch it because we have no synthetic that exercises a cross-origin preflight.

Repo

forgejo_admin/pal-e-platform

User Story

As an on-call engineer, I want an alert to fire within 5 minutes when any frontend↔API CORS preflight starts failing in production, so that user-visible "Failed to fetch" outages are surfaced by observability instead of by users.

Context

Existing blackbox-exporter probes hit endpoints directly with GET — they never exercise the browser's actual CORS preflight pattern (OPTIONS with Origin header). When the pal-e-app → pal-e-docs preflight broke (see pal-e-deployments#124), pod logs showed OPTIONS /boards 400 for hours but no alert fired. We need a synthetic that mimics the real frontend call pattern, per-pair, so drift between an app's allowed-origin config and its actual hostname is caught before it reaches users.

Per feedback_dora_thesis.md, observability + kanban are the two DORA pillars. This is observability hardening.

File Targets

  • ~/pal-e-platform/modules/observability/blackbox.tf (or equivalent) — add CORS-preflight probe targets
  • ~/pal-e-platform/modules/observability/prometheusrule.tf (or k8s YAML) — add alert rule for probe_success{job="cors-preflight"} == 0
  • ~/pal-e-platform/modules/observability/alertmanager.tf (or values file) — confirm alert routes to existing channel
  • New convention note (slug: convention-cors-synthetic) — document the pattern so future frontend↔api pairs add it by default

Test Expectations

  • Probe target list explicitly enumerates each {frontend_origin → api_host} pair
  • Manual test: temporarily set a wrong allowed-origin on a non-prod overlay, confirm probe goes red within one scrape interval
  • Alert fires after >2 consecutive failures over 5 min (avoid flap)
  • Alert resolves automatically when probe goes green
  • Recording rule (optional) for dashboard panel

Constraints

  • Use existing kube-prometheus-stack + blackbox-exporter (already deployed in monitoring namespace) — do NOT introduce a new probe sidecar
  • Alert routes must use existing Alertmanager receivers — do NOT add new notification integrations in this ticket
  • Probe interval ≤60s; preflight is cheap
  • Initial pair list: pal-e-app↔pal-e-docs, westside-app↔basketball-api, mcd-tracker-app↔mcd-tracker-api. Other pairs as discovered.

Acceptance Criteria

  • blackbox-exporter module config supports OPTIONS with custom Origin and Access-Control-Request-* headers
  • Probe targets defined for the 3 initial pairs
  • PrometheusRule fires on probe_success == 0 with for: 5m
  • Alertmanager routing verified (firing → recipient)
  • Resolved transition verified (recovery → recipient)
  • Convention note convention-cors-synthetic written + linked from definition-app
  • Existing pairs confirmed green before merge

Checklist

Same as Acceptance Criteria above; tracked there.

Out of Scope

  • Real user monitoring (RUM)
  • Auto-discovery of new pairs (manual list for now)
  • Dashboard panels (separate ticket if pursued)

Environment

  • Cluster: prod, namespace monitoring
  • Existing components: kube-prometheus-stack, blackbox-exporter, Alertmanager, Loki
  • Tailscale-funnel hostnames involved (initial scope): pal-e-app.tail5b443a.ts.net, pal-e-docs.tail5b443a.ts.net, westsidekingsandqueens.tail5b443a.ts.net, basketball-api.tail5b443a.ts.net, mcd-tracker-app.tail5b443a.ts.net, mcd-tracker-api.tail5b443a.ts.net
  • forgejo_admin/pal-e-deployments#124 — concrete bug this would have caught
  • feedback_dora_thesis.md — observability is half of DORA
  • definition-app — naming convention for app/api pairs
### Type Feature ### Lineage Discovered during QA review of `pal-e-deployments#125`. The CORS misconfig surfaced as user-visible "Failed to fetch" — observability didn't catch it because we have no synthetic that exercises a cross-origin preflight. ### Repo `forgejo_admin/pal-e-platform` ### User Story As an on-call engineer, I want an alert to fire within 5 minutes when any frontend↔API CORS preflight starts failing in production, so that user-visible "Failed to fetch" outages are surfaced by observability instead of by users. ### Context Existing `blackbox-exporter` probes hit endpoints directly with GET — they never exercise the browser's actual CORS preflight pattern (`OPTIONS` with `Origin` header). When the `pal-e-app → pal-e-docs` preflight broke (see `pal-e-deployments#124`), pod logs showed `OPTIONS /boards 400` for hours but no alert fired. We need a synthetic that mimics the real frontend call pattern, per-pair, so drift between an app's allowed-origin config and its actual hostname is caught before it reaches users. Per `feedback_dora_thesis.md`, observability + kanban are the two DORA pillars. This is observability hardening. ### File Targets - `~/pal-e-platform/modules/observability/blackbox.tf` (or equivalent) — add CORS-preflight probe targets - `~/pal-e-platform/modules/observability/prometheusrule.tf` (or k8s YAML) — add alert rule for `probe_success{job="cors-preflight"} == 0` - `~/pal-e-platform/modules/observability/alertmanager.tf` (or values file) — confirm alert routes to existing channel - New convention note (slug: `convention-cors-synthetic`) — document the pattern so future frontend↔api pairs add it by default ### Test Expectations - Probe target list explicitly enumerates each `{frontend_origin → api_host}` pair - Manual test: temporarily set a wrong allowed-origin on a non-prod overlay, confirm probe goes red within one scrape interval - Alert fires after >2 consecutive failures over 5 min (avoid flap) - Alert resolves automatically when probe goes green - Recording rule (optional) for dashboard panel ### Constraints - Use existing `kube-prometheus-stack` + `blackbox-exporter` (already deployed in `monitoring` namespace) — do NOT introduce a new probe sidecar - Alert routes must use existing Alertmanager receivers — do NOT add new notification integrations in this ticket - Probe interval ≤60s; preflight is cheap - Initial pair list: `pal-e-app↔pal-e-docs`, `westside-app↔basketball-api`, `mcd-tracker-app↔mcd-tracker-api`. Other pairs as discovered. ### Acceptance Criteria - [ ] blackbox-exporter `module` config supports `OPTIONS` with custom `Origin` and `Access-Control-Request-*` headers - [ ] Probe targets defined for the 3 initial pairs - [ ] PrometheusRule fires on `probe_success == 0` with `for: 5m` - [ ] Alertmanager routing verified (firing → recipient) - [ ] Resolved transition verified (recovery → recipient) - [ ] Convention note `convention-cors-synthetic` written + linked from `definition-app` - [ ] Existing pairs confirmed green before merge ### Checklist Same as Acceptance Criteria above; tracked there. ### Out of Scope - Real user monitoring (RUM) - Auto-discovery of new pairs (manual list for now) - Dashboard panels (separate ticket if pursued) ### Environment - Cluster: prod, namespace `monitoring` - Existing components: `kube-prometheus-stack`, `blackbox-exporter`, `Alertmanager`, `Loki` - Tailscale-funnel hostnames involved (initial scope): `pal-e-app.tail5b443a.ts.net`, `pal-e-docs.tail5b443a.ts.net`, `westsidekingsandqueens.tail5b443a.ts.net`, `basketball-api.tail5b443a.ts.net`, `mcd-tracker-app.tail5b443a.ts.net`, `mcd-tracker-api.tail5b443a.ts.net` ### Related - `forgejo_admin/pal-e-deployments#124` — concrete bug this would have caught - `feedback_dora_thesis.md` — observability is half of DORA - `definition-app` — naming convention for app/api pairs
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
ldraney/pal-e-platform#293
No description provided.