Add CORS preflight synthetic monitor for cross-origin frontend→API health #293

New issue

Open

opened 2026-04-17 05:46:51 +00:00 by forgejo_admin · 0 comments

forgejo_admin commented

2026-04-17 05:46:51 +00:00

Contributor

Type

Feature

Lineage

Discovered during QA review of pal-e-deployments#125. The CORS misconfig surfaced as user-visible "Failed to fetch" — observability didn't catch it because we have no synthetic that exercises a cross-origin preflight.

Repo

forgejo_admin/pal-e-platform

User Story

As an on-call engineer, I want an alert to fire within 5 minutes when any frontend↔API CORS preflight starts failing in production, so that user-visible "Failed to fetch" outages are surfaced by observability instead of by users.

Context

Existing blackbox-exporter probes hit endpoints directly with GET — they never exercise the browser's actual CORS preflight pattern (OPTIONS with Origin header). When the pal-e-app → pal-e-docs preflight broke (see pal-e-deployments#124), pod logs showed OPTIONS /boards 400 for hours but no alert fired. We need a synthetic that mimics the real frontend call pattern, per-pair, so drift between an app's allowed-origin config and its actual hostname is caught before it reaches users.

Per feedback_dora_thesis.md, observability + kanban are the two DORA pillars. This is observability hardening.

File Targets

~/pal-e-platform/modules/observability/blackbox.tf (or equivalent) — add CORS-preflight probe targets
~/pal-e-platform/modules/observability/prometheusrule.tf (or k8s YAML) — add alert rule for probe_success{job="cors-preflight"} == 0
~/pal-e-platform/modules/observability/alertmanager.tf (or values file) — confirm alert routes to existing channel
New convention note (slug: convention-cors-synthetic) — document the pattern so future frontend↔api pairs add it by default

Test Expectations

Probe target list explicitly enumerates each {frontend_origin → api_host} pair
Manual test: temporarily set a wrong allowed-origin on a non-prod overlay, confirm probe goes red within one scrape interval
Alert fires after >2 consecutive failures over 5 min (avoid flap)
Alert resolves automatically when probe goes green
Recording rule (optional) for dashboard panel

Constraints

Use existing kube-prometheus-stack + blackbox-exporter (already deployed in monitoring namespace) — do NOT introduce a new probe sidecar
Alert routes must use existing Alertmanager receivers — do NOT add new notification integrations in this ticket
Probe interval ≤60s; preflight is cheap
Initial pair list: pal-e-app↔pal-e-docs, westside-app↔basketball-api, mcd-tracker-app↔mcd-tracker-api. Other pairs as discovered.

Acceptance Criteria

blackbox-exporter module config supports OPTIONS with custom Origin and Access-Control-Request-* headers
Probe targets defined for the 3 initial pairs
PrometheusRule fires on probe_success == 0 with for: 5m
Alertmanager routing verified (firing → recipient)
Resolved transition verified (recovery → recipient)
Convention note convention-cors-synthetic written + linked from definition-app
Existing pairs confirmed green before merge

Checklist

Same as Acceptance Criteria above; tracked there.

Out of Scope

Real user monitoring (RUM)
Auto-discovery of new pairs (manual list for now)
Dashboard panels (separate ticket if pursued)

Environment

Cluster: prod, namespace monitoring
Existing components: kube-prometheus-stack, blackbox-exporter, Alertmanager, Loki
Tailscale-funnel hostnames involved (initial scope): pal-e-app.tail5b443a.ts.net, pal-e-docs.tail5b443a.ts.net, westsidekingsandqueens.tail5b443a.ts.net, basketball-api.tail5b443a.ts.net, mcd-tracker-app.tail5b443a.ts.net, mcd-tracker-api.tail5b443a.ts.net

forgejo_admin/pal-e-deployments#124 — concrete bug this would have caught
feedback_dora_thesis.md — observability is half of DORA
definition-app — naming convention for app/api pairs

### Type Feature ### Lineage Discovered during QA review of `pal-e-deployments#125`. The CORS misconfig surfaced as user-visible "Failed to fetch" — observability didn't catch it because we have no synthetic that exercises a cross-origin preflight. ### Repo `forgejo_admin/pal-e-platform` ### User Story As an on-call engineer, I want an alert to fire within 5 minutes when any frontend↔API CORS preflight starts failing in production, so that user-visible "Failed to fetch" outages are surfaced by observability instead of by users. ### Context Existing `blackbox-exporter` probes hit endpoints directly with GET — they never exercise the browser's actual CORS preflight pattern (`OPTIONS` with `Origin` header). When the `pal-e-app → pal-e-docs` preflight broke (see `pal-e-deployments#124`), pod logs showed `OPTIONS /boards 400` for hours but no alert fired. We need a synthetic that mimics the real frontend call pattern, per-pair, so drift between an app's allowed-origin config and its actual hostname is caught before it reaches users. Per `feedback_dora_thesis.md`, observability + kanban are the two DORA pillars. This is observability hardening. ### File Targets - `~/pal-e-platform/modules/observability/blackbox.tf` (or equivalent) — add CORS-preflight probe targets - `~/pal-e-platform/modules/observability/prometheusrule.tf` (or k8s YAML) — add alert rule for `probe_success{job="cors-preflight"} == 0` - `~/pal-e-platform/modules/observability/alertmanager.tf` (or values file) — confirm alert routes to existing channel - New convention note (slug: `convention-cors-synthetic`) — document the pattern so future frontend↔api pairs add it by default ### Test Expectations - Probe target list explicitly enumerates each `{frontend_origin → api_host}` pair - Manual test: temporarily set a wrong allowed-origin on a non-prod overlay, confirm probe goes red within one scrape interval - Alert fires after >2 consecutive failures over 5 min (avoid flap) - Alert resolves automatically when probe goes green - Recording rule (optional) for dashboard panel ### Constraints - Use existing `kube-prometheus-stack` + `blackbox-exporter` (already deployed in `monitoring` namespace) — do NOT introduce a new probe sidecar - Alert routes must use existing Alertmanager receivers — do NOT add new notification integrations in this ticket - Probe interval ≤60s; preflight is cheap - Initial pair list: `pal-e-app↔pal-e-docs`, `westside-app↔basketball-api`, `mcd-tracker-app↔mcd-tracker-api`. Other pairs as discovered. ### Acceptance Criteria - [ ] blackbox-exporter `module` config supports `OPTIONS` with custom `Origin` and `Access-Control-Request-*` headers - [ ] Probe targets defined for the 3 initial pairs - [ ] PrometheusRule fires on `probe_success == 0` with `for: 5m` - [ ] Alertmanager routing verified (firing → recipient) - [ ] Resolved transition verified (recovery → recipient) - [ ] Convention note `convention-cors-synthetic` written + linked from `definition-app` - [ ] Existing pairs confirmed green before merge ### Checklist Same as Acceptance Criteria above; tracked there. ### Out of Scope - Real user monitoring (RUM) - Auto-discovery of new pairs (manual list for now) - Dashboard panels (separate ticket if pursued) ### Environment - Cluster: prod, namespace `monitoring` - Existing components: `kube-prometheus-stack`, `blackbox-exporter`, `Alertmanager`, `Loki` - Tailscale-funnel hostnames involved (initial scope): `pal-e-app.tail5b443a.ts.net`, `pal-e-docs.tail5b443a.ts.net`, `westsidekingsandqueens.tail5b443a.ts.net`, `basketball-api.tail5b443a.ts.net`, `mcd-tracker-app.tail5b443a.ts.net`, `mcd-tracker-api.tail5b443a.ts.net` ### Related - `forgejo_admin/pal-e-deployments#124` — concrete bug this would have caught - `feedback_dora_thesis.md` — observability is half of DORA - `definition-app` — naming convention for app/api pairs