Fix Keycloak blackbox probe: NetworkPolicy + internal URL (1 alert) #111

Closed
opened 2026-03-18 17:03:36 +00:00 by forgejo_admin · 2 comments

Lineage

plan-pal-e-platform → Platform Hardening

Repo

forgejo_admin/pal-e-platform

User Story

As a platform operator
I want the Keycloak probe to use the internal cluster URL with correct NetworkPolicy
So that EndpointDown reflects real outages, not probe misconfiguration

Context

Keycloak is healthy (200 on /realms/master from host). The alert is a false positive caused by two compounding issues:

  1. NetworkPolicy blocks monitoring ingress — Keycloak's NetworkPolicy only allows ingress from tailscale namespace. Missing the monitoring namespace rule every other platform service has.
  2. Probe uses external funnel URL — Blackbox probe hits https://keycloak.tail5b443a.ts.net which routes through DERP relay and times out. All other core services use internal cluster URLs.

Evidence: the pod is 1/1 Running for 4d2h; curl -sk .../realms/master returns 200; the blackbox probe from the monitoring pod gets EOF on the external URL and connection refused on the internal URL (NetworkPolicy).

File Targets

  • terraform/network-policies.tf line ~130 — add monitoring namespace to Keycloak ingress rules
  • terraform/main.tf line ~439 — change probe URL from https://keycloak.tail5b443a.ts.net to http://keycloak.keycloak.svc.cluster.local:80/realms/master
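
A minimal sketch of what the two changes above might look like. It assumes the NetworkPolicy is managed with the kubernetes provider's kubernetes_network_policy_v1 resource and that namespaces are selected by the standard kubernetes.io/metadata.name label. The resource name, the policy name, and the blackbox_targets local are hypothetical; the real code in terraform/network-policies.tf and terraform/main.tf may be structured differently and should follow the pattern the other services already use.

```hcl
# terraform/network-policies.tf (sketch; resource and policy names are hypothetical)
resource "kubernetes_network_policy_v1" "keycloak" {
  metadata {
    name      = "keycloak-ingress" # hypothetical policy name
    namespace = "keycloak"
  }

  spec {
    # Empty selector: the policy applies to every pod in the keycloak namespace.
    pod_selector {}
    policy_types = ["Ingress"]

    # Existing rule: allow traffic from the tailscale namespace.
    ingress {
      from {
        namespace_selector {
          match_labels = {
            "kubernetes.io/metadata.name" = "tailscale"
          }
        }
      }
    }

    # New rule: allow traffic from the monitoring namespace so the
    # blackbox exporter can reach Keycloak over the cluster network.
    ingress {
      from {
        namespace_selector {
          match_labels = {
            "kubernetes.io/metadata.name" = "monitoring"
          }
        }
      }
    }
  }
}

# terraform/main.tf (sketch; the local name and map shape are assumptions)
locals {
  blackbox_targets = {
    # Before: "https://keycloak.tail5b443a.ts.net"
    keycloak = "http://keycloak.keycloak.svc.cluster.local:80/realms/master"
  }
}
```

Separate ingress blocks are ORed by Kubernetes, so adding the monitoring rule leaves the existing tailscale access unchanged.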

Files NOT to touch:

  • Keycloak Helm values — the service itself is fine

Acceptance Criteria

  • probe_success{service="keycloak"} == 1
  • Keycloak NetworkPolicy includes monitoring namespace
  • EndpointDown alert clears

Test Expectations

  • tofu plan -lock=false shows only the two expected changes
  • After apply: curl from blackbox pod to internal URL returns 200
  • Prometheus target for keycloak shows UP

Constraints

  • Two-line change, low risk
  • Requires tofu apply (bundle with state drift apply if timing works)
  • Match existing NetworkPolicy pattern from other services

Checklist

  • NetworkPolicy updated
  • Probe URL updated
  • PR opened
  • tofu plan clean
  • Tests pass
  • No unrelated changes

Related

  • pal-e-platform — project board
  • Issue #109 — umbrella alert cleanup
Author
Owner

Scope Review: READY

Review note: review-190-2026-03-18
Scope is solid — all template sections present, both file targets verified against codebase, no blast radius concerns. Keycloak is the sole outlier (external probe URL + missing monitoring NetworkPolicy rule). Two-line fix, ready for agent dispatch.

Author
Owner

Scope Review: READY

Review note: review-190-2026-03-18
All template sections present, both file targets verified against codebase. Keycloak NetworkPolicy (line 130) confirmed missing monitoring ingress; probe URL (line 440) confirmed as only platform-tier service using external URL. Internal service URL structure validated (port 80, namespace keycloak). Acceptance criteria are machine-verifiable. No blockers.

Blast radius note: Ollama NetworkPolicy also lacks monitoring ingress but has no blackbox probe — not in scope, flagged for future cleanup.
