[BUG] Cookie SSR auth loops infinitely — session cookie value (4719B) exceeds browser per-cookie limit #24

Closed
opened 2026-05-03 20:52:56 +00:00 by forgejo_admin · 1 comment

Type

Bug

Lineage

Discovered during post-merge end-to-end validation of forgejo_admin/westside-admin#2's 4-way decomposition (PRs #18 / #20 / #21 / #23, all merged 2026-05-03). Regression from the original ACs of #16 (/auth/callback sets the session cookie) and #15 (hook reads it on the next request) — both PRs implemented the spec correctly per static review, but the runtime cookie value exceeds the browser per-cookie size limit, so the loop never terminates.

Repo

forgejo_admin/westside-admin

What Broke

The full SSO round-trip for an admin user produces an infinite westside-admin//auth/login ↔ Keycloak ↔ /auth/callback redirect loop, ending in chrome-error://chromewebdata/ ERR_TOO_MANY_REDIRECTS after Chromium's redirect cap (~20 hops).

The session cookie that /auth/callback mints is 4719 bytes (URL-safe base64 of AES-GCM-encrypted {access_token, refresh_token, id_token, exp, refresh_exp}). RFC 6265 §6.1 mandates a 4096-byte per-cookie minimum and most browsers (including Chromium) cap there. Result: the browser silently drops the Set-Cookie: westside_admin_session=... header on the callback's 302 response. The next request to / arrives with no session cookie, the hook redirects back to /auth/login, Keycloak SSO-skips the realm session and returns a fresh code, callback sets a still-too-big cookie, browser drops it again, infinite loop.

Repro Steps

  1. Log in as a westside-basketball realm user with the admin role at https://westside-admin.tail5b443a.ts.net.
  2. Observe redirect loop ending in ERR_TOO_MANY_REDIRECTS.
  3. Inspect the callback response with curl/Python (Playwright filters Set-Cookie from CDP-derived response-headers display, so it must be done outside Playwright):
    GET https://westside-admin.tail5b443a.ts.net/auth/callback?... → 302
    Set-Cookie: westside_admin_session=<4719-byte URL-safe-base64 value>; Max-Age=...; Path=/; HttpOnly; Secure; SameSite=Lax
    
  4. Compute len(cookie_value) → 4719 bytes → over RFC 6265 §6.1 limit.

Reproducible 100% — every realm user with a typical Keycloak token set hits the same ceiling.

Expected Behavior

Per #15 AC: "Request with valid session cookie + admin role passes through, populates event.locals.user." That requires the browser to actually retain the cookie set on the previous response. The cookie value must therefore stay under 4096 bytes.

Environment

  • Cluster: westside-admin namespace, archbox k3s
  • Service commit (cluster image SHA): 50731870747636a4ed9e0ea9daa6cc9706342cf5 (matches Forgejo main HEAD post-#23 merge)
  • ArgoCD: Synced + Healthy
  • Pod: westside-admin-575d5d55bf-rfs6z (1/1 Running)
  • Token sizes (live, measured 2026-05-03 against Keycloak westside-basketball realm):
    • access_token: 801 B
    • refresh_token: 668 B
    • id_token: 1118 B
    • JSON wrap + AES-GCM IV (12B) + tag (16B) + base64url (~4/3): final cookie ≈ 4719 B

Proposed Fix

Drop id_token from the session payload in src/routes/auth/callback/+server.ts and src/lib/server/keycloak.ts's KeycloakTokens/refresh path. id_token is currently consumed only by /auth/logout to populate id_token_hint on the Keycloak /logout redirect — id_token_hint is optional per OIDC spec; without it Keycloak still performs realm-session logout, just without per-client SLO targeting. Acceptable trade-off for westside-admin v1.

After dropping id_token:

  • access_token (801) + refresh_token (668) + 2 ints + JSON wrap → ≈ 1700 B plaintext
  • AES-GCM + base64url → ≈ 2300 B cookie value
  • Comfortably under the 4096-byte ceiling with headroom for Keycloak token growth.

The frozen-lib constraint that forced the original design no longer applies — #22 was already opened today to extend KeycloakTokens with refresh_exp; this fix can ride the same migration window.

Acceptance Criteria

  • After login, the westside_admin_session cookie is < 4096 bytes (verified via the Python repro in Repro Steps).
  • Browser accepts the cookie; subsequent request to / carries it; hook validates it and resolves the request.
  • Admin user lands at / with event.locals.user populated.
  • Logout still works (clears session cookie + 302s to Keycloak /logout); id_token_hint parameter may be omitted.
  • No regression in /auth/callback failure paths (state mismatch → 400, Keycloak token failure → 502, etc.).
  • #22's refresh_exp plumbing not regressed — the field stays in the payload.
  • No tokens / cookie ciphertext logged (grep verified).
  • project-westside-admin
  • forgejo_admin/westside-admin#2 — original parent (closed but the regression is in its consumers)
  • forgejo_admin/westside-admin#15 — hook (consumes the cookie)
  • forgejo_admin/westside-admin#16 — callback (mints the cookie)
  • forgejo_admin/westside-admin#22 — open follow-up that already touches the same payload (consider bundling, but per feedback_smaller_scopes_parallel keep them separate)
  • feedback_validate_before_done — this validation gate is what caught the bug
  • feedback_funnel_requires_auth — without this fix, the funnel-auth gate is non-functional
### Type Bug ### Lineage Discovered during post-merge end-to-end validation of `forgejo_admin/westside-admin#2`'s 4-way decomposition (PRs #18 / #20 / #21 / #23, all merged 2026-05-03). Regression from the original ACs of `#16` (`/auth/callback` sets the session cookie) and `#15` (hook reads it on the next request) — both PRs implemented the spec correctly per static review, but the runtime cookie value exceeds the browser per-cookie size limit, so the loop never terminates. ### Repo `forgejo_admin/westside-admin` ### What Broke The full SSO round-trip for an admin user produces an infinite `westside-admin/` ↔ `/auth/login` ↔ Keycloak ↔ `/auth/callback` redirect loop, ending in `chrome-error://chromewebdata/ ERR_TOO_MANY_REDIRECTS` after Chromium's redirect cap (~20 hops). The session cookie that `/auth/callback` mints is **4719 bytes** (URL-safe base64 of AES-GCM-encrypted `{access_token, refresh_token, id_token, exp, refresh_exp}`). RFC 6265 §6.1 mandates a 4096-byte per-cookie minimum and most browsers (including Chromium) cap there. Result: the browser silently drops the `Set-Cookie: westside_admin_session=...` header on the callback's 302 response. The next request to `/` arrives with no session cookie, the hook redirects back to `/auth/login`, Keycloak SSO-skips the realm session and returns a fresh code, callback sets a still-too-big cookie, browser drops it again, infinite loop. ### Repro Steps 1. Log in as a westside-basketball realm user with the `admin` role at `https://westside-admin.tail5b443a.ts.net`. 2. Observe redirect loop ending in `ERR_TOO_MANY_REDIRECTS`. 3. Inspect the callback response with curl/Python (Playwright filters `Set-Cookie` from CDP-derived response-headers display, so it must be done outside Playwright): ``` GET https://westside-admin.tail5b443a.ts.net/auth/callback?... → 302 Set-Cookie: westside_admin_session=<4719-byte URL-safe-base64 value>; Max-Age=...; Path=/; HttpOnly; Secure; SameSite=Lax ``` 4. Compute `len(cookie_value)` → 4719 bytes → over RFC 6265 §6.1 limit. Reproducible 100% — every realm user with a typical Keycloak token set hits the same ceiling. ### Expected Behavior Per `#15` AC: "Request with valid session cookie + admin role passes through, populates `event.locals.user`." That requires the browser to actually retain the cookie set on the previous response. The cookie value must therefore stay under 4096 bytes. ### Environment - Cluster: `westside-admin` namespace, `archbox` k3s - Service commit (cluster image SHA): `50731870747636a4ed9e0ea9daa6cc9706342cf5` (matches Forgejo `main` HEAD post-#23 merge) - ArgoCD: Synced + Healthy - Pod: `westside-admin-575d5d55bf-rfs6z` (1/1 Running) - Token sizes (live, measured 2026-05-03 against Keycloak `westside-basketball` realm): - `access_token`: 801 B - `refresh_token`: 668 B - `id_token`: 1118 B - JSON wrap + AES-GCM IV (12B) + tag (16B) + base64url (~4/3): final cookie ≈ **4719 B** ### Proposed Fix Drop `id_token` from the session payload in `src/routes/auth/callback/+server.ts` and `src/lib/server/keycloak.ts`'s `KeycloakTokens`/refresh path. `id_token` is currently consumed only by `/auth/logout` to populate `id_token_hint` on the Keycloak `/logout` redirect — `id_token_hint` is **optional per OIDC spec**; without it Keycloak still performs realm-session logout, just without per-client SLO targeting. Acceptable trade-off for `westside-admin` v1. After dropping `id_token`: - `access_token` (801) + `refresh_token` (668) + 2 ints + JSON wrap → ≈ 1700 B plaintext - AES-GCM + base64url → ≈ 2300 B cookie value - Comfortably under the 4096-byte ceiling with headroom for Keycloak token growth. The frozen-lib constraint that forced the original design no longer applies — `#22` was already opened today to extend `KeycloakTokens` with `refresh_exp`; this fix can ride the same migration window. ### Acceptance Criteria - [ ] After login, the `westside_admin_session` cookie is < 4096 bytes (verified via the Python repro in Repro Steps). - [ ] Browser accepts the cookie; subsequent request to `/` carries it; hook validates it and resolves the request. - [ ] Admin user lands at `/` with `event.locals.user` populated. - [ ] Logout still works (clears session cookie + 302s to Keycloak `/logout`); `id_token_hint` parameter may be omitted. - [ ] No regression in `/auth/callback` failure paths (state mismatch → 400, Keycloak token failure → 502, etc.). - [ ] `#22`'s `refresh_exp` plumbing not regressed — the field stays in the payload. - [ ] No tokens / cookie ciphertext logged (grep verified). ### Related - `project-westside-admin` - `forgejo_admin/westside-admin#2` — original parent (closed but the regression is in its consumers) - `forgejo_admin/westside-admin#15` — hook (consumes the cookie) - `forgejo_admin/westside-admin#16` — callback (mints the cookie) - `forgejo_admin/westside-admin#22` — open follow-up that already touches the same payload (consider bundling, but per `feedback_smaller_scopes_parallel` keep them separate) - `feedback_validate_before_done` — this validation gate is what caught the bug - `feedback_funnel_requires_auth` — without this fix, the funnel-auth gate is non-functional
Author
Owner

Scope Review: APPROVED

Review note: review-1140-2026-05-03

Bug template fully populated with first-rate forensics (live-measured token sizes, RFC 6265 §6.1 citation, cluster image SHA 50731870747636a4ed9e0ea9daa6cc9706342cf5, exact failure topology). File targets verified at HEAD b222400. Proposed fix is correct (id_token_hint is OPTIONAL per OIDC spec) and the math checks out (4719 B → ~2300 B after dropping id_token).

Two [BODY] notes for the dev agent (not blocking, but worth recording before in_progress):

  1. Logout shape-check relaxation required. isSessionBlob in src/routes/auth/logout/+server.ts line 42 currently requires id_token === null || typeof id_token === 'string'. After dropping id_token from the cookie payload, this shape-check will reject every new session as malformed. Relax it (or drop the id_token_hint branch entirely) in the same PR. AC4 ("Logout still works") will catch this if missed — but flagging here saves a re-review iteration.

  2. Coordinate with #22 (open, untouched). Both tickets edit src/lib/server/keycloak.ts and src/hooks.server.ts. Files overlap but interface/field changes do NOT collide (#22 adds refresh_exp to KeycloakTokens; #24 changes the cookie payload shape in callback). Whichever lands first, the second rebases. No semantic conflict.

Traceability:

  • story:admin-row-crud verified — story-westside-admin-admin-row-crud exists in project page User Stories section
  • arch:westside-admin verified — satisfied by the three project-specific arch notes (arch-domain-, arch-dataflow-, arch-deployment-westside-admin)
  • arch:keycloak — backing note arch-keycloak MISSING. Known platform gap (flagged in 9+ prior reviews). Treat as WAIVER per established precedent — NOT blocking this ticket.

Decomposition: Not needed. 2 files modified, 7 testable AC, single repo. Single dev agent, single PR.

Cleared for todo. Hook will accept the APPROVED verdict.

## Scope Review: APPROVED Review note: `review-1140-2026-05-03` Bug template fully populated with first-rate forensics (live-measured token sizes, RFC 6265 §6.1 citation, cluster image SHA `50731870747636a4ed9e0ea9daa6cc9706342cf5`, exact failure topology). File targets verified at HEAD `b222400`. Proposed fix is correct (`id_token_hint` is OPTIONAL per OIDC spec) and the math checks out (4719 B → ~2300 B after dropping `id_token`). **Two `[BODY]` notes for the dev agent (not blocking, but worth recording before in_progress):** 1. **Logout shape-check relaxation required.** `isSessionBlob` in `src/routes/auth/logout/+server.ts` line 42 currently requires `id_token === null || typeof id_token === 'string'`. After dropping `id_token` from the cookie payload, this shape-check will reject every new session as malformed. Relax it (or drop the `id_token_hint` branch entirely) in the same PR. AC4 ("Logout still works") will catch this if missed — but flagging here saves a re-review iteration. 2. **Coordinate with `#22`** (open, untouched). Both tickets edit `src/lib/server/keycloak.ts` and `src/hooks.server.ts`. Files overlap but interface/field changes do NOT collide (#22 adds `refresh_exp` to `KeycloakTokens`; #24 changes the cookie payload shape in callback). Whichever lands first, the second rebases. No semantic conflict. **Traceability:** - `story:admin-row-crud` verified — `story-westside-admin-admin-row-crud` exists in project page User Stories section - `arch:westside-admin` verified — satisfied by the three project-specific arch notes (`arch-domain-`, `arch-dataflow-`, `arch-deployment-westside-admin`) - `arch:keycloak` — backing note `arch-keycloak` MISSING. Known platform gap (flagged in 9+ prior reviews). Treat as WAIVER per established precedent — NOT blocking this ticket. **Decomposition:** Not needed. 2 files modified, 7 testable AC, single repo. Single dev agent, single PR. Cleared for `todo`. Hook will accept the `APPROVED` verdict.
Sign in to join this conversation.
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
forgejo_admin/westside-admin#24
No description provided.