Spike: Keycloak auth integration design #23
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "16-keycloak-integration-spike"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
omniauth-keycloakgem, designs four new tables (users, clicks, feature_flags, visibility column on links), documents Keycloak provisioning steps, and maps secrets wiring through pal-e-deployments.Changes
docs/auth.md: New consolidated auth design document covering OIDC gem evaluation, schema design with mermaid ER diagram, Keycloak realm/client provisioning (Terraform config), secrets wiring, controller integration sketches, and follow-up ticket breakdown.Test Plan
pal-e-services/terraform/k3s.tfvarsarchitecture.mdandvisibility.mddocsReview Checklist
Related Notes
ldraney/palinks #16— Spike: Keycloak integration and role-based schema designsop-keycloak-client-creation— Platform SOP for Keycloak realm/client managementCloses #16
Review
Scope: Docs-only spike. Single new file
docs/auth.md(393 lines). No code changes, no runtime impact.Findings
1. Minor --
feature_enabled?nil comparison (line 147)The
!= nilcheck inuser&.preferences&.dig("flags", flag_name) != nilwill treatfalseas a valid override (correct), but the Ruby idiom!user.preferences.dig(...).nil?is more conventional. Not a bug since this is a design sketch, not shipped code.2. Minor -- Logout does not revoke Keycloak session (line 357-359)
The
SessionsController#destroyonly callsreset_session, which clears the Rails session but leaves the Keycloak SSO session alive. A user who logs out and clicks "Log in" again will be silently re-authenticated without seeing the Keycloak login screen. The follow-up implementation ticket should note that RP-initiated logout (redirect to Keycloak'send_session_endpointwithpost_logout_redirect_uri) is needed. Thepost_logout_redirect_urisare already configured in the client spec, so this is a matter of wiring the redirect in the controller.3. Observation -- Consistency with existing docs
The schema in
docs/auth.mdaligns with the ER diagram indocs/architecture.md. The visibility model matchesdocs/visibility.md. The Keycloak provisioning config follows the patterns documented insop-keycloak-client-creation(realm per project, confidential client, PKCE S256, direct access grants disabled). No contradictions found.4. Observation -- All five spike questions answered
VERDICT: APPROVE
Clean spike doc. The two minor findings are design-sketch-level observations that should be addressed in the implementation tickets, not in this doc. No blockers.
PR #23 Review
DOMAIN REVIEW
Tech stack: Ruby on Rails 8.1, PostgreSQL, Keycloak OIDC, Terraform/k8s (design doc references). This is a docs-only spike -- no runtime code changes. Review focuses on design document quality, consistency with existing docs, and correctness of proposed patterns.
Gem recommendation (omniauth-keycloak): Sound. The rationale for choosing it over Devise and bare
openid_connectis well-articulated. The fallback path toomniauth_openid_connectif the gem is abandoned is a practical escape hatch. The inclusion ofomniauth-rails_csrf_protectionis correct -- OmniAuth 2.x mandates POST-only auth initiation.Schema design: The four-table design (users, clicks, feature_flags, visibility column on links) is coherent. Key design decisions are well-reasoned:
keycloak_subas stable identifier over email -- correctfeature_flagstable vs JSONB on users -- appropriate separation of concernssuperadminvisibility for new links -- correct security postureController sketches: The
SessionsController#createandApplicationControllerhelper methods are clean. The role extraction fromrealm_access.rolesis correct for Keycloak's token structure.Terraform config: PKCE S256, confidential client, disabled direct access grants -- all align with platform SOP patterns. Registration disabled is appropriate for a manually-managed user base.
Secrets wiring: Uses
secretKeyRefreferences to a SOPS-encrypted secret -- no plaintext secrets. The provisioning order (realm -> client -> network policy -> user -> secrets) is logically sequenced.Follow-up tickets: Seven concrete tickets with clear scope and dependency ordering. Good breakdown.
BLOCKERS
None. This is a docs-only spike with no runtime code changes. No secrets, credentials, or executable code is being committed.
NITS
Schema drift between docs: The
feature_flagstable indocs/auth.mdomits thedescriptioncolumn that exists indocs/architecture.md(line 118 of architecture.md:string description "human-readable explanation"). These two docs should stay in sync. The auth.md version hasname,enabled_globally,role_overrides,created_at,updated_at-- architecture.md addsdescription. Since architecture.md already exists as the canonical schema reference, auth.md should includedescriptionto avoid confusion during implementation.Flag resolution nil check: In the
feature_enabled?method, the per-user override check uses!= nil:This is intentional (distinguishing
falsefrom absent), but a comment explaining the!= nilpattern would help future readers understand it is not a mistake. Minor readability nit.Clicks table constraint: The doc states "at least one of user_id or session_id should be present (enforced at the model level, not the DB)." Consider adding a brief note about why not at the DB level -- the reasoning (keeping migrations simple) is stated but a sentence about the tradeoff (model-only validation can be bypassed by raw SQL) would strengthen the design rationale.
Local dev auth bypass: Follow-up ticket 7 mentions checking
Rails.env.development?for a mock superadmin session. The "Secrets > Local Development" section also references this pattern. Consider consolidating these into a single reference to avoid divergence when the implementation ticket is written.Missing
FEATURE_FLAGSrelationship line: The mermaid ER diagram in auth.md showsUSERS ||--o{ CLICKSandLINKS ||--o{ CLICKSbut omits theFEATURE_FLAGS }o--o{ USERSrelationship that exists indocs/architecture.md(line 126). The relationship exists conceptually (viausers.preferencesJSONB referencing flag names) and should be diagrammed for completeness.SOP COMPLIANCE
16-keycloak-integration-spikefollows{issue-number}-{kebab-case-purpose}ldraney/palinks #16Closes #16present in PR bodyENV.fetch(), secrets reference SOPS pathsdocs/auth.md), appropriate for spike scopePROCESS OBSERVATIONS
docs/architecture.mdanddocs/visibility.mdwith minor schema drift (nits above). Resolving the drift before merge prevents confusion during implementation.VERDICT: APPROVED
PR #23 Review
DOMAIN REVIEW
Tech stack: This is a docs-only spike PR (Ruby on Rails context). The deliverable is
docs/auth.md-- a design document for Keycloak OIDC integration. No runtime code changes. Review focuses on design quality, consistency with existing docs, and spike completeness.Spike coverage: The five implicit design questions from issue #16 are addressed:
omniauth-keycloakwith clear rationaleDesign quality: The document is thorough. The gem comparison table is well-structured. The schema ER diagram, flag resolution logic, controller sketches, and provisioning order are all clear and actionable. The Terraform config snippets match the patterns described in
docs/infrastructure.md.Consistency with existing docs: The schema defined in
docs/auth.mdis consistent with the schema already documented indocs/architecture.md(lines 80-127). The visibility model matchesdocs/visibility.md. The feature flag resolution order matchesdocs/feature-flags.md. No contradictions found.Design observation (non-blocking): The
extract_rolefallback in theSessionsControllersketch defaults any authenticated user without an explicit Keycloak role to"member". This is a permissive default -- if someone gets a Keycloak account but is not assigned a role, they silently get member-level access. The implementation ticket should explicitly decide whether to fail closed (reject users with no role) or fail open (default to member). For a spike doc this is fine to leave as a design note.Overlap note:
docs/auth.mdnecessarily restates some schema and visibility content that exists indocs/architecture.mdanddocs/visibility.md. This is acceptable for a spike deliverable -- the auth doc is meant to be a self-contained design record. The implementation tickets should reference auth.md as the source of truth and avoid further duplication.BLOCKERS
mergeable: false. The README.md edit (adding the Auth doc link after the Visibility line) conflicts with the Feature Flags doc link that was merged to main via PR #27. The branch must be rebased onto current main to resolve this conflict before merge. The fix is trivial (the Auth link line just needs to be placed after the Feature Flags line), but the PR cannot merge in its current state.NITS
README link ordering -- After rebase, the Auth doc link should be placed after Feature Flags to maintain the logical grouping: Architecture > Infrastructure > Pipeline > Visibility > Feature Flags > Auth > User Stories > Rails Notes. Auth depends on concepts from both Visibility and Feature Flags, so it reads better after them.
!= nilpattern in flag resolution -- The code sketch usesuser.preferences.dig("flags", flag_name) != nilwith a comment explaining the explicit nil check. This is correct Ruby but idiomatic Rails would use.key?or.has_key?to distinguish "key absent" from "key present with nil value". Since this is a sketch and not shipped code, this is purely a nit for the implementation ticket to consider.Clicks table constraint -- The doc notes that "at least one of user_id or session_id should be present" but is enforced at the model level, not DB. The stated rationale (keep migration simple) is reasonable, but the implementation ticket should consider a DB-level check constraint (
CHECK (user_id IS NOT NULL OR session_id IS NOT NULL)) since it is a one-line addition to the migration and prevents data integrity issues from raw SQL or console operations.PKCE for confidential client -- The Terraform config mandates
pkce_code_challenge_method = "S256"for a confidential client. While PKCE for confidential clients is technically unnecessary (the client secret already prevents code interception), the doc correctly notes this is per platform SOP. No action needed, just noting the design rationale is sound.Redirect URI -- The
valid_redirect_urisonly lists the production Tailscale URL. The implementation ticket should consider whether a dev redirect URI (e.g.,http://localhost:9999/auth/keycloak/callback) should be added, or whether the local dev auth bypass (ticket #7 in the follow-ups) makes this unnecessary.SOP COMPLIANCE
16-keycloak-integration-spike)ldraney/palinks #16)Closes #16present in PR bodyENV.fetch(), no actual valuesdocs/auth.md(new) andREADME.md(doc index update)PROCESS OBSERVATIONS
VERDICT: NOT APPROVED
Reason: Single blocker -- merge conflict (
mergeable: false). The README.md change conflicts with the Feature Flags doc link added by PR #27. Rebase onto main, resolve the one-line conflict in README.md, and this is ready to merge. The design content itself is thorough and well-aligned with existing docs.0d5625bc1ato0b75419538PR #23 Review
Re-review after rebase to resolve merge conflicts.
DOMAIN REVIEW
Tech stack: Ruby on Rails (docs-only spike -- no runtime code changes). The PR adds
docs/auth.md(395 lines) and a one-lineREADME.mdupdate linking to it.Document quality -- auth.md:
The design doc is thorough and well-structured. It covers five design areas with concrete, actionable recommendations:
OIDC Gem Evaluation -- Three options compared in a table with pros/cons.
omniauth-keycloakrecommendation is well-reasoned: narrow auth requirement, drop-in Rails convention, and a clear escape hatch if the gem is abandoned. The CSRF protection gem (omniauth-rails_csrf_protection) is correctly called out as mandatory for OmniAuth 2.x.Schema Design -- The ER diagram is consistent with the existing data model in
docs/architecture.md(lines 80-127). Thevisibilitycolumn design (string, not Postgres enum) has a clear rationale. Thevisible_toscope matches the implementation sketch indocs/visibility.md(lines 95-101) exactly. Feature flag resolution logic (per-user > role > global) is consistent withdocs/feature-flags.md(lines 40-46).Keycloak Provisioning -- Terraform config snippets follow the existing
k3s.tfvarspattern. Key security decisions are correct: confidential client (not public), PKCE S256 mandatory, direct access grants disabled, registration disabled. The provisioning order (8 steps) is complete and correctly sequences the cross-repo dependencies.Secrets Wiring -- All four env vars are documented. The
secretKeyRefYAML matches existing patterns inpal-e-deployments. No actual secrets are committed -- only variable names and structure.Follow-Up Tickets -- Seven concrete implementation tickets are listed with dependency ordering and specific file targets. Each ticket's scope is narrow enough to implement independently.
Cross-reference accuracy:
docs/architecture.mdexactly (same table structures, same column names, same relationships).visible_toscope matchesdocs/visibility.mdverbatim.docs/feature-flags.md.docs/infrastructure.md(lines 156-165), which already listsKEYCLOAK_CLIENT_IDandKEYCLOAK_CLIENT_SECRETas planned.Minor design observations (non-blocking):
extract_rolemethod in theSessionsControllersketch defaults unknown authenticated users to"member". This is a reasonable safe default for this app's two-role model, but the design doc should note this is a deliberate choice (it is implied but not explicitly called out as a design decision).!= nilcheck on lineuser.preferences.dig("flags", flag_name) != nilin the flag resolution logic is correct but unconventional Ruby --!user.preferences.dig("flags", flag_name).nil?is more idiomatic. This is a design sketch, not shipped code, so it is a minor observation.user_idandsession_idto be null at the DB level, with model-level validation as a trade-off. The doc acknowledges this explicitly and explains why. Acceptable for a spike.BLOCKERS
None. This is a docs-only spike with no runtime code changes. No secrets committed, no unvalidated input, no test coverage gap (nothing to test).
NITS
The README diff inserts the
[Auth]link between[Feature Flags]and[User Stories]. This breaks the existing ordering pattern -- the docs list is currently ordered by topic flow (Architecture > Infrastructure > Pipeline > Visibility > Feature Flags > User Stories > Rails Notes). Auth logically belongs after Visibility and before or alongside Feature Flags. The current placement is acceptable but could be reordered toArchitecture > Infrastructure > Pipeline > Visibility > Auth > Feature Flags > ...to follow the logical flow from access tiers to auth to feature gating. Non-blocking.The initializer sketch references
OmniAuth::Strategies::KeycloakOpenIdwith an explicitstrategy_classparameter. This may not be needed if the:keycloak_openidsymbol resolves correctly. Worth verifying during implementation. Non-blocking since this is a design sketch.SOP COMPLIANCE
16-keycloak-integration-spikefollows{issue-number}-{kebab-case-purpose}conventionldraney/palinks #16andsop-keycloak-client-creationCloses #16present in PR bodyPROCESS OBSERVATIONS
VERDICT: APPROVED