Add feature flags architecture doc #131
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "129-feature-flag-strategy"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Architecture doc for the feature flag strategy decided in spike #129. Documents a database-backed
FeatureFlagmodel with cached lookups via SolidCache, controller/view helpers, test helpers, seed conventions with cleanup dates, and an upgrade path to Flipper. Also created implementation ticket #130.Changes
docs/feature-flags.md-- new architecture doc covering:FeatureFlagmodel withRails.cache.fetch(1-min TTL),pick(:enabled)for minimal overheadfeature_enabled?(name)exposed to views viahelper_method&&with_feature_disabledblock helper for exception-path specsstripe(#125),client_requests(#123),keycloak_login(#115)Test Plan
Review Checklist
Related Notes
landscaping-assistant-- project slugSelf-Review (Dev agent -- no QA agent available to spawn)
Reviewed the single-file diff:
docs/feature-flags.md(356 lines added, 0 deleted).Findings
No blocking issues.
Minor observations (informational, not blocking):
View usage code block mixes ERB and Ruby (line 115-136 in doc). The ERB examples and the
PaymentsControllerclass are in the same fenced code block taggederb. The controller example is plain Ruby, not ERB. This is a documentation style nit -- the intent is clear, but splitting into two blocks (oneerb, oneruby) would be more precise. Not worth a fix cycle.Phase 4 unique index (line 170). The future migration adds
add_index :feature_flags, [:name, :business_id], unique: truebut the existing unique index onnamealone would need to be dropped first, since a flag name could then appear multiple times (once per business + once global). The doc says "this is not built now" so this is just a note for the implementer to catch when Phase 4 arrives.Test helper
before(:each)(line 284). CallsFeatureFlag.find_eachwhich assumes flags exist in the test DB. If no flags are seeded (e.g., a spec file that doesn't load seeds), this is a no-op, which is fine. The implementation ticket (#130) should ensure test fixtures or factories create the flags.All three are informational and do not warrant changes to a docs-only PR.
Checklist
Rails.cache.fetchwith SolidCachepick(:enabled)rationale is sound for hot-path optimizationensureblock handles cleanup on failureVERDICT: APPROVED
PR #131 Review
DOMAIN REVIEW
Tech stack: Ruby on Rails 8.1, Hotwire (Turbo + Stimulus), Propshaft, plain CSS, SolidCache, CNPG Postgres on k3s via ArgoCD. This is a docs-only PR (no code), so the review focuses on technical accuracy of the proposed architecture and consistency with existing documentation.
Rails/ActiveRecord code snippets:
Migration (lines 33-46): Correct Rails 8.1 migration syntax.
ActiveRecord::Migration[8.1]is the right version lock. Schema is clean --null: false,default: false, unique index onname. No issues.Model (lines 50-78):
pick(:enabled)is valid ActiveRecord (added in Rails 6).Rails.cache.fetchwithexpires_in: CACHE_TTLcorrectly delegates to SolidCache perdocs/multi-database.md. Thefind_or_create_by!+update!pattern inenable/disableis correct. Cache bust after write is the right order of operations.ApplicationController helper (lines 83-95):
helper_method :feature_enabled?is the standard Rails pattern for sharing a controller method with views. Correct.View usage (lines 100-121): ERB gating examples are clean. The controller-level
before_actionguard returninghead :not_foundfollows the existing pattern in the codebase (sessions controller returns 404 for unauthorized routes, perdocs/app-architecture.md).Seeds (lines 126-141):
find_or_create_by!with a block is idempotent -- the block only executes on create, not find. This means existing flag values are preserved across deploys. Correct.Testing helper (lines 254-286):
ensureblock guarantees cleanup even on test failure.Rails.cache.clearinbefore(:each)prevents cross-test cache pollution. Thewith_feature_enabledhelper re-enables in the ensure block, which is correct for the "default ON" philosophy.Turbo Streams emergency broadcast (line 335):
Turbo::StreamsChannel.broadcast_action_to("application", action: "refresh")-- theaction: "refresh"is a Turbo 8 feature (morphing). This is correct for Rails 8.1 which bundles Turbo 8. Note: this requires clients to be subscribed to the"application"stream, which would needturbo_stream_from "application"in the layout. The doc does not mention this prerequisite.Flipper upgrade (lines 497-513): The migration path is mechanical and accurate.
Flipper.enabled?(:name)mirrorsFeatureFlag.enabled?(:name). Clean find-and-replace.SolidCache interaction: The doc correctly identifies that
Rails.cachein production is SolidCache (confirmed bydocs/multi-database.mdshowingconfig.cache_store = :solid_cache_store). The 1-minute TTL means at most one DB query per minute per flag per pod. SolidCache stores cache entries in thelandscaping_assistant_cachedatabase, so flag lookups after the first hit do not touch the primary database at all. This is well-reasoned.Deployment pipeline: The doc correctly describes the three-repo model (cross-referenced against
docs/pipeline.mdanddocs/infrastructure-and-pipeline.md). The claim thatdb:prepareruns migrations AND seeds is correct --db:preparerunsdb:create(if needed) +db:migrate+db:seed. The CI claim that Woodpecker runsdb:create db:migratebut NOTdb:seedis verified against.woodpecker.yamlline 47.BLOCKERS
None.
NITS
Super admin role is not documented in
user-stories-auth.mdorapp-architecture.md(lines 174-239 of the diff). The feature-flags doc introduces a fifth role (super_admin) with a new Keycloak realm role, a new nav view ("Platform" under Person icon), and a new route namespace (/platform/*). The existing docs (docs/user-stories-auth.md,docs/app-architecture.md) describe exactly four roles: Client, Crew Member, Crew Lead, Admin. The navigation table in both docs shows five slots with no mention of a Platform view.This is not a blocker because the super admin concept is reasonable and non-conflicting -- it is a platform-level role above the business hierarchy. However, implementation ticket #130 or a separate doc update ticket should update
user-stories-auth.mdandapp-architecture.mdto:super_adminto the role hierarchy diagramsuper_adminto the Keycloak realm role listWithout this update, the docs will contradict each other once the feature-flags doc merges.
db:preparedoes NOT always run seeds (lines 422-430). The doc states: "db:prepareruns pending migrations AND seeds." This is subtly incorrect. In Rails,db:preparerunsdb:create(if the database does not exist) anddb:migrate. It only runsdb:seedif the database was just created (i.e., on first run). On subsequent deploys where the database already exists,db:prepareruns onlydb:migrate-- seeds are skipped. This means new flags added todb/seeds.rbwill NOT be automatically seeded on the next deploy if the database already exists.The implementation should either:
db:migrate db:seedin the initContainer instead ofdb:prepare, orrails consoleafter deployThis is a doc accuracy issue, not a code blocker, but it will cause confusion during implementation if not corrected.
Turbo Stream refresh prerequisite (line 335). The emergency broadcast
Turbo::StreamsChannel.broadcast_action_to("application", action: "refresh")requires that the application layout includes<%= turbo_stream_from "application" %>so clients are subscribed to that stream. The doc does not mention this prerequisite. Worth a one-line note.Flag lifecycle step 3 says "toggle on via console" (line 356), but earlier the doc describes a full Super Admin UI for toggling flags (lines 174-239). The lifecycle should mention the UI as the primary toggle method, with console as fallback. Minor inconsistency.
enabled?method on the model instance (line 215 of the diff, in the controllerupdateaction):flag.enabled?is called as a predicate. ActiveRecord generates boolean predicates (enabled?) for boolean columns, so this works. However, the value read here is afterupdate!, which is correct. Just noting this is fine.Stale flag detection query (lines 368-370):
FeatureFlag.where(enabled: true).where("updated_at < ?", 30.days.ago)-- this checksupdated_at, which resets on any update, not just when the flag was enabled. If someone updates the description of an enabled flag,updated_atresets and the flag no longer appears stale. Consider adding anenabled_attimestamp column, or document this limitation.Per-business upgrade path (lines 149-169): The future unique index
[:name, :business_id]with nullablebusiness_idwill allow multiple global flags with the same name ifbusiness_idis NULL, because SQLUNIQUEconstraints treat NULLs as distinct. This needs a partial unique index:add_index :feature_flags, :name, unique: true, where: "business_id IS NULL"plus the composite index for non-null business_id. Worth a note in the "not built now" section so the future implementation does not hit this.SOP COMPLIANCE
129-feature-flag-strategymatches issue #129landscaping-assistantreferenced, closes #129, references #130, #125, #123, #115, #107docs/feature-flags.md), no scope creepPR body note: The PR body uses
## Review Checklistand## Related Notesinstead of## Related. This is a minor deviation from the SOP template but the content is complete and functional.TEMPLATE CONFORMANCE (Issue #129)
I was unable to fetch the issue #129 body directly (no
get_issuetool available --list_issuesreturns only titles). Based on the issue title ("Spike: Feature flag strategy (gate features, rollback, gradual rollout)"), I can validate the spike deliverables against the template:Template requires:
docs/{topic}.mdcreated -- YES:docs/feature-flags.md(513 lines)The spike deliverables match the
template-issue-spikerequirements. Full template conformance of the issue body itself cannot be verified without the issue body text.PROCESS OBSERVATIONS
db:preparevsdb:seedmisunderstanding (nit #2 above), which could cause flag seeding to silently fail on existing databases.VERDICT: APPROVED
The architecture doc is technically sound, well-reasoned, and consistent with the existing codebase conventions. The nits above are documentation improvements -- none are blockers. The
db:prepareseeding nuance (nit #2) and super admin cross-doc gap (nit #1) are the most important items to address, ideally before or during implementation in #130.f344b827549c2136b5c1PR #131 Review
DOMAIN REVIEW
Tech stack: Ruby on Rails 8.1, Hotwire (Turbo Streams/Frames), SolidCache, k3s/CNPG deployment. This is a docs-only PR -- no executable code to lint or test, but the code samples in the doc will be copy-pasted into implementation ticket #130, so correctness matters.
Architecture decisions -- all sound:
DB-backed flags vs ENV vars vs Flipper: Correct reasoning. ENV vars require a pod restart (deploy). Flipper is overkill for global on/off with <10 flags. DB row toggle is instant. Good.
Rails.cache.fetchwith 1-min TTL via SolidCache: Appropriate. SolidCache is already configured perdocs/multi-database.md. TheCACHE_TTL = 1.minuteconstant is clear and easy to tune. The explicit note about whyCurrentAttributesis wrong (resets per-request, no caching benefit) is a nice addition.pick(:enabled)vsfind_by: Correct.pickavoids model instantiation -- returns the raw boolean from aSELECT enabled ... LIMIT 1. Minimal overhead for a hot path.Rake task
feature_flags:syncinstead ofdb/seeds.rb: This was a nit from the previous review and is now correctly addressed. The reasoning is accurate --db:prepareonly runs seeds on first DB creation, so subsequent deploys would never register new flags. Thefind_or_create_by!pattern is idempotent and preserves existingenabledstate.Two partial unique indexes for NULL uniqueness: Correctly addresses the SQL NULL caveat.
WHERE business_id IS NOT NULLandWHERE business_id IS NULLas separate partial indexes is the standard PostgreSQL pattern. This was another previous nit, now resolved.404 (not 403) for non-super-admin: Correct security posture. Returning 403 leaks the existence of the route. 404 makes the endpoint invisible.
Turbo Stream emergency broadcast: The prerequisite dependency on
turbo_stream_from "application"in the layout is clearly documented, with the Solid Cable ticket (#2) referenced. The "V1 is acceptable without instant propagation" rationale is sound -- flag toggles are rare operational events.Super admin under Person icon (Platform view), not a new nav tab: Consistent with the 5-slot nav layout documented in
docs/user-stories-auth.md. No wasted nav real estate.initContainerupdated tobin/rails db:prepare && bin/rails feature_flags:sync: One-line change in pal-e-deployments. Correct that this is a cross-repo change that needs to happen alongside #130.Code samples validated:
null: falseon name and enabled,default: falseon enabled -- correct.validates :name, presence: true, uniqueness: true) matches the DB constraint.enable/disableclass methods bust the cache immediately viaRails.cache.delete-- correct.before_actionfor auth,head :not_found-- idiomatic Rails.button_towithmethod: :patch-- standard Turbo form, no JS needed.ensurefor cleanup, which is correct even if the test raises.before(:each)hook enables all flags and clears cache -- prevents test pollution.Phase 4 future-proofing: The per-business scoping upgrade path is well thought out. The two-partial-index approach for the nullable FK is the correct PostgreSQL pattern (standard composite unique index treats each NULL as distinct, so
[:name, :business_id]would allow duplicate global flags). The fallback lookup (business-specific, then global) is clean.BLOCKERS
None.
This is a docs-only PR. No executable code, no security surface, no test coverage requirement. All three nits from the previous review have been addressed:
NITS
README roles table is stale: The README at line 15-20 still shows 4 roles (Client, Crew Member, Crew Lead, Admin) without Super Admin. This PR updates
docs/user-stories-auth.mdto add Super Admin but does not touch the README. The README is the front door -- new contributors will see "four roles" and then read "five roles" in the auth doc. Consider adding a row for Super Admin in a follow-up or as part of #130. Not blocking because the README roles table is a summary and the super admin role is platform-internal (not a business role), but the discrepancy will confuse readers.with_feature_enabledhelper has a no-op ensure: Lines in the test helper showwith_feature_enabledcallingFeatureFlag.enable(name)in theensureblock, but the method also starts withFeatureFlag.enable(name). Since thebefore(:each)hook already enables all flags, the entirewith_feature_enabledmethod is a no-op unless a prior test disabled a flag and thebefore(:each)hook did not run (which would be a bug). The method exists for symmetry and readability, which is fine, but worth a comment in the implementation noting it is intentionally redundant.cleanup_afterconvention is a comment, not a column: The doc mentions "Thecleanup_aftercomment in seeds tracks the deadline" but the convention is only a code comment in the rake task, not an actualcleanup_aftercolumn or a structured data field. The stale-flag console query usesupdated_atas a proxy instead. This is fine for V1, but the doc could be slightly clearer that the cleanup date is a human convention (comment), not machine-enforced.SOP COMPLIANCE
129-feature-flag-strategymatches spike #129landscaping-assistantreferenced, closes #129, links #130, #125, #123, #115, #107docs/feature-flags.mdis the spike deliverable,docs/user-stories-auth.mdchanges are the necessary role updates to integrate super admindocs/feature-flags.mdcreated (docs/ file), #130 created (follow-up ticket)PROCESS OBSERVATIONS
bin/rails db:prepare && bin/rails feature_flags:sync) is a cross-repo dependency in pal-e-deployments. Ticket #130 should explicitly list this as a required change or it will be forgotten at merge time.VERDICT: APPROVED