Terraform state splitting — modularize monolith main.tf for isolated applies #197
Type
Feature
Lineage
Discovered during incident #184 session. #196 documents the symptom (MinIO blocking Helm applies). This ticket is the permanent fix.
Repo
forgejo_admin/pal-e-platform
User Story
Story 1: Isolated deploys
As a platform operator
I want Terraform changes scoped to one service to only refresh that service's state
So that a MinIO hiccup doesn't block a Woodpecker Helm change (or vice versa)
Story 2: Blast radius containment
As a platform operator
I want each infrastructure module to have its own apply path
So that a bad apply in one module doesn't corrupt or block unrelated modules
Context
terraform/main.tf is a ~2200-line monolith with one state file. Every tofu apply refreshes ALL resources across 4 providers (kubernetes, helm, tailscale, minio). When any provider has connectivity issues, the entire apply fails, even if the change only touches one Helm release.
This caused 3 consecutive apply failures for PR #195 (a 1-line Woodpecker env var change) because the MinIO provider couldn't reach minio.minio.svc.cluster.local during the refresh phase.
Architecture
Current (monolith):
Target (modular, Option B first → Option A later):
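A minimal sketch of the Option B shape, assuming the module directories listed under File Targets. The resource name helm_release.woodpecker is hypothetical and stands in for whatever address the resource has in the current monolith:

```hcl
# terraform/main.tf (sketch, not the actual configuration)

# Each service becomes a module call; all modules still share one state file.
module "ci" {
  source = "./modules/ci"
}

module "storage" {
  source = "./modules/storage"
}

# One moved block per relocated resource, so the existing state entry is
# re-addressed instead of being destroyed and recreated.
moved {
  from = helm_release.woodpecker           # hypothetical current address
  to   = module.ci.helm_release.woodpecker # same resource, new module address
}
```

With this layout, tofu plan -target=module.ci limits the plan to the CI module and its dependencies, and the moved block lets the existing state entry follow the resource to its new address.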
Migration path (each step independently valuable):
1. module {} blocks, same state. CI uses -target=module.X for scoped applies.
2. tofu state mv. Each module gets its own backend. Full blast radius isolation.
3. .woodpecker.yaml with path-based module detection and targeted apply. Deferred from this ticket due to non-trivial CI pipeline complexity (328 lines, kubeconfig, 15+ secrets, lock retry).
File Targets
Files to create:
- terraform/modules/ci/main.tf: Woodpecker Helm release, agent env, gRPC funnel, namespace
- terraform/modules/ci/variables.tf + outputs.tf
- terraform/modules/storage/main.tf: MinIO buckets, IAM users, policies
- terraform/modules/storage/variables.tf + outputs.tf
- terraform/modules/monitoring/main.tf: kube-prometheus-stack Helm, alert rules, PodMonitors
- terraform/modules/monitoring/variables.tf + outputs.tf
- terraform/modules/forgejo/main.tf: Forgejo Helm, OAuth
- terraform/modules/forgejo/variables.tf + outputs.tf
- terraform/modules/keycloak/main.tf: Keycloak Helm
- terraform/modules/keycloak/variables.tf + outputs.tf
- terraform/modules/harbor/main.tf: Harbor Helm, projects
- terraform/modules/harbor/variables.tf + outputs.tf
- terraform/modules/networking/main.tf: Tailscale funnels, subnet router
- terraform/modules/networking/variables.tf + outputs.tf
- terraform/modules/database/main.tf: CNPG clusters, backups, secrets
- terraform/modules/database/variables.tf + outputs.tf
- terraform/modules/ops/main.tf: DORA exporter, Ollama, embedding, backup jobs, misc
- terraform/modules/ops/variables.tf + outputs.tf
Files to modify:
- terraform/main.tf: replace inline resources with module calls + moved {} blocks
- terraform/variables.tf: pass vars through to modules
- terraform/outputs.tf: all 11 outputs re-pointed to module outputs
- terraform/providers.tf: each module declares its own required_providers
- terraform/versions.tf: version constraints shared or per-module
- terraform/network-policies.tf: 9 policies referencing namespace resources that move to modules
Files NOT to touch:
- salt/: Salt is a separate IaC pillar
- terraform/k3s.tfvars / secrets.auto.tfvars: vars unchanged
- .woodpecker.yaml: CI targeting deferred to sub-ticket #198
Acceptance Criteria
- All moved {} blocks enumerated and verified: tofu plan shows 0 add/0 destroy
- tofu plan -target=module.ci only refreshes CI resources (no MinIO, no Tailscale)
- tofu plan -target=module.storage only refreshes MinIO resources
- tofu apply -target=module.ci succeeds even when MinIO is unreachable
- tofu apply (no target) still works for complete reconciliation
- tofu validate passes after restructure
- tofu plan shows 0 changes after migration (state preserved, no resources destroyed/recreated)
Test Expectations
- tofu validate passes
- tofu plan -lock=false shows 0 changes after migration (refactor, not rebuild)
- tofu plan -target=module.ci -lock=false completes without MinIO errors
- tofu plan -lock=false -target=module.ci in terraform/
Constraints
- Use moved {} blocks to tell Tofu about resource address changes (81 blocks required)
- Generate the full moved {} manifest before any resource relocation
- tofu plan must include -lock=false
- Each module declares its own required_providers block
Checklist
Related
project-pal-e-platform
Scope Review: NEEDS_REFINEMENT
Review note: review-436-2026-03-26
Well-structured issue with complete template, full traceability, and sound architecture -- but 4 files and ~25 resources are missing from scope.
Refinement Update
Per review (review-436-2026-03-26):
Fix 1: Missing file targets
Added to scope:
- terraform/network-policies.tf (228 lines, 9 policies): namespace references move with their modules
- terraform/outputs.tf (11 outputs): resource addresses change to module outputs
- terraform/providers.tf: each module declares required_providers
- terraform/versions.tf: version constraints shared or per-module
Fix 2: Orphaned resources → 9th "ops" module
~25 resources don't fit the 8 service modules. Adding modules/ops/:
Fix 3: CI pipeline change → sub-ticket
The .woodpecker.yaml targeted apply logic is non-trivial (328 lines, kubeconfig setup, 15+ secrets, lock retry). Split into a follow-up ticket rather than bundling with the module restructure. This ticket delivers the modules; CI targeting comes after.
Fix 4: 81 moved blocks
Added acceptance criterion: "All moved blocks enumerated and verified: tofu plan shows 0 add/destroy after migration." Agent executing this ticket must generate the full moved {} manifest before any resource relocation.
Fix 5: Missing acceptance criteria
Added:
- tofu plan shows 0 add, 0 destroy after restructure
Fix 6: Provider count
Corrected: 4 providers (kubernetes, helm, tailscale, minio), not 5.
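Two of the four providers (tailscale, minio) live outside the hashicorp registry namespace, so a child module that uses one of them needs its own required_providers entry; without it, Tofu would look the bare provider name up under the default hashicorp namespace. A sketch for modules/storage, assuming the aminueza/minio provider (this issue does not name the source address, so treat it as a placeholder):

```hcl
# terraform/modules/storage/main.tf (sketch)
terraform {
  required_providers {
    minio = {
      # Assumption: the actual source address is not stated in this issue.
      source = "aminueza/minio"
    }
  }
}
```

The provider configuration itself (endpoint, credentials) can stay in the root providers.tf and is inherited by the module; only the source mapping has to be repeated per module. Version constraints can be added here or kept in the root versions.tf, matching the "shared or per-module" choice above.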
Scope Re-Review: NEEDS_REFINEMENT
Review note: review-436-2026-03-26 (updated)
The refinement comment correctly addresses all 6 original concerns -- technically sound across the board. However, the issue body (the spec an executing agent reads) was not edited. All fixes live only in the comment.
Two actions remain before READY:
- Add network-policies.tf, outputs.tf, providers.tf, versions.tf to File Targets
- Add modules/ops/ to architecture diagram and File Targets (9th module for ~25 orphaned resources)
- Remove .woodpecker.yaml from File Targets (CI split to sub-ticket)
VERDICT: NEEDS_REFINEMENT -- refinement content is correct, just needs to be applied to the issue body and the sub-ticket created.
Scope Review: READY
Review note: review-436-2026-03-26
All 6 refinement items from the previous NEEDS_REFINEMENT review have been incorporated into the issue body. Sub-ticket #198 created and tracked on board (#437).
Refinement resolution:
VERDICT: READY -- this ticket can move from todo to next_up.
forgejo_admin referenced this issue 2026-03-27 06:04:12 +00:00