Remove stale basketball-api and pal-e-app namespace references blocking CI apply #449
Type
Bug
Lineage
Discovered during DNS record deployment (godaddy-tofu integration). No parent plan — standalone fix.
Repo
ldraney/pal-e-platform
What Broke
Every push-to-main CI apply fails because Terraform references two namespaces that no longer exist on the cluster: basketball-api and pal-e-app. These are data sources in module.database. Terraform plans using stale state data, then the apply fails when the k8s API rejects creates in non-existent namespaces. This blocks ALL infrastructure changes, including DNS record creation for palinks.app and landscaping-assistant.app.
Repro Steps
namespaces "basketball-api" not found and namespaces "pal-e-app" not found
Expected Behavior
CI apply should succeed. Resources targeting non-existent namespaces should have been removed when the namespaces were deleted.
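One way to confirm which referenced namespaces are actually absent is to compare them against the live namespace list. The sketch below is illustrative only: the function name `missing_namespaces` and the variable `REFERENCED_NAMESPACES` are hypothetical, and it assumes the input is the output of `kubectl get namespaces -o name` (one `namespace/<name>` entry per line).

```shell
#!/bin/sh
# Namespaces named in this issue as still referenced by Terraform.
REFERENCED_NAMESPACES="basketball-api pal-e-app"

# Reads `kubectl get namespaces -o name` output on stdin and prints every
# referenced namespace that does not appear in it.
missing_namespaces() {
  # Strip the "namespace/" prefix so names can be matched exactly.
  live_namespaces="$(sed 's|^namespace/||')"
  for ns in $REFERENCED_NAMESPACES; do
    if ! printf '%s\n' "$live_namespaces" | grep -qxF -- "$ns"; then
      printf '%s\n' "$ns"
    fi
  done
}

# Usage against a real cluster (needs kubeconfig access):
#   kubectl get namespaces -o name | missing_namespaces
```

Any name it prints is a namespace that Terraform code or state still points at but the cluster no longer has.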
Environment
- kubectl get namespaces confirms neither namespace exists
- palinks, paldocs, pal-e-admin, pal-e-docs, westside-ror
Scope
Data sources in state (stale, namespace gone):
- data.kubernetes_namespace_v1.basketball_api (module.database)
- data.kubernetes_namespace_v1.pal_e_production (module.database)
Managed resources in state (orphaned, target namespace gone):
- kubernetes_config_map_v1.basketball_api_dashboard (module.monitoring)
- kubernetes_config_map_v1.pal_e_production_dashboard (module.monitoring)
- kubernetes_manifest.embedding_alerts (module.monitoring)
- kubernetes_manifest.embedding_worker_service_monitor (module.monitoring)
- kubernetes_manifest.gmail_oauth_expiry_alert (module.monitoring)
- kubernetes_manifest.payment_pipeline_alerts (module.monitoring)
New resources in code but NOT in state (fail on create):
- kubernetes_secret_v1.paledocs_db_url (module.database): targets pal-e-app
- kubernetes_job_v1.admin_app_user_provision (module.database): targets basketball-api
- kubernetes_service_v1.embedding_worker_metrics (module.ops): targets pal-e-app
- kubernetes_manifest.netpol_basketball_api (root): targets basketball-api
Files to modify:
- terraform/modules/database/main.tf: remove data sources (lines 61-64, 110-114), paledocs_db_url secret, admin_app_user_provision job
- terraform/modules/database/outputs.tf: remove namespace outputs
- terraform/modules/database/variables.tf: remove admin_app_db_password references
- terraform/modules/ops/main.tf: remove embedding worker metrics service (lines 28-55)
- terraform/modules/ops/variables.tf: remove pal-e-app namespace variable
- terraform/network-policies.tf: remove netpol_basketball_api resource (lines 192-215), remove basketball-api/pal-e-app from 'from' rules
- terraform/modules/monitoring/main.tf: remove basketball-api and pal-e-app dashboards, alerts, service monitors
- terraform/variables.tf: remove admin_app_db_password variable if fully orphaned
State operations (must run with cluster access):
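The state removals could be prepared along the lines of the sketch below. It only prints commands and executes nothing. The addresses are assembled from the resource and module names listed in Scope, on the assumption that they follow the `module.<module>.<type>.<name>` pattern; that should be checked against `tofu state list` before anything is run. Including the two data-source entries is also an assumption, made because Scope lists them as stale state. The names `STALE_ADDRESSES` and `print_state_rm_commands` are hypothetical.

```shell
#!/bin/sh
# Stale state entries named in Scope, written as full state addresses.
# Assumption: addresses follow module.<module>.<type>.<name>; confirm each
# one with `tofu state list` before running anything.
STALE_ADDRESSES="
module.database.data.kubernetes_namespace_v1.basketball_api
module.database.data.kubernetes_namespace_v1.pal_e_production
module.monitoring.kubernetes_config_map_v1.basketball_api_dashboard
module.monitoring.kubernetes_config_map_v1.pal_e_production_dashboard
module.monitoring.kubernetes_manifest.embedding_alerts
module.monitoring.kubernetes_manifest.embedding_worker_service_monitor
module.monitoring.kubernetes_manifest.gmail_oauth_expiry_alert
module.monitoring.kubernetes_manifest.payment_pipeline_alerts
"

# Prints one `tofu state rm` command per address. Nothing is executed here;
# review the output, then run the commands by hand with cluster access.
print_state_rm_commands() {
  for addr in $STALE_ADDRESSES; do
    printf 'tofu state rm %s\n' "$addr"
  done
}

print_state_rm_commands
```

Printing rather than executing keeps the one-time manual operation reviewable, and the printed list can be attached to the issue as a record of what was removed.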
Approach
1. Remove references to the basketball-api and pal-e-app namespaces
2. Remove the admin_app_db_password variable and secret if fully orphaned
3. Run tofu state rm for orphaned resources before merge (state ops must happen before the apply; this is a one-time manual operation with kubectl/kubeconfig access)
4. Confirm tofu plan shows clean diff
Acceptance Criteria
- tofu plan runs clean with no namespace errors
- DNS records for palinks.app and landscaping-assistant.app are created
Risk
- tofu state rm is irreversible but safe here: the orphaned resources don't exist on-cluster, so removing them from state won't destroy anything.
- admin_app_db_password is in Salt pillars and Woodpecker secrets. Removing the TF variable is safe; cleaning up Salt/Woodpecker is optional follow-up.
Related
Issue #449 Template Review
TEMPLATE CONFORMANCE
- ### Type present and valid (Bug)
- ### Lineage present and descriptive
- ### Repo present
- ### What Broke present and specific (exact error messages, blocking impact)
- ### Repro Steps present and reproducible
- ### Expected Behavior present
- ### Environment present with CI pipeline reference
- ### Acceptance Criteria present with checkboxes
- ### Related present with PR cross-references
- Additional sections (### Scope, ### Approach, ### Risk) add value beyond template minimum
Template conformance: PASS. All required bug template sections present and non-empty.
CONTENT QUALITY
Strengths:
- ...module.database.data.kubernetes_namespace_v1.basketball_api)
- ...tofu state rm is safe because the underlying k8s resources do not exist
Findings requiring attention:
1. MISSED FILE: terraform/main.tf (root orchestrator)
The issue lists 8 files under "Files to modify" but misses terraform/main.tf, which contains:
- pal_e_production_namespace = module.database.pal_e_production_namespace -- passes the stale namespace output from the database module into the ops module. Once outputs.tf removes pal_e_production_namespace, this line will cause a validation error unless it is also removed.
- moved block: from = kubernetes_config_map_v1.pal_e_docs_dashboard / to = module.monitoring.kubernetes_config_map_v1.pal_e_production_dashboard -- references the dashboard resource being deleted. This moved block must be removed or tofu plan will error referencing a nonexistent target.
- moved blocks for embedding_worker_service_monitor and embedding_alerts -- both point to resources listed for state rm. These must also be cleaned up.
This is a scope gap. Without cleaning terraform/main.tf, the code removal in modules will break validation before apply even runs.
2. Network policy from rules at lines 176-177 of network-policies.tf
The issue mentions removing basketball-api/pal-e-app from 'from' rules, which is correct. Verified at lines 176-177 of terraform/network-policies.tf -- these are namespaceSelector entries in the postgres namespace ingress policy. Removing them is safe since the source namespaces no longer exist, but note: these are not blocking (a from rule referencing a nonexistent namespace simply never matches -- it does not cause apply failure). The issue correctly identifies them for cleanup but does not distinguish their severity from the blocking items.
3. Dashboard JSON files will become orphaned
terraform/modules/monitoring/main.tf line 756 references dashboards/pal-e-app-golden-signals.json and line 920 references dashboards/basketball-api-golden-signals.json. When the ConfigMap resources are removed, these JSON files become dead code. The issue does not mention cleaning them up. Non-blocking but should be noted as follow-up to avoid dashboard file rot.
4. Approach ordering: state-rm-first is correct but needs emphasis
The issue says step 3 is "Run tofu state rm for orphaned resources before merge." This is the correct ordering -- state rm MUST happen before the code-removal PR is applied. If code is removed first and CI runs apply, Terraform will attempt to destroy the orphaned resources (which don't exist on-cluster), potentially causing confusing errors. The issue gets this right but buries it in step 3. For a production state manipulation, the approach section should lead with the state operations and explicitly call out: "state rm is a prerequisite gate -- do NOT merge the code PR until state ops are confirmed."
5. Overlap with issues #411 and #412
Issue #411 ("Remove deprecated pal-e-app references from Terraform") and #412 ("Remove deprecated westside-admin references from Terraform") are both open and overlap significantly with #449's scope:
- #411: paledocs_db_url and embedding_worker_metrics resources
- #412: westside-admin references including admin_app_db_url_westside_admin
The ### Lineage section says "standalone" but this is actually a superset of #411 and partially overlaps #412. When #449 is completed, #411 should be closed as resolved-by and #412 should be checked for remaining scope. The issue should cross-reference these to prevent duplicate work.
6. State rm command for kubernetes_manifest resources
The tofu state rm commands for kubernetes_manifest resources (embedding_alerts, embedding_worker_service_monitor, gmail_oauth_expiry_alert, payment_pipeline_alerts) use the format module.monitoring.kubernetes_manifest.X. This is syntactically correct for the Plugin Framework. However, kubernetes_manifest resources sometimes have composite state keys (e.g., including the GVK). If the state rm fails with "no matching resource," the operator should run tofu state list | grep embedding_alerts first to verify the exact state address. The issue should note this as a precaution.
7. outputs.tf has additional stale references beyond namespace output
terraform/modules/database/outputs.tf lines 15-26 contain admin_app_db_url_secret_name and admin_app_db_url_namespaces outputs that reference kubernetes_secret_v1.admin_app_db_url -- a basketball-api-targeted resource. These outputs will break validation if the secret resource is removed but the outputs are not. The issue mentions "remove namespace outputs" but does not enumerate all affected outputs. The admin_app_db_url_* outputs must also be removed (or they will cause a downstream error in any module consuming them).
BLOCKERS
1. Incomplete file target list -- terraform/main.tf is missing from "Files to modify." The moved blocks and module parameter wiring in this file will cause validation failure after the module-level code is removed. This must be added to scope.
NITS
- The ### Environment section lists Woodpecker CI pipeline #816 but does not include the cluster/namespace format from the template (Cluster: archbox k3s). Minor template conformance gap.
- The ### Scope section is excellent but not part of the standard bug template. Consider whether this level of detail belongs in the issue body or in a linked implementation plan. For a state-manipulation ticket of this severity, having it inline is the right call.
PROCESS OBSERVATIONS
- ...tofu state list output for auditability.
VERDICT: NOT APPROVED
Single blocker: terraform/main.tf is missing from the file target list. The moved blocks (lines 287-300) and module parameter wiring (line 126) in this file will cause tofu validate failure after module-level code is removed. Add this file to scope, cross-reference overlapping issues #411/#412, and note the outputs.tf additional stale outputs -- then this is ready to move forward.
Scope Update (from QA review)
QA flagged missing items. Adding to scope:
Additional files to modify:
- terraform/main.tf: remove module wiring:
  - admin_app_db_password passed to database module
  - pal_e_production_namespace passed from database to ops
  - moved blocks for pal_e_production_dashboard, embedding_worker_service_monitor, embedding_alerts
  - moved block for embedding_worker_metrics
Additional outputs to remove from terraform/modules/database/outputs.tf:
- admin_app_db_url_secret_name (line 16)
- admin_app_db_url_namespaces (line 21)
Dashboard JSON files to delete:
- terraform/dashboards/basketball-api-golden-signals.json
- terraform/dashboards/pal-e-app-golden-signals.json
State rm precaution: Run tofu state list | grep -E 'basketball|pal_e_production|embedding|admin_app' first to confirm exact resource addresses before running state rm commands.
Ordering note: State rm MUST complete before the code-removal PR is merged. Otherwise the apply will try to destroy resources it can't reach (namespace gone).
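The precaution above could be scripted roughly as follows. This is a sketch under stated assumptions: both functions read `tofu state list` output on stdin, and the names `candidate_entries` and `unlisted_addresses` are hypothetical helpers, not part of any tool. The pattern is the one quoted in the precaution. It is deliberately broad, so a match is a candidate for review and not proof that the entry is orphaned.

```shell
#!/bin/sh
# Pattern taken from the state rm precaution.
STALE_PATTERN='basketball|pal_e_production|embedding|admin_app'

# Reads `tofu state list` output on stdin and prints the entries that match
# the pattern. Prints nothing (and still succeeds) when there are no matches.
candidate_entries() {
  grep -E "$STALE_PATTERN" || true
}

# Reads `tofu state list` output on stdin and prints each address passed as
# an argument that is NOT present in it. Running state rm on such an address
# would fail, so it should be corrected first.
unlisted_addresses() {
  state_list="$(cat)"
  for addr in "$@"; do
    if ! printf '%s\n' "$state_list" | grep -qxF -- "$addr"; then
      printf '%s\n' "$addr"
    fi
  done
}

# Usage (needs access to the state backend):
#   tofu state list | candidate_entries
#   tofu state list | unlisted_addresses module.monitoring.kubernetes_manifest.embedding_alerts
```

Running `candidate_entries` again after the state rm step, and comparing against the earlier output, gives a simple record that the entries are gone before the code-removal PR is merged.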