Bug: tofu apply fails on MinIO provider refresh during unrelated Helm changes #196

Open
opened 2026-03-27 03:55:37 +00:00 by forgejo_admin · 2 comments

Type

Bug

Lineage

Discovered during PRs #194 and #195, both Woodpecker Helm-only changes whose apply failed on a MinIO connection timeout (dial tcp: i/o timeout) in the refresh phase.

Repo

forgejo_admin/pal-e-platform

What Broke

tofu apply in CI fails whenever the MinIO provider can't reach minio.minio.svc.cluster.local:9000 during the state refresh phase. Because the refresh covers every resource in the state, this blocks ALL Terraform changes, including those that only touch Helm releases and have nothing to do with MinIO.

Error pattern:

Error: [FATAL] error reading IAM User (cnpg): Get "http://minio.minio.svc.cluster.local:9000/...": dial tcp: i/o timeout
Error: [FATAL] error checking bucket existence (assets): ... i/o timeout

This has caused PRs #194 and #195 to require multiple retry attempts. PR #192's apply succeeded on the first try, suggesting intermittent MinIO connectivity from pipeline pods.

Repro Steps

  1. Merge any PR to pal-e-platform main
  2. CI triggers tofu apply in a k8s pipeline pod
  3. Tofu refreshes all resources including MinIO buckets, IAM users, and policies
  4. If MinIO is slow or temporarily unreachable from the pipeline pod, the entire apply fails
  5. Unrelated Helm release changes never deploy

Expected Behavior

Changes to Helm releases should deploy reliably without being blocked by MinIO provider connectivity. Either:

  • MinIO resources should be in a separate state/module so Helm changes don't trigger MinIO refresh
  • MinIO provider should have retry/timeout configuration
  • -target should work for isolated changes (currently it still refreshes MinIO state)

Environment

  • Cluster/namespace: woodpecker (CI pipeline pods) → minio (target)
  • MinIO service: minio.minio.svc.cluster.local:9000
  • Tofu version: 1.9
  • MinIO provider: aminueza/minio v3.28.1

Acceptance Criteria

  • Helm-only changes can apply without MinIO provider blocking them
  • Apply reliability improves from ~50% to >95% success rate
  • No regression on MinIO resource management

Related

  • #194: blocked by this (MAX_WORKFLOWS apply keeps failing)
  • #195: blocked by this (same apply failure)
  • #192: succeeded on first try (intermittent nature)
  • project-pal-e-platform
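
To put the reliability criterion in numbers: if each apply attempt succeeds about 50% of the time and attempts are independent (an assumption; real MinIO outages may cluster), the chance that at least one of n attempts succeeds is 1 - (1 - p)^n. A rough sketch of the arithmetic, with hypothetical helper names:

```python
def success_after(attempts: int, p: float) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p) ** attempts


def min_attempts(p: float, target: float) -> int:
    """Smallest number of attempts whose combined success rate exceeds `target`."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    n = 1
    while success_after(n, p) <= target:
        n += 1
    return n


# Four attempts at p = 0.5 give 0.9375, five give 0.96875.
print(min_attempts(0.5, 0.95))
```

Under those assumptions, retries alone would need up to five attempts per apply to clear 95%, which is why removing the MinIO refresh from Helm-only applies is the more robust route.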
Author
Owner

Scope Review: NEEDS_REFINEMENT

Review note: review-435-2026-03-27

This ticket is a symptom, not an actionable work unit. The structural fix (#197 modularization) already shipped. The CI-level fix (#198 targeted apply) is already scoped as a separate ticket in todo.

Issues found:

  • All three acceptance criteria overlap entirely with #198 -- an agent dispatched on #196 would either duplicate #198's work or have nothing concrete to do
  • #197 (state splitting) is already closed/done, which was listed as one of the "Expected Behavior" solutions
  • No independent fix is defined for this ticket beyond what #197 and #198 already cover

Recommendation: Either close as superseded by #197+#198, or repurpose with concrete scope for a short-term mitigation (provider timeout config in providers.tf or retry logic in .woodpecker.yaml apply step) with rewritten acceptance criteria targeting that specific fix.
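
A minimal sketch of what the retry mitigation could look like if the apply step called a small wrapper script. The function names, attempt count, and delays are assumptions and not part of the existing pipeline; the only thing taken from this ticket is the "i/o timeout" text in the error pattern. It retries only when the output looks like the transient timeout and returns any other failure immediately.

```python
import subprocess
import time

# Marker taken from the error pattern quoted in this issue.
TRANSIENT_MARKER = "i/o timeout"


def is_transient(output: str) -> bool:
    """True when the command output looks like the MinIO timeout from this issue."""
    return TRANSIENT_MARKER in output


def run_with_retry(cmd, attempts=5, base_delay=10.0,
                   run=subprocess.run, sleep=time.sleep):
    """Run cmd; retry with exponential backoff only on transient timeouts.

    `run` and `sleep` are injectable so the logic can be exercised without
    invoking tofu or actually waiting.
    """
    result = None
    for attempt in range(1, attempts + 1):
        result = run(cmd, capture_output=True, text=True)
        output = (result.stdout or "") + (result.stderr or "")
        if result.returncode == 0:
            return result
        # Non-transient failures, or the final attempt, are returned as-is.
        if attempt == attempts or not is_transient(output):
            return result
        sleep(base_delay * 2 ** (attempt - 1))
    return result
```

Usage would be along the lines of `run_with_retry(["tofu", "apply", "-auto-approve"])`, with the script exiting on the returned `returncode`. Matching on the error text keeps genuine configuration errors from being retried.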

Author
Owner

Closing — Superseded

Scope review (review-435-2026-03-27) determined this is a symptom ticket. The actual fixes are:

  • #197 (TF state splitting) — already closed/done. MinIO moved to module.storage.
  • #198 (CI targeted apply) — open in todo. Eliminates MinIO provider refresh for Helm-only changes.

Closing as superseded by #197 + #198.
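
For illustration only, a sketch of how a CI step might turn a list of changed files into a targeted apply. This is not #198's actual design: the directory layout and the `module.ci` address are hypothetical, and only `module.storage` is named in this thread. Whether targeting actually skips the MinIO refresh depends on how the state was split in #197.

```python
from pathlib import PurePosixPath

# Hypothetical directory-to-module mapping; only module.storage appears in this thread.
MODULE_BY_DIR = {
    "modules/storage": "module.storage",
    "modules/ci": "module.ci",
}


def targets_for(changed_files):
    """Return module addresses for the changed files, or None if any file is unmapped."""
    targets = set()
    for name in changed_files:
        path = PurePosixPath(name)
        for directory, address in MODULE_BY_DIR.items():
            if PurePosixPath(directory) in path.parents:
                targets.add(address)
                break
        else:
            # A change outside the known modules: fall back to a full apply.
            return None
    return targets


def build_apply_cmd(changed_files):
    """Build a tofu apply command, targeted when every change maps to a module."""
    cmd = ["tofu", "apply", "-auto-approve"]
    targets = targets_for(changed_files)
    if targets:
        cmd += [f"-target={address}" for address in sorted(targets)]
    return cmd


print(build_apply_cmd(["modules/ci/woodpecker.tf"]))
```

Falling back to a full apply for unmapped files is the conservative choice: a targeted apply that silently skips a change would be worse than a retry.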
