feat: add lock-aware retry to CI apply step #98
Reference
forgejo_admin/pal-e-platform#98
Lineage
phase-platform-17b-tf-state-governance -> Phase 17b.1 (CI Lock Recovery)
Repo
forgejo_admin/pal-e-platform
User Story
As a platform operator
I want the CI apply step to automatically detect and recover from stale state locks
So that a crashed apply does not block all deployments on main
Context
Every merge to main triggers tofu apply. If a previous apply crashed and left a state lock, the pipeline fails and blocks ALL deployments. This happened with pipeline #80. Currently the only recovery is manual intervention: SSH into the cluster, extract the lock ID from the error output, and run tofu force-unlock. This is a DORA MTTR problem -- stale locks should be auto-recovered.
File Targets
Files the agent should modify:
.woodpecker.yaml -- replace the single tofu apply command in the apply step with a lock-aware retry script
Files the agent should NOT touch:
terraform/ -- no .tf file changes needed
salt/ -- no Salt changes needed
Acceptance Criteria
When tofu apply fails with "the state is already locked", the script extracts the lock ID, runs tofu force-unlock -force, and retries apply once
When tofu apply fails with any other error, the step fails normally with the original exit code
When tofu apply succeeds on first attempt, the step succeeds normally
Test Expectations
.woodpecker.yaml parses as valid YAML after edit
tofu fmt -check -recursive should still pass
Constraints
ghcr.io/opentofu/opentofu:1.9 (Alpine-based, sh not bash)
No grep -P (Perl regex) -- use sed for extraction
$? after a pipe returns the exit code of the last command (tee), not tofu -- use subshell + temp file approach
Checklist
Related
pal-e-platform -- project this affects
phase-platform-17b-tf-state-governance -- parent phase
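The acceptance criteria and constraints above could be met with a script along these lines. This is a minimal POSIX sh sketch, not the final implementation. The function names (extract_lock_id, run_apply, apply_with_lock_retry) are hypothetical. It also assumes that the lock error output contains a line of the form "ID: <lock-id>", as in the usual "Lock Info" block; that format should be checked against the real output of pipeline #80 before relying on it.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, and the "ID:" line format
# in the lock error output is an assumption to verify against real logs.

# Print the first lock ID found on stdin (expects a line like "  ID:   <id>").
# Uses sed with POSIX character classes because grep -P is unavailable.
extract_lock_id() {
    sed -n 's/^[[:space:]]*ID:[[:space:]]*\([^[:space:]]*\).*$/\1/p' | head -n 1
}

# Run "tofu apply" with the remaining arguments, streaming output through tee.
# The real exit code is written to a temp file inside the subshell, because
# $? after the pipe would report tee's status, not tofu's.
run_apply() {
    _log=$1
    _rc=$2
    shift 2
    (
        rc=0
        tofu apply "$@" 2>&1 || rc=$?
        echo "$rc" > "$_rc"
    ) | tee "$_log"
    return "$(cat "$_rc")"
}

# Apply once; on a state-lock failure, force-unlock and retry exactly once.
# Any other failure returns the original exit code unchanged.
apply_with_lock_retry() {
    tmp_dir=$(mktemp -d) || return 1
    log_file=$tmp_dir/apply.log
    rc_file=$tmp_dir/apply.rc

    status=0
    run_apply "$log_file" "$rc_file" "$@" || status=$?

    if [ "$status" -ne 0 ] && grep -qF "the state is already locked" "$log_file"; then
        lock_id=$(extract_lock_id < "$log_file")
        if [ -n "$lock_id" ]; then
            echo "State lock $lock_id detected; forcing unlock and retrying once"
            if tofu force-unlock -force "$lock_id"; then
                status=0
                run_apply "$log_file" "$rc_file" "$@" || status=$?
            fi
        fi
    fi

    rm -rf "$tmp_dir"
    return "$status"
}
```

In the apply step, the functions could be defined inline and called as apply_with_lock_retry followed by whatever arguments the existing tofu apply command already passes. Note that this sketch cannot tell a stale lock from one held by a concurrent run that is still active; it unlocks whenever the lock error appears.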