Scale Woodpecker agent to 2 replicas and enable auto-cancel #405
ldraney/pal-e-platform!405
Summary
Stale branch/PR pipelines were hogging the single Woodpecker agent, blocking prod deploys. This scales the agent deployment to 2 replicas for parallel pipeline execution and enables server-level auto-cancel so new pushes to the same branch automatically kill running pipelines.
Changes
- `terraform/modules/ci/main.tf` -- Bumped agent `replicaCount` from hardcoded `1` to `var.woodpecker_agent_replica_count` (defaults to 2). Added `WOODPECKER_PIPELINE_CANCEL_PREVIOUS=true` and `WOODPECKER_PIPELINE_CANCEL_PREVIOUS_WAIT=3s` to server env vars.
- `terraform/modules/ci/variables.tf` -- Added `woodpecker_agent_replica_count` variable with default of 2 and description. No change needed in root module since the default handles it.
tofu plan Output
Cannot run `tofu plan` from dev machine -- requires Salt pillar secrets and `tofu init` with backend config on the server. Expected plan output:
- `helm_release.woodpecker` will be updated in-place:
  - agent `replicaCount`: `1` -> `2`
  - `WOODPECKER_PIPELINE_CANCEL_PREVIOUS=true`
  - `WOODPECKER_PIPELINE_CANCEL_PREVIOUS_WAIT=3s`
Run `make tofu-plan` on the server to confirm before applying.
Test Plan
- Run `make tofu-plan` on the server and verify only the expected Helm release changes appear
- Run `make tofu-apply` and confirm both agent pods come up healthy
Review Checklist
- `tofu fmt -recursive` passes
- `tofu plan` reviewed on server before apply
Related Notes
`WOODPECKER_PIPELINE_CANCEL_PREVIOUS_WAIT=3s` gives a brief grace period before cancelling, avoiding race conditions on rapid successive pushes.
Related
Closes #404
Cross-repo parent: ldraney/landscaping-assistant#62
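For readers without access to the diff, the change described under Changes might look roughly like the sketch below. It is not the actual diff. The variable name, the `helm_release.woodpecker` resource name, the `agent.replicaCount` and `server.env` keys, and the two env vars come from this PR. The description text, chart repository, chart name, release name, and namespace are assumptions for illustration, and the existing env vars and secrets are omitted.

```hcl
# variables.tf (sketch)
variable "woodpecker_agent_replica_count" {
  description = "Number of Woodpecker agent replicas" # hypothetical wording
  type        = number
  default     = 2
}

# main.tf (sketch): only the values touched by this PR are shown.
resource "helm_release" "woodpecker" {
  name       = "woodpecker"                       # assumed release name
  repository = "oci://ghcr.io/woodpecker-ci/helm" # assumed chart location
  chart      = "woodpecker"                       # assumed chart name
  namespace  = "woodpecker"                       # hypothetical namespace

  values = [yamlencode({
    agent = {
      # Previously hardcoded to 1; now driven by the module variable.
      replicaCount = var.woodpecker_agent_replica_count
    }
    server = {
      env = {
        # Server-level settings as named in this PR; they apply to every repo.
        WOODPECKER_PIPELINE_CANCEL_PREVIOUS      = "true"
        WOODPECKER_PIPELINE_CANCEL_PREVIOUS_WAIT = "3s"
      }
    }
  })]
}
```

Because the variable carries a default, a root module that does not pass `woodpecker_agent_replica_count` would pick up 2 replicas under this sketch.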
QA Review
Scope: 2 files, +16/-8 lines. Terraform CI module changes only.
Findings
No issues found. The diff is minimal and correct:
- `WOODPECKER_PIPELINE_CANCEL_PREVIOUS = "true"` -- correct server-level env var name per Woodpecker docs. Enables auto-cancel globally for all repos on the instance.
- `WOODPECKER_PIPELINE_CANCEL_PREVIOUS_WAIT = "3s"` -- sensible grace period to avoid cancelling pipelines during rapid successive pushes (e.g., force-push + immediate follow-up).
- `replicaCount` parameterized via `var.woodpecker_agent_replica_count` with `default = 2` -- clean approach. Root module needs no change since it uses the default.
- `type = number` -- correct for `replicaCount`.
- `TF_SECRET_VARS` or Salt pillar.
Notes
- `tofu plan` must be run on the server before apply -- cannot validate from dev machine.
VERDICT: APPROVE
PR #405 Review
DOMAIN REVIEW
Tech stack: Terraform (HCL), Helm values, Kubernetes (Woodpecker CI on k3s).
Variable declaration correctness: The `woodpecker_agent_replica_count` variable in `variables.tf` is well-formed -- correct type (`number`), sensible default (`2`), descriptive comment. The reference in `main.tf` as `var.woodpecker_agent_replica_count` is syntactically correct.
Root module wiring: Since the variable has a default, the root module does not need to pass it explicitly. This is valid. However, if the root module already passes other `ci` module variables explicitly (which it likely does for secrets), the implicit default here creates an inconsistency in style -- not a blocker, just noted.
Helm values structure: The `replicaCount` is placed under the `agent` block and the env vars under `server.env`, which matches Woodpecker Helm chart conventions.
Auto-cancel scope (WARNING): `WOODPECKER_PIPELINE_CANCEL_PREVIOUS=true` is a server-level setting that applies to ALL repositories on this Woodpecker instance. The original landscaping-assistant ticket (#62) likely intended per-repo `.woodpecker.yml` configuration (`cancel_previous: true`). The tradeoff:
For a single-developer instance with all repos using short CI pipelines, server-level is acceptable. The PR body acknowledges this tradeoff explicitly. If any repo ever needs a pipeline to run to completion regardless of new pushes, this will need to be revisited.
WOODPECKER_PIPELINE_CANCEL_PREVIOUS_WAIT=3s: This is reasonable. The purpose is to avoid cancelling a pipeline that was triggered nearly simultaneously (race condition on rapid pushes). 3 seconds is short enough to not defeat the purpose of auto-cancel, and long enough to handle git push timing races. Woodpecker docs recommend values in the 1-5s range for this.
Resource impact: 2 agent replicas at 50m CPU + 64Mi memory each = 100m CPU + 128Mi total. On a single k3s node this is negligible. No concern here.
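The resource arithmetic above can be checked with a small standalone sketch. The per-agent figures (50m CPU, 64Mi memory) are the ones quoted in this review. The local and output names are hypothetical and not part of the module.

```hcl
# Standalone sketch: total agent requests for a given replica count.
locals {
  agent_replicas       = 2
  agent_cpu_millicores = 50 # per-agent CPU request quoted in the review
  agent_memory_mib     = 64 # per-agent memory request quoted in the review

  total_cpu_millicores = local.agent_replicas * local.agent_cpu_millicores
  total_memory_mib     = local.agent_replicas * local.agent_memory_mib
}

output "agent_total_requests" {
  value = {
    cpu    = "${local.total_cpu_millicores}m"  # 2 * 50 = 100m
    memory = "${local.total_memory_mib}Mi"     # 2 * 64 = 128Mi
  }
}
```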
BLOCKERS
None. This is a clean, minimal infrastructure change with no security implications.
NITS
Alignment padding: The env var block was re-aligned to accommodate the longer `WOODPECKER_PIPELINE_CANCEL_PREVIOUS_WAIT` key name. This is fine for readability but will cause a noisy diff (all 7 existing lines changed). Not a problem, just noting it inflates the diff stats.
Variable placement: The new variable is inserted between `woodpecker_encryption_key` and `cnpg_iam_user_name`. Grouping by service (all woodpecker vars together) is good -- this placement is appropriate.
Consider documenting the global scope: A brief inline comment in `main.tf` like `# Server-level: applies to ALL repos on this instance` next to the cancel vars would help future-you remember the blast radius. Optional.
SOP COMPLIANCE
- Branch `404-scale-woodpecker-agent-replicas-to-2-and` (matches issue #404)
- `tofu fmt` compliance noted in review checklist
PROCESS OBSERVATIONS
- Set `woodpecker_agent_replica_count = 1` in root module and re-apply. Auto-cancel env vars can be removed similarly.
- `tofu plan` step being server-only is a known limitation, not a gap.
VERDICT: APPROVED