Scale Woodpecker agent replicas and auto-cancel superseded pipelines #62

Closed
opened 2026-06-02 20:49:00 +00:00 by ldraney · 2 comments
Owner

Type

Feature

Lineage

Standalone. Discovered while monitoring pipelines #128, #131, and #135: stale PR/branch pipelines hog the single worker, blocking prod deploys.

Repo

ldraney/pal-e-platform (Terraform CI module)

User Story

As a developer
I want CI pipelines to run without queuing behind stale builds
So that prod deploys aren't blocked by already-merged branch pipelines

Context

Woodpecker agent is configured with replicaCount = 1 in terraform/modules/ci/main.tf. Every merge triggers 2-3 pipelines (branch push, PR event, main push) that all compete for that single agent. We had to manually cancel stale pipelines three times during PR #56 deployment. Two fixes needed:

  1. Scale agent replicas to 2 — allows parallel pipeline execution. Two is sufficient for current workload (single developer, one active repo). Revisit if more repos go active or team grows.
  2. Auto-cancel superseded pipelines — when a new push to the same branch triggers a pipeline, cancel any running pipeline for that branch.

Auto-cancel implementation note

WOODPECKER_PIPELINE_CANCEL_PREVIOUS is a server-level env var (not agent config). This affects ALL repos on the Woodpecker instance globally. Alternative: per-repo setting via Woodpecker API (cancel_previous_pipeline_events), which limits blast radius.

Recommendation: Use per-repo API setting for landscaping-assistant first. If it works well, consider enabling globally via server env var later.
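
A rough sketch of what the per-repo call might look like. It assumes the Woodpecker server accepts `PATCH /api/repos/{repo_id}` with bearer-token auth and the `cancel_previous_pipeline_events` field named above. The server URL, repo ID, and token below are placeholders, and the endpoint shape should be checked against the API docs for the deployed Woodpecker version.

```python
import json
import urllib.request


def build_cancel_previous_request(server_url, repo_id, token,
                                  events=("push", "pull_request")):
    """Build (but do not send) a PATCH request enabling auto-cancel for one repo.

    Assumptions: the PATCH /api/repos/{repo_id} endpoint and bearer-token auth.
    The field name cancel_previous_pipeline_events is taken from the ticket.
    """
    body = json.dumps({"cancel_previous_pipeline_events": list(events)})
    return urllib.request.Request(
        url=f"{server_url.rstrip('/')}/api/repos/{repo_id}",
        data=body.encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder values; nothing is sent over the network here.
    req = build_cancel_previous_request("https://ci.example.com", 1, "WOODPECKER_TOKEN")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`. Building it separately keeps the payload easy to review before anything touches the server.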

File Targets

Files the agent should modify or create:

  • terraform/modules/ci/main.tf -- bump agent replicaCount from 1 to 2
  • terraform/modules/ci/variables.tf -- add variable for replica count if parameterized
  • Woodpecker API call or Terraform resource for per-repo auto-cancel setting

Acceptance Criteria

  • Agent replicaCount = 2
  • Per-repo auto-cancel enabled for landscaping-assistant
  • Merging a PR no longer queues behind stale branch/PR pipelines

Test Expectations

  • Push twice to same branch — first pipeline cancels automatically
  • Merge PR — main pipeline runs without manual cancellation of stale pipelines
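
The first expectation can be stated as a small model of the intended behavior. This is only an illustration of what "superseded" means in this ticket, not Woodpecker's actual logic: the `Pipeline` class, the status names, and the rule that a pipeline is superseded by a newer one with the same branch and event are all assumptions made for the sketch.

```python
from dataclasses import dataclass

ACTIVE = {"pending", "running"}  # statuses that can still be cancelled (assumed names)


@dataclass
class Pipeline:
    number: int
    branch: str
    event: str
    status: str = "pending"


def enqueue(pipelines, new, cancel_events=("push", "pull_request")):
    """Append `new`, cancelling older active pipelines it supersedes.

    A pipeline counts as superseded when it is still active and shares both
    branch and event with the new one (an assumption of this model).
    """
    if new.event in cancel_events:
        for p in pipelines:
            if p.branch == new.branch and p.event == new.event and p.status in ACTIVE:
                p.status = "cancelled"
    pipelines.append(new)
    return pipelines


if __name__ == "__main__":
    queue = []
    enqueue(queue, Pipeline(1, "feature", "push", status="running"))
    enqueue(queue, Pipeline(2, "feature", "push"))  # second push to the same branch
    enqueue(queue, Pipeline(3, "main", "push"))     # different branch, unaffected
    for p in queue:
        print(p.number, p.branch, p.status)
```

In this model the second push to `feature` cancels pipeline 1, while the `main` pipeline is left alone, which is the outcome the test expectations describe.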

Constraints

  • pal-e-platform repo, not landscaping-assistant
  • Terraform change requires tofu plan / tofu apply (Lucas approval)
  • Platform-wide change: replica scaling affects all repos sharing this Woodpecker instance
  • Cluster runs on a single k3s node — verify CPU/memory headroom before scaling

Checklist

  • Approach chosen
  • PR opened on pal-e-platform
  • Pipeline behavior verified

Related

  • landscaping-assistant -- project affected
  • #60 -- CI bundle caching (related CI improvement)
Author
Owner

Ticket #62 Scope Review

SUMMARY

Ticket requests two changes to the Woodpecker CI module in ldraney/pal-e-platform: (1) scale agent replicas from 1 to 2-3, and (2) enable auto-cancellation of superseded pipelines on the same branch.

FILE TARGET VERIFICATION

  • terraform/modules/ci/main.tf -- EXISTS. Confirmed agent block with replicaCount = 1 and no cancellation config. Correct target.
  • terraform/modules/ci/variables.tf -- EXISTS. Currently has no replica-count variable. Correct target if parameterizing.

ACCEPTANCE CRITERIA REVIEW

All three criteria are testable:

  • "Agent replicaCount >= 2" -- verifiable via tofu plan output and kubectl get deployment
  • "Superseded pipelines auto-cancel when new push arrives on same branch" -- verifiable by pushing twice to a branch and observing cancellation
  • "Merging a PR no longer queues behind stale branch/PR pipelines" -- verifiable by merge + observation

ISSUES FOUND

1. Wrong environment variable name (medium)
The ticket references WOODPECKER_PIPELINE_CANCEL_PREVIOUS -- this env var does not exist in Woodpecker. The correct server-level env var is:

WOODPECKER_DEFAULT_CANCEL_PREVIOUS_PIPELINE_EVENTS

Default value: pull_request, push (which means it may already be enabled by default on the server side).

Additionally, auto-cancel can be toggled per-repo in Woodpecker project settings ("Cancel previous pipelines"). The implementing agent needs to know which mechanism to use:

  • Server env var (global default for all repos) -- set in main.tf server env block
  • Per-repo setting (repo-specific) -- set via Woodpecker API/UI, not Terraform

Recommendation: update the ticket to reference the correct env var name and clarify which approach is intended. If the default is already pull_request, push, this may already be working and the issue may be solely about replica count.
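
To check whether auto-cancel is already in effect for a repo, one option is to inspect the repo's settings as returned by the Woodpecker API. The helper below is a minimal sketch: it assumes the settings arrive as a JSON object carrying a `cancel_previous_pipeline_events` list (the field name used in the ticket), and the sample dictionary is made up for illustration.

```python
def uncovered_events(repo_settings, wanted=("push", "pull_request")):
    """Return the wanted events that the repo's auto-cancel setting does not list.

    An empty result means every wanted event already cancels previous pipelines.
    """
    enabled = set(repo_settings.get("cancel_previous_pipeline_events") or [])
    return [event for event in wanted if event not in enabled]


if __name__ == "__main__":
    # Hypothetical settings payload; real values would come from the API.
    sample = {"full_name": "ldraney/landscaping-assistant",
              "cancel_previous_pipeline_events": ["push"]}
    missing = uncovered_events(sample)
    print("auto-cancel missing for:", missing or "nothing")
```

If the result is empty for the repo in question, the auto-cancel half of the ticket may already be satisfied and only the replica change remains.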

2. Cross-repo ticket (low, acceptable)
The ticket is filed on landscaping-assistant but targets pal-e-platform. The ticket correctly documents this in the "Repo" and "Constraints" sections. The PR should be opened on pal-e-platform, not landscaping-assistant. This is clear enough for an agent to follow.

3. Resource limit consideration is vague (low)
The constraints mention "consider cluster resource limits when scaling replicas" but don't specify what those limits are. The current agent resource config is 50m CPU request / 256Mi memory limit per replica. Scaling to 2-3 replicas means 100-150m of CPU requests and 512-768Mi of memory limits in total. An implementing agent may not know the cluster's total capacity. Consider adding a concrete recommendation (e.g., "2 replicas" vs "2-3").
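
The arithmetic above can be written out as a quick check. The per-replica figures (50m CPU request, 256Mi memory limit) come from this review; the free-capacity numbers passed to `fits_on_node` are hypothetical and would have to be read from the actual k3s node. The sketch only counts the agent pods themselves, not whatever workloads the agents launch.

```python
def total_agent_resources(replicas, cpu_request_m=50, memory_limit_mi=256):
    """Return (total CPU request in millicores, total memory limit in MiB)."""
    return replicas * cpu_request_m, replicas * memory_limit_mi


def fits_on_node(replicas, free_cpu_m, free_memory_mi):
    """True if the agent replicas fit within the given free node capacity."""
    cpu_m, memory_mi = total_agent_resources(replicas)
    return cpu_m <= free_cpu_m and memory_mi <= free_memory_mi


if __name__ == "__main__":
    for n in (1, 2, 3):
        cpu_m, memory_mi = total_agent_resources(n)
        print(f"{n} replica(s): {cpu_m}m CPU requested, {memory_mi}Mi memory limit")
    # Hypothetical headroom of 500m CPU and 1024Mi memory on the node.
    print("2 replicas fit:", fits_on_node(2, free_cpu_m=500, free_memory_mi=1024))
```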

DEPENDENCY CHECK

  • No blocking dependencies identified
  • Related ticket #60 (bundle caching) is independent -- no ordering constraint
  • No parent ticket -- standalone is correct

SOP COMPLIANCE

  • [x] Has Summary/User Story section
  • [x] Has File Targets section
  • [x] Has Acceptance Criteria
  • [x] Has Test Expectations
  • [x] Has Constraints section
  • [x] Has Related section
  • [x] No secrets in ticket body
  • [ ] Related section does not reference a plan slug (standalone ticket, acceptable)

VERDICT: REQUEST_CHANGES

The ticket is well-structured and mostly ready, but the incorrect env var name (WOODPECKER_PIPELINE_CANCEL_PREVIOUS vs actual WOODPECKER_DEFAULT_CANCEL_PREVIOUS_PIPELINE_EVENTS) risks sending an implementing agent down a dead end. Before moving to todo:

  1. Required: Correct the env var name to WOODPECKER_DEFAULT_CANCEL_PREVIOUS_PIPELINE_EVENTS, or clarify that the per-repo project setting approach should be used instead.
  2. Recommended: Check whether the default value (pull_request, push) is already active -- if so, the auto-cancel may already work and the ticket scope simplifies to replica scaling only.
  3. Recommended: Pin replica count to a specific number (e.g., 2) rather than a range (2-3) to reduce ambiguity for the implementing agent.
Author
Owner

Ticket #62 Scope Review

Template Completeness

All required sections present: Type, Lineage, Repo, User Story, Context, File Targets, Acceptance Criteria, Test Expectations, Constraints, Checklist, Related. No gaps.

Findings Requiring Revision

1. Auto-cancel config location is wrong (BLOCKER)
WOODPECKER_PIPELINE_CANCEL_PREVIOUS is a server-level env var, not an agent setting. The file targets only list agent-side files (ci/main.tf for agent replicaCount). The ticket must identify where the Woodpecker server helm values or Terraform config lives and target that file for the auto-cancel setting. Without this, an implementing agent will put the env var in the wrong place or fail entirely.

2. Cross-repo blast radius unacknowledged (BLOCKER)
This is filed on landscaping-assistant but targets pal-e-platform. More critically, scaling agents and enabling auto-cancel affects every repo on the Woodpecker instance. The ticket does not acknowledge this blast radius or specify whether auto-cancel should be global (server env var) or per-repo (API setting). This decision materially changes the implementation.

3. Replica count not specified
"replicaCount >= 2" is vague. Should be a concrete number (2 or 3) with justification based on cluster resource headroom. The Constraints section says "consider cluster resource limits" but provides no data. The implementing agent cannot make this tradeoff.

What Passes

  • Problem statement is clear and well-evidenced (specific pipeline numbers cited).
  • Test expectations are observable and verifiable.
  • Lineage is correctly standalone.
  • Related issues linked appropriately.

VERDICT: REVISION NEEDED

Required fixes before implementation:

  1. Add the correct server-side file target for auto-cancel config.
  2. Decide global vs. per-repo auto-cancel and state that decision explicitly.
  3. Specify a concrete replica count (e.g., 2) with a note on cluster resource availability.
Reference
ldraney/landscaping-assistant#62