Bump Woodpecker agent parallel workflows from 1 to 4 #194

Closed
opened 2026-03-27 03:22:03 +00:00 by forgejo_admin · 1 comment

Type

Feature

Lineage

Discovered during incident #184 session — 6 simultaneous merges queued for ~24 minutes with single-pipeline capacity.

Repo

forgejo_admin/pal-e-platform

User Story

As a platform operator
I want CI pipelines to run concurrently
So that burst merge operations don't serialize into 20+ minute queues

Context

Archbox has 12 CPU cores and 128GB RAM. Current utilization: 15% CPU, 18% memory. The Woodpecker k8s agent runs with WOODPECKER_MAX_WORKFLOWS at its default of 1, meaning one workflow at a time globally across all repos.

A typical pipeline (clone + test + Kaniko build) uses ~1-2 CPU and ~1GB. Running 4 concurrently would add ~4-8 CPU (~6 at the midpoint) and ~4GB on top of the current baseline: roughly half of the node's 12 cores and about 3% of its memory.

The bottleneck is artificial. The node has headroom for 4+ concurrent pipelines without contention.
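
The headroom estimate above can be re-derived with a few lines of arithmetic. This is a rough sketch, not a measurement: the node size, baseline utilization, and per-pipeline figures are copied from this section, and the assumption that the baseline and the pipeline load simply add up is mine.

```python
# All inputs come from the Context section of this issue.
NODE_CPU_CORES = 12
NODE_MEM_GB = 128
BASELINE_CPU_FRACTION = 0.15   # current CPU utilization
BASELINE_MEM_FRACTION = 0.18   # current memory utilization
PIPELINE_CPU_RANGE = (1.0, 2.0)  # "~1-2 CPU" per pipeline
PIPELINE_MEM_GB = 1.0            # "~1GB" per pipeline
MAX_WORKFLOWS = 4


def projected_utilization(cpu_per_pipeline: float) -> tuple[float, float]:
    """Return (cpu_fraction, mem_fraction) of the node with MAX_WORKFLOWS pipelines running.

    Assumes baseline load and pipeline load are additive (an assumption, not a measurement).
    """
    cpu_used = BASELINE_CPU_FRACTION * NODE_CPU_CORES + MAX_WORKFLOWS * cpu_per_pipeline
    mem_used = BASELINE_MEM_FRACTION * NODE_MEM_GB + MAX_WORKFLOWS * PIPELINE_MEM_GB
    return cpu_used / NODE_CPU_CORES, mem_used / NODE_MEM_GB


low = projected_utilization(PIPELINE_CPU_RANGE[0])
high = projected_utilization(PIPELINE_CPU_RANGE[1])
print(f"CPU: {low[0]:.0%} to {high[0]:.0%} of the node, memory: {low[1]:.0%}")
```

Memory stays far from the limit in either case. CPU is the number to watch after deploy, since the upper end of the per-pipeline range lands well above the midpoint estimate once the existing baseline is counted.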

File Targets

Files to modify:

  • terraform/main.tf — add WOODPECKER_MAX_WORKFLOWS = "4" to agent env block (line ~779)

Files NOT to touch:

  • Agent resource limits — the agent pod itself is lightweight; pipeline pods get their own resources
  • replicaCount: stay at 1; more replicas add complexity without benefit at this scale

Acceptance Criteria

  • WOODPECKER_MAX_WORKFLOWS=4 in agent Helm values
  • tofu plan shows only agent env change
  • After apply: trigger 2+ pipelines simultaneously, verify both run concurrently (not queued)
  • No OOM or CPU throttling under concurrent load
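
One way to make the "concurrently (not queued)" criterion checkable is to compare the start and finish times of the two test pipelines. The sketch below assumes those times have already been collected as Unix timestamps, for example from the Woodpecker UI. The tuple layout and the sample values are hypothetical and are not taken from this incident.

```python
def ran_concurrently(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """Return True if two (started, finished) intervals overlap in time."""
    return max(a[0], b[0]) < min(a[1], b[1])


# Hypothetical timestamps (seconds): both pipelines start within 5 seconds of each other.
concurrent_pair = ((100, 340), (105, 350))
# Hypothetical timestamps: the second pipeline starts only when the first finishes.
queued_pair = ((100, 340), (340, 580))

print(ran_concurrently(*concurrent_pair))
print(ran_concurrently(*queued_pair))
```

With MAX_WORKFLOWS at 1, a pair of simultaneous merges should look like the second case; after the change it should look like the first.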

Test Expectations

  • tofu validate passes
  • tofu plan -lock=false shows single Helm release change
  • Post-deploy: merge 2 PRs simultaneously, verify both pipelines start within seconds
  • Run command: tofu plan -lock=false in terraform/
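
The "single Helm release change" expectation can also be checked mechanically rather than by reading the plan output. The sketch below assumes the plan was saved with `tofu plan -lock=false -out=plan.out` and exported with `tofu show -json plan.out`; the file name `plan.out`, the resource addresses, and the sample data are hypothetical. It relies on the `resource_changes` list in the JSON plan representation, where unchanged resources carry the action `no-op`.

```python
def changed_resources(plan: dict) -> list[dict]:
    """Return every resource_changes entry whose planned action is not a no-op."""
    return [
        rc
        for rc in plan.get("resource_changes", [])
        if rc["change"]["actions"] != ["no-op"]
    ]


def is_single_helm_update(plan: dict) -> bool:
    """True if the plan updates exactly one helm_release in place and nothing else."""
    changes = changed_resources(plan)
    if len(changes) != 1:
        return False
    only = changes[0]
    return only["type"] == "helm_release" and only["change"]["actions"] == ["update"]


# Hypothetical, heavily trimmed stand-in for the output of `tofu show -json plan.out`.
SAMPLE_PLAN = {
    "resource_changes": [
        {
            "address": "helm_release.woodpecker",  # hypothetical address
            "type": "helm_release",
            "change": {"actions": ["update"]},
        },
        {
            "address": "kubernetes_namespace.ci",  # hypothetical address
            "type": "kubernetes_namespace",
            "change": {"actions": ["no-op"]},
        },
    ]
}

print(is_single_helm_update(SAMPLE_PLAN))
```

For a real run, load the exported JSON with `json.load` and pass the resulting dict to `is_single_helm_update` in place of `SAMPLE_PLAN`.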

Constraints

  • tofu plan must include -lock=false
  • Start with 4, not higher — validate stability before scaling further
  • Monitor node resources after deploy to confirm headroom

Checklist

  • PR opened
  • Tests pass
  • No unrelated changes

Related

  • #184 — incident session where bottleneck was observed
  • #191 — agent label routing (same Helm values block)
  • project-pal-e-platform

Scope Review: READY

Review note: review-432-2026-03-26
Scope is solid — single env var addition to agent Helm values block at terraform/main.tf:779. File target verified, traceability complete (story:superuser-deploy, arch:ci-pipeline), no blocking dependencies. #191 touches the same block but different env vars — rebase if it merges first.

forgejo_admin 2026-03-27 03:28:47 +00:00
Reference
forgejo_admin/pal-e-platform#194