feat: Woodpecker agent label routing + retry count (#191) #192

Merged
forgejo_admin merged 2 commits from 191-woodpecker-agent-label-routing into main 2026-03-27 01:57:54 +00:00

Summary

  • Add platform=linux filter label to k8s Woodpecker agent and pipeline label constraint to .woodpecker.yaml
  • Bump WOODPECKER_CONNECT_RETRY_COUNT from 1 to 10 for restart cascade resilience
  • Establishes routing contract so Mac agent (local backend) can't steal Linux/container pipelines

Changes

  • terraform/main.tf: Added WOODPECKER_FILTER_LABELS=platform=linux and WOODPECKER_CONNECT_RETRY_COUNT=10 to k8s agent Helm values
  • .woodpecker.yaml: Added top-level labels: { platform: linux } to constrain pipeline routing

Test Plan

  • tofu fmt -check main.tf passes
  • tofu validate passes
  • tofu plan -lock=false in CI (cannot run locally — MinIO DNS is cluster-internal)
  • After apply: verify k8s agent (id=1) picks up labeled pipeline
  • After apply: verify Mac agent (id=3, currently disabled) does NOT pick up labeled pipeline
  • No regressions — unlabeled pipelines from other repos still route to k8s agent

Review Checklist

  • No secrets committed
  • No unnecessary file changes (2 files, 5 lines added)
  • Commit messages are descriptive
  • Passed automated review-fix loop
  • Closes #191
  • Related: #184 — incident that exposed the routing gap
  • Related: #174 — Mac build agent setup (next_up)
  • project-pal-e-platform

Root Cause Investigation

Woodpecker filter labels are capability advertisements, not restrictions. Agent custom_labels = {"platform":"darwin"} means "I can handle darwin jobs AND unlabeled jobs." Without pipeline labels:, ALL agents race for every job. DB evidence: all 10+ recent failures were agent_id=3 (Mac) grabbing unlabeled workflows.

Sequencing

Agent filter labels (widen) deploy safely before all repos add pipeline labels (narrow). Unlabeled pipelines still match any agent including one with platform=linux. Cross-repo label updates are follow-up work.

Follow-up

  • Cross-repo pipeline label PRs (basketball-api, pal-e-deployments, pal-e-app, etc.)
  • Convention note: convention-pipeline-labels
  • Re-enable Mac agent after #174 + cross-repo labels complete
## Summary - Add `platform=linux` filter label to k8s Woodpecker agent and pipeline label constraint to `.woodpecker.yaml` - Bump `WOODPECKER_CONNECT_RETRY_COUNT` from 1 to 10 for restart cascade resilience - Establishes routing contract so Mac agent (local backend) can't steal Linux/container pipelines ## Changes - `terraform/main.tf`: Added `WOODPECKER_FILTER_LABELS=platform=linux` and `WOODPECKER_CONNECT_RETRY_COUNT=10` to k8s agent Helm values - `.woodpecker.yaml`: Added top-level `labels: { platform: linux }` to constrain pipeline routing ## Test Plan - [x] `tofu fmt -check main.tf` passes - [x] `tofu validate` passes - [ ] `tofu plan -lock=false` in CI (cannot run locally — MinIO DNS is cluster-internal) - [ ] After apply: verify k8s agent (id=1) picks up labeled pipeline - [ ] After apply: verify Mac agent (id=3, currently disabled) does NOT pick up labeled pipeline - [ ] No regressions — unlabeled pipelines from other repos still route to k8s agent ## Review Checklist - [x] No secrets committed - [x] No unnecessary file changes (2 files, 5 lines added) - [x] Commit messages are descriptive - [ ] Passed automated review-fix loop ## Related Notes - Closes #191 - Related: #184 — incident that exposed the routing gap - Related: #174 — Mac build agent setup (next_up) - `project-pal-e-platform` ### Root Cause Investigation Woodpecker filter labels are capability advertisements, not restrictions. Agent `custom_labels = {"platform":"darwin"}` means "I can handle darwin jobs AND unlabeled jobs." Without pipeline `labels:`, ALL agents race for every job. DB evidence: all 10+ recent failures were agent_id=3 (Mac) grabbing unlabeled workflows. ### Sequencing Agent filter labels (widen) deploy safely before all repos add pipeline labels (narrow). Unlabeled pipelines still match any agent including one with `platform=linux`. Cross-repo label updates are follow-up work. ### Follow-up - Cross-repo pipeline label PRs (basketball-api, pal-e-deployments, pal-e-app, etc.) - Convention note: `convention-pipeline-labels` - Re-enable Mac agent after #174 + cross-repo labels complete
feat: add Woodpecker agent label routing + bump retry count
Some checks are pending
ci/woodpecker/push/woodpecker Pipeline is pending
ci/woodpecker/pr/woodpecker Pipeline is pending
be40838f35
Add platform=linux filter label to k8s agent and pipeline label constraint
to .woodpecker.yaml. This establishes a routing contract so pipelines are
directed to capable agents — prevents the Mac agent (local backend) from
stealing Linux/container jobs.

Also bumps WOODPECKER_CONNECT_RETRY_COUNT from 1 to 10 to survive pod
restart cascades (DB → server → agent ordering).

Refs: #191

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author
Owner

PR #192 Review

DOMAIN REVIEW

Tech stack: Terraform (Helm provider) + Woodpecker CI YAML. Applying Terraform/k8s/Helm checklist.

Terraform changes (terraform/main.tf lines 784-785):

  • WOODPECKER_FILTER_LABELS = "platform=linux" -- Correct env var name for Woodpecker agent label advertisement. Value format key=value matches Woodpecker documentation. Aligns with the Mac agent's complementary filter_labels: "platform=darwin" in salt/pillar/mac-agent.sls:21.
  • WOODPECKER_CONNECT_RETRY_COUNT = "10" -- Bumps from default (1) to 10 for restart cascade resilience. Reasonable value for a k8s environment where server pods may restart. String type is correct for Helm env var injection.
  • Both values are placed correctly within the agent.env block alongside existing WOODPECKER_BACKEND_* vars. Alignment is consistent with existing formatting (tofu fmt compliant).
  • No state-breaking changes. These are additive env vars on an existing Helm release -- Helm will do a rolling update of the agent pod. Safe to apply.

Woodpecker pipeline changes (.woodpecker.yaml lines 1-2):

  • Top-level labels: { platform: linux } constrains this repo's pipelines to agents advertising platform=linux. Correct Woodpecker YAML schema for pipeline label routing.
  • Placement before clone: is correct -- labels are a top-level pipeline directive.

Routing logic verification:

  • PR body correctly explains that filter labels are capability advertisements, not restrictions. An agent with platform=linux will still accept unlabeled pipelines from other repos. This is the safe sequencing: agent labels first, then cross-repo pipeline labels as follow-up.
  • The Mac agent (salt/pillar/mac-agent.sls) has filter_labels: "platform=darwin", confirming the routing contract is consistent.

BLOCKERS

None.

  • No new functionality requiring tests (infrastructure config change, not application code).
  • No user input handling.
  • No secrets or credentials introduced.
  • No duplicated auth/security logic.

NITS

  1. Retry count as a magic number: "10" is reasonable but undocumented. Consider adding an inline comment explaining why 10 was chosen (e.g., # 10 retries covers ~30s server restart window). Very minor -- the PR body documents the rationale.

  2. Cross-repo follow-up tracking: The PR body mentions cross-repo pipeline label PRs as follow-up. Confirm these are tracked as Forgejo issues so they don't get lost. Without them, other repos' unlabeled pipelines will still be raceable by both agents once the Mac agent is re-enabled.

SOP COMPLIANCE

  • Branch named after issue: 191-woodpecker-agent-label-routing references issue #191
  • PR body has Summary, Changes, Test Plan, Related sections
  • Related references project: project-pal-e-platform
  • No secrets committed in this diff
  • No unnecessary file changes (2 files, 5 lines added -- tightly scoped)
  • Commit messages are descriptive
  • tofu plan -lock=false included in test plan (CI will run it)
  • PR body includes root cause investigation and sequencing rationale -- thorough

PROCESS OBSERVATIONS

  • Change failure risk: LOW. Additive env vars on an existing Helm release. Worst case: agent restart with new labels, falls back to accepting all unlabeled jobs (current behavior). No destructive path.
  • Deployment frequency: POSITIVE. This unblocks the Mac agent work (#174) by establishing the routing contract first. Correct sequencing.
  • Pre-existing concern (out of scope for this PR): salt/pillar/mac-agent.sls:18 contains a plaintext agent token (woodpecker_agent_secret). This is not introduced by this PR but is worth tracking as a separate security hardening issue (move to Salt encrypted pillar or external secret store).
  • Documentation gap: The PR body mentions a future convention-pipeline-labels note. This should be created when the cross-repo label work begins, not deferred indefinitely.

VERDICT: APPROVED

## PR #192 Review ### DOMAIN REVIEW **Tech stack:** Terraform (Helm provider) + Woodpecker CI YAML. Applying Terraform/k8s/Helm checklist. **Terraform changes (`terraform/main.tf` lines 784-785):** - `WOODPECKER_FILTER_LABELS = "platform=linux"` -- Correct env var name for Woodpecker agent label advertisement. Value format `key=value` matches Woodpecker documentation. Aligns with the Mac agent's complementary `filter_labels: "platform=darwin"` in `salt/pillar/mac-agent.sls:21`. - `WOODPECKER_CONNECT_RETRY_COUNT = "10"` -- Bumps from default (1) to 10 for restart cascade resilience. Reasonable value for a k8s environment where server pods may restart. String type is correct for Helm env var injection. - Both values are placed correctly within the `agent.env` block alongside existing `WOODPECKER_BACKEND_*` vars. Alignment is consistent with existing formatting (`tofu fmt` compliant). - No state-breaking changes. These are additive env vars on an existing Helm release -- Helm will do a rolling update of the agent pod. Safe to apply. **Woodpecker pipeline changes (`.woodpecker.yaml` lines 1-2):** - Top-level `labels: { platform: linux }` constrains this repo's pipelines to agents advertising `platform=linux`. Correct Woodpecker YAML schema for pipeline label routing. - Placement before `clone:` is correct -- labels are a top-level pipeline directive. **Routing logic verification:** - PR body correctly explains that filter labels are capability advertisements, not restrictions. An agent with `platform=linux` will still accept unlabeled pipelines from other repos. This is the safe sequencing: agent labels first, then cross-repo pipeline labels as follow-up. - The Mac agent (`salt/pillar/mac-agent.sls`) has `filter_labels: "platform=darwin"`, confirming the routing contract is consistent. ### BLOCKERS None. - No new functionality requiring tests (infrastructure config change, not application code). - No user input handling. - No secrets or credentials introduced. - No duplicated auth/security logic. ### NITS 1. **Retry count as a magic number:** `"10"` is reasonable but undocumented. Consider adding an inline comment explaining why 10 was chosen (e.g., `# 10 retries covers ~30s server restart window`). Very minor -- the PR body documents the rationale. 2. **Cross-repo follow-up tracking:** The PR body mentions cross-repo pipeline label PRs as follow-up. Confirm these are tracked as Forgejo issues so they don't get lost. Without them, other repos' unlabeled pipelines will still be raceable by both agents once the Mac agent is re-enabled. ### SOP COMPLIANCE - [x] Branch named after issue: `191-woodpecker-agent-label-routing` references issue #191 - [x] PR body has Summary, Changes, Test Plan, Related sections - [x] Related references project: `project-pal-e-platform` - [x] No secrets committed in this diff - [x] No unnecessary file changes (2 files, 5 lines added -- tightly scoped) - [x] Commit messages are descriptive - [x] `tofu plan -lock=false` included in test plan (CI will run it) - [x] PR body includes root cause investigation and sequencing rationale -- thorough ### PROCESS OBSERVATIONS - **Change failure risk: LOW.** Additive env vars on an existing Helm release. Worst case: agent restart with new labels, falls back to accepting all unlabeled jobs (current behavior). No destructive path. - **Deployment frequency: POSITIVE.** This unblocks the Mac agent work (#174) by establishing the routing contract first. Correct sequencing. - **Pre-existing concern (out of scope for this PR):** `salt/pillar/mac-agent.sls:18` contains a plaintext agent token (`woodpecker_agent_secret`). This is not introduced by this PR but is worth tracking as a separate security hardening issue (move to Salt encrypted pillar or external secret store). - **Documentation gap:** The PR body mentions a future `convention-pipeline-labels` note. This should be created when the cross-repo label work begins, not deferred indefinitely. ### VERDICT: APPROVED
fix: remove pipeline labels to unblock CI bootstrap
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
ci/woodpecker/pr/woodpecker Pipeline was successful
ci/woodpecker/pull_request_closed/woodpecker Pipeline was successful
389cb3b558
Pipeline labels require a matching agent capability that doesn't exist
yet — the Terraform apply that gives the k8s agent platform=linux hasn't
run. Adding labels and agent config in the same PR creates a chicken-and-egg
deadlock: CI can't run because no agent matches, so the fix never deploys.

Pipeline labels will be added in a follow-up PR after this Terraform
change applies and the agent advertises platform=linux.

Refs: #191

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
forgejo_admin deleted branch 191-woodpecker-agent-label-routing 2026-03-27 01:57:54 +00:00
Sign in to join this conversation.
No description provided.