feat: Woodpecker agent label routing + retry count (#191) #192
Reference
forgejo_admin/pal-e-platform!192
Summary
- Add `platform=linux` filter label to k8s Woodpecker agent and pipeline label constraint to `.woodpecker.yaml`
- Bump `WOODPECKER_CONNECT_RETRY_COUNT` from 1 to 10 for restart cascade resilience
Changes
- `terraform/main.tf`: Added `WOODPECKER_FILTER_LABELS=platform=linux` and `WOODPECKER_CONNECT_RETRY_COUNT=10` to k8s agent Helm values
- `.woodpecker.yaml`: Added top-level `labels: { platform: linux }` to constrain pipeline routing
Test Plan
- `tofu fmt -check main.tf` passes
- `tofu validate` passes
- `tofu plan -lock=false` in CI (cannot run locally: MinIO DNS is cluster-internal)
Review Checklist
Related Notes
- `project-pal-e-platform`
Root Cause Investigation
Woodpecker filter labels are capability advertisements, not restrictions. Agent `custom_labels = {"platform":"darwin"}` means "I can handle darwin jobs AND unlabeled jobs." Without pipeline `labels:`, ALL agents race for every job. DB evidence: all 10+ recent failures were agent_id=3 (Mac) grabbing unlabeled workflows.
Sequencing
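The matching rule from the root-cause analysis, and the widen-then-narrow order it implies, can be sketched with a simplified model. This is an illustration of the rule as this PR describes it, not Woodpecker's actual scheduler; the agent names `k8s` and `mac` and the helper functions are hypothetical, and only the label values come from the PR.

```python
from typing import Dict, List

Labels = Dict[str, str]


def agent_accepts(agent_labels: Labels, pipeline_labels: Labels) -> bool:
    """Simplified rule: every pipeline label must be matched by the agent.

    An unlabeled pipeline has nothing to match, so every agent is eligible.
    """
    return all(agent_labels.get(key) == value
               for key, value in pipeline_labels.items())


def eligible_agents(agents: Dict[str, Labels], pipeline_labels: Labels) -> List[str]:
    """Names of all agents that may pick up a pipeline with these labels."""
    return sorted(name for name, labels in agents.items()
                  if agent_accepts(labels, pipeline_labels))


# Hypothetical agent names; label values are the ones quoted in this PR.
agents_before = {"k8s": {}, "mac": {"platform": "darwin"}}
agents_after = {"k8s": {"platform": "linux"}, "mac": {"platform": "darwin"}}

unlabeled: Labels = {}
linux_only: Labels = {"platform": "linux"}

# Root cause: an unlabeled pipeline can be grabbed by either agent.
race_before = eligible_agents(agents_before, unlabeled)

# Widen first: labeling the k8s agent changes nothing for unlabeled pipelines.
after_agent_label = eligible_agents(agents_after, unlabeled)

# Narrow second: a labeled pipeline is only eligible for the linux agent.
after_pipeline_label = eligible_agents(agents_after, linux_only)

# Wrong order: a labeled pipeline before the agent label matches no agent.
narrow_first = eligible_agents(agents_before, linux_only)
```

Under this model the agent-side label is a no-op for unlabeled pipelines, which is why it can ship ahead of the per-repo pipeline labels, while the reverse order would leave labeled pipelines with no eligible agent.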
Agent filter labels (widen) deploy safely before all repos add pipeline labels (narrow). Unlabeled pipelines still match any agent, including one with `platform=linux`. Cross-repo label updates are follow-up work.
Follow-up
- `convention-pipeline-labels`
PR #192 Review
DOMAIN REVIEW
Tech stack: Terraform (Helm provider) + Woodpecker CI YAML. Applying Terraform/k8s/Helm checklist.
Terraform changes (`terraform/main.tf` lines 784-785):
- `WOODPECKER_FILTER_LABELS = "platform=linux"` -- Correct env var name for Woodpecker agent label advertisement. Value format `key=value` matches Woodpecker documentation. Aligns with the Mac agent's complementary `filter_labels: "platform=darwin"` in `salt/pillar/mac-agent.sls:21`.
- `WOODPECKER_CONNECT_RETRY_COUNT = "10"` -- Bumps from default (1) to 10 for restart cascade resilience. Reasonable value for a k8s environment where server pods may restart. String type is correct for Helm env var injection.
- Both vars sit in the `agent.env` block alongside existing `WOODPECKER_BACKEND_*` vars. Alignment is consistent with existing formatting (`tofu fmt` compliant).
Woodpecker pipeline changes (`.woodpecker.yaml` lines 1-2):
- `labels: { platform: linux }` constrains this repo's pipelines to agents advertising `platform=linux`. Correct Woodpecker YAML schema for pipeline label routing.
- Placement ahead of `clone:` is correct -- labels are a top-level pipeline directive.
Routing logic verification:
- An agent advertising `platform=linux` will still accept unlabeled pipelines from other repos. This is the safe sequencing: agent labels first, then cross-repo pipeline labels as follow-up.
- The Mac agent (`salt/pillar/mac-agent.sls`) has `filter_labels: "platform=darwin"`, confirming the routing contract is consistent.
BLOCKERS
None.
NITS
- Retry count as a magic number: `"10"` is reasonable but undocumented. Consider adding an inline comment explaining why 10 was chosen (e.g., `# 10 retries covers ~30s server restart window`). Very minor -- the PR body documents the rationale.
- Cross-repo follow-up tracking: The PR body mentions cross-repo pipeline label PRs as follow-up. Confirm these are tracked as Forgejo issues so they don't get lost. Without them, other repos' unlabeled pipelines will still be raceable by both agents once the Mac agent is re-enabled.
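To make the retry-count nit concrete, here is a generic sketch of why a larger attempt budget rides out a server restart. It is not Woodpecker's connection logic: `connect_with_retries` and `server_ready_on` are hypothetical helpers, the count is treated as a total attempt budget (an assumption), and the delay a real agent would wait between attempts is left out so the example runs instantly.

```python
from typing import Callable


def connect_with_retries(connect: Callable[[], bool], attempts: int) -> bool:
    """Call connect() up to `attempts` times; return True on the first success."""
    for _ in range(attempts):
        if connect():
            return True
    return False


def server_ready_on(n: int) -> Callable[[], bool]:
    """Hypothetical server stub: the n-th connection attempt is the first to succeed."""
    calls = {"count": 0}

    def connect() -> bool:
        calls["count"] += 1
        return calls["count"] >= n

    return connect


# A server that is still restarting during the first three attempts.
gave_up = connect_with_retries(server_ready_on(4), attempts=1)
recovered = connect_with_retries(server_ready_on(4), attempts=10)
```

With a budget of 1 the agent gives up while the server is still restarting; with 10 it reconnects on the fourth attempt. An inline comment next to the Terraform value could record the restart window the chosen count is meant to cover.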
SOP COMPLIANCE
- Branch `191-woodpecker-agent-label-routing` references issue #191
- Related Notes reference `project-pal-e-platform`
- `tofu plan -lock=false` included in test plan (CI will run it)
PROCESS OBSERVATIONS
- `salt/pillar/mac-agent.sls:18` contains a plaintext agent token (`woodpecker_agent_secret`). This is not introduced by this PR but is worth tracking as a separate security hardening issue (move to Salt encrypted pillar or external secret store).
- The PR body lists a `convention-pipeline-labels` note as follow-up. This should be created when the cross-repo label work begins, not deferred indefinitely.
VERDICT: APPROVED