Bug: CI build-and-push fails with Harbor connectivity timeout from Woodpecker agent #184
Type
Bug
Lineage
standalone — discovered during CI pipeline monitoring for basketball-api PRs #172/#174
Repo
forgejo_admin/pal-e-platform
What Broke
Woodpecker CI pipeline build-and-push step fails with a Harbor registry connectivity timeout. Kaniko attempts HTTPS (port 443) on the harbor ClusterIP service, which only exposes port 80, then falls back to HTTP, which gets connection refused.
On retry (pipeline #146), the postgres service container also failed to start (empty logs), suggesting broader CI agent resource pressure or networking issues.
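The port mismatch described above can be checked from inside the cluster with a plain TCP probe. The following is a minimal sketch, not part of the original report: the function names `probe_tcp` and `probe_ports` are hypothetical, and it assumes the probe runs from a pod that resolves the same service name the pipeline uses.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP connection and report 'open', 'refused', 'timeout', or an error string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on this port.
        return "refused"
    except socket.timeout:
        # No answer within the timeout, e.g. traffic dropped by a policy.
        return "timeout"
    except OSError as exc:
        # DNS failures and other socket errors end up here.
        return f"error: {exc}"


def probe_ports(host: str, ports) -> dict:
    """Probe several ports on one host and return {port: result}."""
    return {port: probe_tcp(host, port) for port in ports}
```

Calling `probe_ports("harbor.harbor.svc.cluster.local", [80, 443])` from a pod in the woodpecker namespace would show whether port 80 is reachable and how port 443 fails. A "refused" result and a "timeout" result point at different layers, so the distinction is worth recording in the issue.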
Repro Steps
basketball-api main (or any repo with Woodpecker CI build-and-push)
build-and-push step times out connecting to Harbor
Expected Behavior
Kaniko should successfully push images to Harbor via harbor.harbor.svc.cluster.local:80. All Harbor pods are Running, services are up. This worked previously.
Environment
harbor, woodpecker
10.43.131.178, port 80/TCP only (no 443)
Acceptance Criteria
Related
forgejo_admin/basketball-api#170: jersey sync fix waiting on deploy
forgejo_admin/basketball-api#173: teams/save fix waiting on deploy
feedback_ci_pipeline_lessons.md: 12 prior CI root causes (Harbor hairpin was one)
sop-ci-pipeline-recovery: CI recovery SOP
Scope Review: NEEDS_REFINEMENT
Review note: review-411-2026-03-26
Scope document is strong on context (error output, environment, repro steps) but is missing 5 template sections needed for agent execution.
terraform/network-policies.tf, terraform/main.tf, basketball-api/.woodpecker.yaml
The Mac agent (backend: local, board #391) cannot run Kaniko containers. If Woodpecker misrouted pipeline #145 to the Mac agent, that is the root cause, not network policy. Check agent assignment before investigating the network layer.
sop-ci-pipeline-recovery does not cover the connectivity timeout failure mode (port 443 on HTTP-only service). Update the SOP after the root cause is confirmed.
Root Cause Investigation (2026-03-26)
Original hypothesis was Harbor connectivity / Mac agent routing. Both ruled out.
Actual root cause: Woodpecker server DB state corruption
Woodpecker DB (woodpecker-db-1) became briefly unreachable
failed to setup store: dial tcp 10.43.54.87:5432: connection refused
(:9000 refused)
queue.Done: cannot ack workflow -- sql: no rows in result set
done: cannot close log stream for step N -- stream: not found
Verified NOT the cause
filter_labels: "platform=darwin", basketball-api pipelines have no labels. Agent ran on k8s (confirmed by clone using forgejo-http.forgejo.svc.cluster.local)
postgres:16-alpine starts fine manually in woodpecker namespace
Fix needed
Restart Woodpecker DB → server → agent in order, or investigate woodpecker-db-1 for corrupted workflow/step records. The SOP (sop-ci-pipeline-recovery) should be updated with this failure mode.
Incident Escalation: Woodpecker DB corruption
Original scope was Harbor connectivity. Root cause is deeper:
What's Actually Broken
sql: no rows in result set errors on queue.Done
Severity
P2 — CI Degraded. All pipelines are broken. Services that need CI builds are blocked.
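Because the original scope pointed at Harbor while the logs pointed at the Woodpecker DB, it may help triage to scan server and agent logs for the error signatures quoted in the root cause investigation. The sketch below is an illustration only: the regular expressions are built from the log lines quoted in this issue, while the category names and the functions `classify` and `looks_like_db_state_corruption` are hypothetical.

```python
import re

# Patterns derived from the log lines quoted in this issue.
# The category names on the left are hypothetical labels for this sketch.
SIGNATURES = {
    "db_unreachable": re.compile(
        r"failed to setup store: dial tcp .*:5432: connection refused"
    ),
    "queue_ack_lost": re.compile(
        r"queue\.Done: cannot ack workflow .*sql: no rows in result set"
    ),
    "log_stream_lost": re.compile(
        r"cannot close log stream for step .*stream: not found"
    ),
}


def classify(lines):
    """Count how many log lines match each known signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def looks_like_db_state_corruption(lines):
    """Return True when the queue or log-stream errors from this incident appear."""
    counts = classify(lines)
    return counts["queue_ack_lost"] > 0 or counts["log_stream_lost"] > 0
```

Feeding it the output of the server and agent logs gives a quick yes/no before anyone starts investigating network policy. If sop-ci-pipeline-recovery is updated with this failure mode, the same signatures could be listed there.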
Related
Options