SOP: Harbor robot import recovery — find the real robot ID without breaking live infra #274
Reference
ldraney/pal-e-api#274
Type
Feature
Lineage
Standalone: captured 2026-04-26 after a near-miss during apply-backlog reconciliation, in which the wrong Harbor robot was imported into terraform state and westsidekingsandqueens-pull was briefly at risk of deletion.
Repo
forgejo_admin/pal-e-api (deliverable is a note in pal-e-docs, owned by this project)
User Story
As a platform operator recovering from a Harbor 409 conflict during tofu apply
I want a documented procedure for finding the correct robot ID and importing it safely
So that I don't misread Harbor's error messages and import the wrong robot, which would mark a live robot for replacement and break image pulls for an unrelated service.
Context
On 2026-04-26 a tofu apply returned "Error: 409 robot account 27:playme2k+playme2k-ci already exists" while attempting to create the playme2k CI robot. The 27 in that message is the Harbor project ID, not the robot ID, but the format invites misreading. The operator imported /robots/27 into state on the assumption that it was the robot ID. That ID actually belongs to westsidekingsandqueens-pull, which then had two terraform resources pointing to it. The next plan reported "must be replaced" with a name change from westsidekingsandqueens-pull to playme2k-ci. Applying that plan would have destroyed the live pull robot and broken westsidekingsandqueens image pulls. The problem was caught on plan inspection before apply and recovered via tofu state rm. This recovery pattern is not covered by any existing SOP.
Additional finding: the paginated GET /api/v2.0/robots?page=N&page_size=100 endpoint returns system-level robots only. Project-scoped robots, which every service uses for CI and pull access, do not appear in that listing, but direct GET /api/v2.0/robots/{id} lookups do return them. The real playme2k-ci robot was found by scanning IDs 1-400.
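Both reading errors described above can be avoided mechanically: treating the 409 number as a robot ID, and trusting the paginated listing. The following is a minimal Python sketch. It assumes the following:

- The 409 text has exactly the shape quoted above.
- GET /api/v2.0/robots/{id} answers with a JSON object that carries a name field, and returns 404 for an unused ID.
- Basic-auth credentials are accepted.
- Harbor's default robot$ name prefix is in use.

parse_conflict, harbor_fetcher and find_robot_id are hypothetical helpers, not part of any existing tool.

```python
import base64
import json
import re
import urllib.error
import urllib.request
from typing import Callable, Optional

# Matches the 409 text quoted above, e.g.
#   "409 robot account 27:playme2k+playme2k-ci already exists"
# Group 1 is the Harbor PROJECT id; group 2 is "<project>+<robot-name>".
CONFLICT_RE = re.compile(r"robot account (\d+):(\S+) already exists")

# A fetch function maps a robot id to its JSON body, or None if absent.
Fetch = Callable[[int], Optional[dict]]


def parse_conflict(message: str) -> tuple[int, str]:
    """Split a 409 message into (project_id, "<project>+<robot-name>")."""
    match = CONFLICT_RE.search(message)
    if match is None:
        raise ValueError(f"not a Harbor robot 409 message: {message!r}")
    return int(match.group(1)), match.group(2)


def harbor_fetcher(base_url: str, user: str, password: str) -> Fetch:
    """Hypothetical: GET /api/v2.0/robots/{id} with basic auth, None on 404."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()

    def fetch(robot_id: int) -> Optional[dict]:
        req = urllib.request.Request(
            f"{base_url}/api/v2.0/robots/{robot_id}",
            headers={"Authorization": f"Basic {token}"},
        )
        try:
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            if err.code == 404:  # no robot with this id
                return None
            raise

    return fetch


def find_robot_id(fetch: Fetch, full_name: str,
                  ids: range = range(1, 401)) -> Optional[int]:
    """Probe ids one by one (default 1-400, as in the manual scan) and
    return the id whose "name" equals full_name, or None."""
    for robot_id in ids:
        robot = fetch(robot_id)
        if robot is not None and robot.get("name") == full_name:
            return robot_id
    return None
```

Usage runs in two steps:

1. For the message above, parse_conflict(error_text) returns (27, "playme2k+playme2k-ci"). The 27 is deliberately not used for the import.
2. find_robot_id(harbor_fetcher(url, user, password), "robot$" + suffix) then probes IDs one request at a time. Here url, user and password are placeholders for your Harbor instance.

Passing the fetch function in as a parameter keeps the scan testable against a fake dictionary without touching live Harbor.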
Of the existing SOPs, service-onboarding-sop covers happy-path onboarding but not import recovery.
File Targets
- sop-harbor-robot-import: new SOP note, tagged sop,active.
- sop-index: update to reference the new SOP.
Files the agent should NOT touch:
- service-onboarding-sop: happy-path onboarding belongs there; recovery belongs in its own note.
- pal-e-services: separate ticket.
Acceptance Criteria
- The 409 message format <project_id>:<robot_full_name> is misleading: <project_id> is NOT the robot ID.
- The /robots listing hides project-scoped robots.
- Run tofu plan after import; if it shows "must be replaced", immediately run tofu state rm to back out.
- sop-index references the new SOP.
Procedure (content the SOP must include)
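The import and verification steps in this procedure reduce to two fixed command strings and one text check on the plan. The following is a minimal Python sketch. It assumes the following:

- tofu's human-readable plan output uses resource headers such as "# <address> must be replaced".
- Attribute changes appear in the form ~ name = "old" -> "new".
- Service keys contain no spaces.

robot_address, import_command, backout_command and wrong_robot_imported are hypothetical helpers.

```python
import re

# Resource headers in tofu plan output, e.g.
#   # harbor_robot_account.service_ci["playme2k"] must be replaced
# Assumes the resource address contains no spaces.
HEADER_RE = re.compile(r"^\s*# (\S+) (.*)$")
# Attribute change such as: ~ name = "old-robot" -> "new-robot"
NAME_CHANGE_RE = re.compile(r'~\s*name\s*=\s*"[^"]*"\s*->\s*"[^"]*"')


def robot_address(service: str) -> str:
    return f'harbor_robot_account.service_ci["{service}"]'


def import_command(service: str, robot_id: int) -> str:
    # Import by the REAL robot id, never the project id from the 409.
    return (f"tofu import -var-file=k3s.tfvars "
            f"'{robot_address(service)}' '/robots/{robot_id}'")


def backout_command(service: str) -> str:
    # Edits terraform state only; the live Harbor robot is untouched.
    return f"tofu state rm '{robot_address(service)}'"


def wrong_robot_imported(plan_text: str, service: str) -> bool:
    """True if the plan would replace or rename the imported robot."""
    address = robot_address(service)
    in_block = False
    for line in plan_text.splitlines():
        header = HEADER_RE.match(line)
        # "# (2 unchanged attributes hidden)" is a comment, not a header.
        if header and not header.group(1).startswith("("):
            in_block = header.group(1) == address
            if in_block and "must be replaced" in header.group(2):
                return True
        elif in_block and NAME_CHANGE_RE.search(line):
            return True
    return False
```

The check is deliberately narrow. It flags only the two patterns the SOP names, so a False result does not prove the right robot was imported; reading the plan is still the real check.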
1. On "Error: 409 robot account <X>:<robot-name> already exists", treat <X> as a Harbor project ID and ignore it for import purposes.
2. Query /api/v2.0/robots/{id} across an ID range, filtering by name match for <robot-name>. The robot's name is robot$<project>+<robot-name>.
3. Import with the real ID: tofu import -var-file=k3s.tfvars 'harbor_robot_account.service_ci["<service>"]' '/robots/<real-id>'
4. Verify with tofu plan -var-file=k3s.tfvars -lock=false. If the output shows "must be replaced" or any ~ name = "..." -> "..." for the imported resource, the wrong robot was imported. Run tofu state rm 'harbor_robot_account.service_ci["<service>"]' immediately to back out (this only edits state and does not touch the live robot).
Test Expectations
- list_notes(tags="sop,active") returns the new SOP.
- sop-index displays the new entry.
Constraints
- Follow feedback_never_alter_prod_directly and the import command formats from service-onboarding-sop.
Checklist
- Tagged sop,active
Related
- pal-e-docs: project this affects
- pal-e-platform: recovery applies to platform terraform operations
Scope Review: APPROVED
Review note:
review-1108-2026-04-26
Scope is solid: comprehensive issue body, traceability triangle intact (story:superuser-recover verified on project-pal-e-platform), zero blast radius, fits the 5-minute rule. The Procedure section already contains the SOP content; the dev agent only needs to structure it into the standard SOP shape and add the sop-index entry.
Non-blocking observations:
- [SCOPE] arch:harbor: no arch-harbor note in pal-e-docs (platform-wide gap, separate ticket)
- [SCOPE] arch:terraform: no arch-terraform note in pal-e-docs (platform-wide gap, separate ticket)
Foundational doc work ships without complete arch backing notes; gating SOPs on missing arch notes would block all platform documentation. Ready to advance to next_up.
SOP delivered.
SOP note created:
sop-harbor-robot-import (note_type: sop, status: active, project: pal-e-platform), tagged sop,active.
Sections (matches template-sop requirements):
- … tofu state rm back-out procedure
- … service-onboarding-sop, sop-platform-tf-changes, sop-incident-response, plus the underlying behavioral feedback memories
Discoverability verified:
- list_notes(tags="sop,active") returns the new SOP.
- sop-index updated: new entry added to the "Deployment & Onboarding" table (now 7 rows).
Acceptance criteria status: all 5 from the issue body satisfied:
- /robots gap documented in Background
- sop-index references the new SOP
Board item #1108 advancing to needs_approval for your sign-off.