SOP update: add Helm rollback DB corruption to sop-ci-pipeline-recovery #242

Open
opened 2026-03-28 22:50:42 +00:00 by forgejo_admin · 0 comments

Type

Feature

Lineage

  • Board: board-pal-e-platform
  • Story: story:superuser-deploy
  • Arch: arch:ci-pipeline
  • Discovered from: #184 (Harbor connectivity timeout resolution)

Repo

forgejo_admin/pal-e-platform (SOP lives in pal-e-docs notes, but relates to platform CI)

User Story

As the platform operator, I need sop-ci-pipeline-recovery to cover Helm rollback DB corruption so future incidents have a documented recovery playbook.

Context

Issue #184 revealed a failure mode not covered by sop-ci-pipeline-recovery: the Woodpecker DB becomes unreachable during a Helm rollback, causing server crash-loops, agent disconnections, and stale workflow records that produce phantom failures.

File Targets

  • pal-e-docs note: sop-ci-pipeline-recovery (via MCP update_block or create_block)

Acceptance Criteria

  • New section in sop-ci-pipeline-recovery note
  • Symptoms, root cause, recovery, prevention documented

Test Expectations

  • get_note_toc(slug="sop-ci-pipeline-recovery") shows new section heading
  • get_section returns complete content for the new section
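
The two expectations above can be sketched as a small check. This is only an illustration: the issue does not specify what get_note_toc or get_section return, so the sketch assumes the TOC can be reduced to a list of heading strings and the section to plain text. The helper names and the heading used in the usage note are hypothetical.

```python
# Hypothetical helpers; the real MCP return shapes are not given in this issue.
# Assumption: the TOC is a list of heading strings, the section is plain text.

REQUIRED_TOPICS = ("symptoms", "root cause", "recovery", "prevention")


def section_is_listed(toc_headings, heading):
    """Return True if heading appears in the TOC (case-insensitive, trimmed)."""
    wanted = heading.strip().lower()
    return any(h.strip().lower() == wanted for h in toc_headings)


def missing_topics(section_text, required=REQUIRED_TOPICS):
    """Return the required topics that the section text never mentions."""
    text = section_text.lower()
    return [topic for topic in required if topic not in text]
```

For example, with a placeholder heading such as "Helm Rollback DB Corruption", section_is_listed(toc, heading) should be true and missing_topics(section_text) should return an empty list once the new section is complete.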

Constraints

  • This is a Dottie task (docs only, no code)
  • Follow existing section format in the SOP

Checklist

  • Read current sop-ci-pipeline-recovery structure
  • Add section covering: symptoms, root cause, recovery, prevention
  • Verify content renders correctly

Related

  • #184: Harbor connectivity timeout (parent incident)
  • sop-ci-pipeline-recovery: target SOP note
Reference
forgejo_admin/pal-e-platform#242