MTTR: failure event detection #7
Type
Feature
Lineage
New issue — created as part of DORA measurement expansion (2026-06-13).
Dependencies
Depends on #6 (ArgoCD spike) for ArgoCD failure detection. This ticket scopes to Woodpecker pipeline failures only.
Repo
ldraney/pal-e-dora-exporter
User Story
As a platform operator
I want failure events detected and recorded
So that MTTR (Mean Time to Recovery) can be calculated from real data
What
Add failure event detection to the DORA exporter so true MTTR can be calculated. Scoped to Woodpecker pipeline failures only — ArgoCD failure detection is deferred to #6.
Why
The exporter currently tracks only dora_deployment_last_success_timestamp. No failure events are captured, so true MTTR cannot be calculated: it requires knowing when a failure occurred and when the next success restored service.
Context
Need to detect Woodpecker pipeline failures and emit failure timestamps. ArgoCD sync failure detection depends on #6 (ArgoCD integration) which has not landed yet. This closes the last gap in the four DORA metrics for Woodpecker-tracked deployments.
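The calculation this enables can be sketched as follows. This is a minimal illustration, not the exporter's implementation: the (timestamp, status) event shape, the status strings, and the function names are all assumptions made for the example. It treats consecutive failures as one outage that starts at the first failure and ends at the next success.

```python
from statistics import mean


def recovery_times(events):
    """Return the seconds from each failure to the next success.

    `events` is a list of (unix_timestamp, status) tuples. The status
    strings "failure" and "success" are assumptions for this sketch.
    Consecutive failures count as one outage starting at the first one.
    """
    durations = []
    outage_start = None
    for ts, status in sorted(events):
        if status == "failure" and outage_start is None:
            outage_start = ts  # outage begins at the first failure
        elif status == "success" and outage_start is not None:
            durations.append(ts - outage_start)  # service restored
            outage_start = None
    return durations


def mttr_seconds(events):
    """Mean time to recovery in seconds, or None if nothing has recovered yet."""
    durations = recovery_times(events)
    return mean(durations) if durations else None
```

With only a last-success timestamp, `outage_start` is never known, which is the gap the failure timestamp is meant to close.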
File Targets
src/collectors/woodpecker.py -- add failure event tracking and failure timestamp metric (metrics are defined inline at module level in this file)
Feature Flag
None required — new metric is additive and does not change existing behavior.
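As a rough sketch of the additive failure tracking named in File Targets, the function below picks the most recent failed pipeline per repo from a list of pipeline records. The record keys ("repo", "status", "finished"), the set of statuses treated as failures, and the function name are assumptions for illustration; the actual Woodpecker response shape and the collector's existing structure should be checked before relying on any of them.

```python
# Statuses treated as failures; this set is an assumption for the sketch.
FAILURE_STATUSES = {"failure", "error", "killed"}


def last_failure_timestamps(pipelines):
    """Map repo name -> finish time of its most recent failed pipeline.

    Each pipeline is a dict with the hypothetical keys "repo", "status"
    and "finished" (unix seconds).
    """
    latest = {}
    for p in pipelines:
        if p.get("status") not in FAILURE_STATUSES:
            continue
        finished = p.get("finished")
        if not finished:
            continue  # still running, or no finish time recorded
        repo = p["repo"]
        if finished > latest.get(repo, 0):
            latest[repo] = finished
    return latest
```

Each resulting value could then be written to a gauge such as dora_deployment_failure_timestamp, labelled the same way as the existing success metric. Because the function only reads pipeline data and produces a new mapping, it leaves the existing success tracking untouched.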
Acceptance Criteria
dora_deployment_failure_timestamp or equivalent metric
Test Expectations
Constraints
Checklist
Related
dora-metrics -- project this affects
Issue #7 Template Review
TEMPLATE CONFORMANCE
Validated against template-issue-feature from pal-e-docs.
### Type -- present and valid ("Feature")
### Lineage -- present and non-empty
### Repo -- present and correct
### User Story -- present, follows As a / I want / So that format
### Context -- present with sufficient background
### File Targets -- present (but see blockers below)
### Feature Flag -- present ("None required" -- reasonable for additive metric)
### Acceptance Criteria -- present with checkboxes
### Test Expectations -- present with checkboxes
### Constraints -- present and non-empty
### Checklist -- present with checkboxes
### Related -- present
Extra sections not in template:
### What, ### Why -- these add useful context and are not a problem.
BLOCKERS
1. File target src/metrics.py does not exist and is not marked as a new file.
The repo has no src/metrics.py. The File Targets section says "add failure timestamp metric definition" as though the file already exists. If this is a new file to be created, say so explicitly (e.g., src/metrics.py -- create -- define failure timestamp metric). If the metric definition belongs in an existing file, point to the correct one.
2. File target src/collectors/argocd.py does not exist -- dependency on #6 not called out.
The issue hedges with "(if exists)" but the ArgoCD collector is the subject of issue #6 ("Close CFR gap: ArgoCD event integration"). This creates an implicit ordering dependency: if #6 ships first, this issue's ArgoCD scope is real; if not, it is dead code. This must be stated explicitly as a dependency, or the ArgoCD acceptance criteria should be removed and deferred to a follow-up after #6 lands.
3. Related section has a vague cross-reference.
"ArgoCD event integration issue -- related capability" does not name the issue number. This should be #6 so agents and humans can trace the dependency. The template calls for specific cross-references, not descriptions.
NITS
Missing "Files the agent should NOT touch" subsection. The template includes a negative-targeting section. Even a brief "N/A" or "None" is better than omission -- it signals the author considered scope boundaries.
Test Expectations missing run command. Template calls for a Run command: line (e.g., pytest tests/ -k test_failure). Currently no test directory exists in the repo at all -- the issue should note whether a tests/ directory needs to be created.
No spec file names in Test Expectations. Template guidance says "names specific spec files." The tests are described generically ("Unit test: failure event detection from mock Woodpecker data") but should name target files (e.g., tests/test_woodpecker_failures.py).
Acceptance criterion #4 is a Grafana concern, not a code deliverable. "MTTR calculable in Grafana as time between failure and next success" is a validation step, not something the exporter PR can satisfy. Consider moving it to a separate validation step or rephrasing as "metric is queryable via PromQL to calculate MTTR."
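For illustration, a spec file of the kind these nits ask for (for example tests/test_woodpecker_failures.py) might look like the sketch below. The function detect_failure_events is a hypothetical stand-in defined inline so the example is self-contained; in the real test it would be imported from the collector. The pipeline keys and the failure status set are likewise assumptions.

```python
import unittest

# Assumed set of Woodpecker statuses that count as a failed deployment.
FAILURE_STATUSES = {"failure", "error", "killed"}


def detect_failure_events(pipelines):
    """Hypothetical stand-in for the collector logic under test."""
    return [
        (p["repo"], p["finished"])
        for p in pipelines
        if p["status"] in FAILURE_STATUSES
    ]


class TestWoodpeckerFailures(unittest.TestCase):
    def test_failure_detected_from_mock_data(self):
        # Mock Woodpecker data: one success followed by one failure.
        mock = [
            {"repo": "svc", "status": "success", "finished": 1000},
            {"repo": "svc", "status": "failure", "finished": 1600},
        ]
        self.assertEqual(detect_failure_events(mock), [("svc", 1600)])

    def test_failure_not_emitted_for_success(self):
        # A successful pipeline must not produce a failure event.
        mock = [{"repo": "svc", "status": "success", "finished": 1000}]
        self.assertEqual(detect_failure_events(mock), [])
```

Both test names contain test_failure, so the run command suggested above (pytest tests/ -k test_failure) would select them.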
SCOPE ASSESSMENT
The scope is a single deployable unit IF the ArgoCD dependency is resolved. Without #6, this is purely Woodpecker failure detection + a new metric definition -- clean and focused. With the ArgoCD hedging left in, the scope is ambiguous and could expand unpredictably when #6 lands.
Recommendation: Scope this issue to Woodpecker failure detection only. Create a follow-up issue for ArgoCD failure detection that explicitly depends on #6.
VERDICT: NEEDS_REWORK
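Three blockers must be addressed before this moves to next_up:
1. Clarify whether src/metrics.py is a new file or fix the path
2. Declare the dependency on #6 explicitly, or defer the ArgoCD scope to a follow-up
3. Name #6 in the Related section
Updated per QA review: fixed nonexistent file targets, scoped to Woodpecker failures only (ArgoCD deferred to #6), added explicit dependency declaration.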
Issue #7 Template Review (Re-review)
Previous review flagged three blockers: nonexistent file targets (src/metrics.py, src/collectors/argocd.py), implicit dependency on #6, and vague cross-references. This re-review verifies those are resolved.
TEMPLATE CONFORMANCE
[Conformance table garbled in extraction. Surviving fragments: Type is valid (Feature); two sections use the - [ ] checkbox format.]
PREVIOUS BLOCKER RESOLUTION
Blocker: src/metrics.py did not exist
Resolution: src/collectors/woodpecker.py (HTTP 200 confirmed). Note clarifies metrics are defined inline at module level.
Blocker: src/collectors/argocd.py did not exist
Resolution: ### Dependencies section: "Depends on #6 (ArgoCD spike) for ArgoCD failure detection. This ticket scopes to Woodpecker pipeline failures only."
CONTENT QUALITY
Scope: Clean and focused. Title, User Story, What, Context, and File Targets all consistently scope to Woodpecker pipeline failures only. ArgoCD is explicitly out of scope with a clear pointer to #6.
File Targets: Single target (src/collectors/woodpecker.py) verified to exist in the repo at main. The note about inline metric definitions is helpful context for the implementing agent.
Acceptance Criteria: Three concrete, testable criteria. The metric name suggestion (dora_deployment_failure_timestamp) gives the implementer a starting point while the "or equivalent" leaves room for better naming.
Test Expectations: Two unit tests specified. Both are concrete (mock Woodpecker data, metric emission verification).
Dependencies section: Not part of the standard feature template but adds clear value here. Explicitly scopes what is and is not covered.
BLOCKERS
None.
NITS
The template expects a ### Lineage format of "Related to org/repo #N" but the issue uses "New issue -- created as part of DORA measurement expansion." This is fine since this is a standalone issue, but could reference the DORA measurement project slug for traceability.
### What and ### Why sections are not part of the standard feature template. They add clarity here but are redundant with Context. Not a problem, just noting the deviation.
Test Expectations still lacks a run command (e.g., pytest tests/ -k test_failure) per the template guidance. Minor omission.
VERDICT: APPROVED