Time-window MacAgentDown alert (laptop offline outside work hours is expected) #326
Type
Bug
Lineage
Standalone — discovered 2026-05-01 during alert-state audit.
Repo
forgejo_admin/pal-e-platform
What Broke
Three alerts fire continuously for one offline laptop, training oncall to ignore Mac alerts. This is dangerous because we'll also ignore them when the Mac is genuinely broken (Apple developer enrollment lapse, cert expiry, dead disk). Currently:
MacAgentDown (critical) firing 16d
TargetDown (warning, our custom rule, instance=lucass-macbook-air-1) firing 16d
TargetDown (warning, helm-default aggregate) firing 34d
The Mac is a personal laptop and is expected to be offline outside ~8am–9pm MST weekdays.
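One possible shape for a time-windowed rule is sketched below, as a rule-group fragment for the mac-agent-health group. It is not the current rule: the `up{instance="lucass-macbook-air-1"}` selector and the fixed 7-hour offset (MST treated as UTC-7 with no daylight saving) are assumptions. Prometheus evaluates `hour()` and `day_of_week()` in UTC, so the sketch shifts the clock before testing the window.

```yaml
# Sketch only: selector and UTC offset are assumptions, not the deployed rule.
groups:
  - name: mac-agent-health
    rules:
      - alert: MacAgentDown
        # hour() and day_of_week() work in UTC. Subtracting 7h (assumed fixed
        # MST offset) makes both functions report local hour and weekday.
        # day_of_week(): 0 = Sunday, so 1..5 is Monday through Friday.
        expr: |
          up{instance="lucass-macbook-air-1"} == 0
          and on() (hour(vector(time() - 7 * 3600)) >= 8)
          and on() (hour(vector(time() - 7 * 3600)) < 21)
          and on() (day_of_week(vector(time() - 7 * 3600)) >= 1)
          and on() (day_of_week(vector(time() - 7 * 3600)) <= 5)
        for: 5m
        labels:
          severity: critical
```

Outside the window the expression returns no series, so the `for` timer resets and a firing alert resolves. A laptop that is still closed at 08:00 local time would start firing at about 08:05 under this sketch.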
Repro Steps
Expected Behavior
for: 6h so transient overnight offlines never trigger.
TargetDown (helm-default vs our custom version) is de-duplicated.
Environment
monitoring
terraform/modules/monitoring/main.tf, kube-prometheus-stack-platform-alerts PrometheusRule, mac-agent-health group
WebhookStale already uses hour()/day_of_week() filters
Acceptance Criteria
MacAgentDown does not fire outside work hours when laptop is closed
MacAgentDown still fires within 5 minutes when laptop is broken during work hours
TargetDown (custom or helm-default, pick one) is removed
Related
pal-e-platform (project)
alert-report-2026-05-01 (alert snapshot)
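For the TargetDown acceptance criterion, if the helm-default rule is the one dropped, a Helm values fragment along these lines might do it. This is a sketch that assumes the deployed kube-prometheus-stack chart version supports the `defaultRules.disabled` map; how the values are passed in through terraform/modules/monitoring/main.tf is not shown here.

```yaml
# Sketch only: assumes the chart version in use supports defaultRules.disabled.
defaultRules:
  disabled:
    # Drops the chart's aggregate TargetDown alert and leaves the custom,
    # per-instance TargetDown rule as the single source of that alert.
    TargetDown: true
```

Removing the custom rule instead would satisfy the same criterion; the choice depends on whether the per-instance label is worth keeping.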