Mac build agent — Salt managed with observability #174

Closed
opened 2026-03-26 16:03:12 +00:00 by forgejo_admin · 3 comments

Type

Feature

Lineage

project-pal-e-platform → Board item. Supersedes manual Mac setup from #166.

Repo

forgejo_admin/pal-e-platform

User Story

As Lucas
I want the Mac build agent managed by Salt with logs and metrics in Grafana
So that the Mac is reproducible infrastructure, not a manually configured laptop

Context

The Mac (lucass-macbook-air-1) is now a CI build machine for iOS builds. Current setup was done manually via SSH — fragile, not reproducible, no observability. This ticket brings it under Salt management with full observability.

Three sub-deliverables:

  1. Salt minion on Mac — managed by Salt master on archbox via Tailscale
  2. Woodpecker agent as Salt state — binary, plist, config all declarative
  3. Observability — Promtail (logs → Loki), node-exporter (metrics → Prometheus), Blackbox probe (uptime)

File Targets

Files to create:

  • salt/states/mac-agent/init.sls — orchestrator
  • salt/states/mac-agent/woodpecker.sls — agent binary + launchd plist
  • salt/states/mac-agent/observability.sls — promtail + node-exporter launchd services
  • salt/states/mac-agent/homebrew.sls — managed packages (node, fastlane)
  • salt/states/mac-agent/magicdns.sls — verify Tailscale MagicDNS config
  • salt/pillar/mac-agent/init.sls — secrets (agent token, Loki URL)

Files to modify:

  • salt/pillar/top.sls — add Mac minion targeting
  • terraform/main.tf — Prometheus scrape config for Mac node-exporter (Tailscale IP)
  • Grafana — dashboard for Mac agent build metrics

Files NOT to touch:

  • Existing Salt states for archbox
  • Woodpecker Helm chart (server-side config unchanged)

Acceptance Criteria

  • Salt minion on Mac connects to master on archbox
  • salt 'lucass-macbook-air-1' state.apply mac-agent provisions everything
  • Woodpecker agent starts and connects (visible in Woodpecker admin)
  • Agent logs appear in Grafana (Loki data source, label: host=lucass-macbook-air-1)
  • Mac CPU/memory metrics appear in Grafana (Prometheus)
  • Blackbox probe alerts if Mac agent is unreachable for >5min
  • MagicDNS resolves woodpecker.tail5b443a.ts.net on the Mac

Test Expectations

  • salt 'lucass-macbook-air-1' test.ping returns True
  • salt 'lucass-macbook-air-1' state.apply mac-agent test=True shows no errors
  • Grafana query: {host="lucass-macbook-air-1"} returns agent logs
  • Prometheus query: up{instance=~".*macbook.*"} returns 1
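
The first two expectations could be checked mechanically rather than by eye. The following is a minimal sketch, assuming it runs on the Salt master and that Salt prints its default text output (a minion ID line followed by the result, and a `Failed:` line in the state summary). The function names are illustrative and not part of the repo.

```shell
#!/bin/sh
# Sketch only: helpers for checking the Salt-side Test Expectations.
# MINION matches the ticket; the function names are hypothetical.
MINION="lucass-macbook-air-1"

# Reads `salt "$MINION" test.ping` output on stdin and succeeds only if
# the line directly after "<minion>:" is True.
ping_ok() {
  awk -v id="$1" '
    $0 == (id ":")        { seen = 1; next }
    seen && $1 == "True"  { ok = 1 }
                          { seen = 0 }
    END                   { exit (ok ? 0 : 1) }
  '
}

# Reads `salt "$MINION" state.apply mac-agent test=True` output on stdin and
# succeeds only if a "Failed:" summary line is present and reports 0.
no_failed_states() {
  awk '
    $1 == "Failed:" { found = 1; if ($2 != 0) bad = 1 }
    END             { exit ((found && !bad) ? 0 : 1) }
  '
}

# Usage on the master:
#   salt "$MINION" test.ping | ping_ok "$MINION"
#   salt "$MINION" state.apply mac-agent test=True | no_failed_states
```

Keeping the parsers separate from the `salt` calls means they can be exercised against captured output without a live minion.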

Constraints

  • Salt communication over Tailscale (master at 100.110.151.59)
  • No sudo for initial Salt install — may need Lucas to run brew install salt-minion once
  • Secrets in Salt pillar, never in states
  • Promtail and node-exporter installed via Homebrew, managed via launchd (same pattern as Woodpecker agent)
  • MagicDNS must work before agent can use DNS names instead of IPs

Checklist

  • PR opened
  • Salt states tested with test=True
  • Observability verified in Grafana
  • No unrelated changes

Related

  • pal-e-platform #166 — Mac CI agent setup (manual, superseded by this)
  • pal-e-platform #172 — gRPC funnel (may not be needed if MagicDNS works)
  • project-capacitor-mobile — consumer of the Mac build agent
Author
Owner

Ticket Scope Review: #174

TEMPLATE COMPLIANCE (template-issue)

The issue follows the template-issue format closely. Section-by-section:

| Section | Present | Notes |
|---------|---------|-------|
| ### Type | Yes (extra) | Not in the template, but useful metadata. No issue. |
| ### Lineage | Yes | References project-pal-e-platform and supersedes #166. Path 2 (board-driven), so no plan phase ancestry required. |
| ### Repo | Yes | forgejo_admin/pal-e-platform |
| ### User Story | Yes | Proper As/I want/So that format. |
| ### Context | Yes | Thorough. Three sub-deliverables clearly enumerated. |
| ### File Targets | Yes | Excellent -- create/modify/NOT-touch lists with specific paths. |
| ### Acceptance Criteria | Yes | 7 checkboxes, all testable. |
| ### Test Expectations | Yes | 4 checkboxes with specific commands/queries. |
| ### Constraints | Yes | 5 constraints, including the manual brew install dependency. |
| ### Checklist | Yes | Standard 4-item checklist. |
| ### Related | Yes | References #166, #172, and project-capacitor-mobile. |

Assessment: Passes template compliance.

TRACEABILITY TRIANGLE

| Leg | Label | Traced To | Valid |
|-----|-------|-----------|-------|
| User Story | story:superuser-deploy | project-pal-e-platform User Stories table: "I can deploy infrastructure changes via tofu plan/apply and see them succeed in Woodpecker CI without manual intervention." | Yes |
| Architecture | arch:ci-pipeline | convention-architecture-ids Deployment Components table: arch:ci-pipeline = Woodpecker CI | Yes |
| Type | type:infra | template-ticket Label Conventions: valid type value | Yes |

The arch:ci-pipeline label is a reasonable fit. The work touches Woodpecker agent infrastructure, which is the CI pipeline's execution layer. The Salt and observability aspects are enablers of the CI pipeline, not separate architecture components.

Assessment: Triangle traces correctly.

FILE TARGET SPECIFICITY

File targets are specific enough for agent execution:

  • Create paths: 6 files under salt/states/mac-agent/ and salt/pillar/mac-agent/ -- follows the existing Salt directory convention (verified: salt/states/ has kernel/, k3s/, nvidia/, packages/, etc. with init.sls pattern).
  • Modify paths: salt/pillar/top.sls (currently only targets archbox -- agent will add lucass-macbook-air-1 targeting), terraform/main.tf (Prometheus scrape config).
  • NOT-touch paths: Existing archbox states, Woodpecker Helm chart.

One gap: The "Grafana -- dashboard for Mac agent build metrics" file target is vague. Existing dashboards follow the pattern terraform/dashboards/{name}.json + a kubernetes_config_map_v1 resource in main.tf. The agent can discover this pattern from context (3 existing dashboards follow it), but specifying the exact path (terraform/dashboards/mac-agent-dashboard.json) would remove ambiguity.

ACCEPTANCE CRITERIA TESTABILITY

All 7 criteria are testable:

  1. Salt minion connects -- verifiable via salt 'lucass-macbook-air-1' test.ping
  2. state.apply mac-agent provisions everything -- verifiable via Salt
  3. Woodpecker agent visible in admin -- verifiable via Woodpecker UI
  4. Logs in Grafana/Loki -- verifiable via Grafana query (provided in Test Expectations)
  5. CPU/memory metrics in Grafana/Prometheus -- verifiable via PromQL query (provided)
  6. Blackbox probe alerts on unreachable -- verifiable by checking alert rules
  7. MagicDNS resolves -- verifiable via dig or nslookup

Assessment: All criteria are testable with specific verification methods.

DEPENDENCIES

| Dependency | Type | Noted |
|------------|------|-------|
| #166 (Mac CI agent manual setup) | Superseded | Yes, in Lineage |
| #172 (gRPC funnel) | Maybe-needed | Yes, in Related ("may not be needed if MagicDNS works") |
| project-capacitor-mobile | Consumer | Yes, in Related |
| Manual brew install salt-minion | External/human | Yes, in Constraints |

Assessment: Dependencies are documented and correct. The constraint about needing Lucas for the initial Salt install is honest and important.

MINOR OBSERVATIONS (non-blocking)

  1. Missing track:devops label on the board item. This is devops infrastructure work. Per template-ticket: "Not every ticket needs every label" -- so not required, but would improve filtering.

  2. Missing scope: label. This work originated from superseding #166 (manual setup). scope:planned or no scope label are both acceptable.

  3. Related section references project-capacitor-mobile (consumer) but does not explicitly reference project-pal-e-platform (the project this ticket belongs to). The Lineage section covers this, so it is traceable, but adding it to Related would be consistent with template-issue guidance ("project-slug -- project this affects").

  4. Grafana dashboard file target should be more specific: suggest terraform/dashboards/mac-agent-dashboard.json to match existing convention.

  5. No points assigned on the board item. Per template-ticket, points should be assigned during triage. This is a multi-file, multi-tool ticket (Salt + Terraform + Grafana) -- likely a 5 or 8.

SOP COMPLIANCE

  • Issue follows template-issue format
  • story: label traces to project page User Stories
  • arch: label traces to convention-architecture-ids
  • type: label is valid
  • Acceptance criteria are testable
  • File targets are specific (with one minor gap noted)
  • Dependencies documented
  • No secrets in issue body (Loki URL, agent token referenced as pillar items, not exposed)

VERDICT: APPROVED

Ticket is well-scoped and agent-executable. The five minor observations above (track label, scope label, Related section, Grafana path specificity, points) are quality-of-life improvements, not blockers. An agent can execute this ticket as written.

Author
Owner

Progress Update (2026-03-26) — Salt minion connected

Completed

  • Salt 3006.23 LTS installed on Mac via official .pkg from GitHub releases (repo.saltproject.io was unreachable — CloudFront DNS issue)
  • Minion configured: master: 100.110.151.59, id: lucass-macbook-air-1
  • Key accepted on archbox: salt-key -a lucass-macbook-air-1
  • salt 'lucass-macbook-air-1' test.ping returns True

Infrastructure fixes discovered and applied

  1. Salt master interface binding — was 127.0.0.1 (localhost only), changed to 100.110.151.59 (Tailscale IP). Comment in config predicted this: "When scaling to multi-node, change this to the Tailscale IP."

  2. nftables stale interface index bug — firewall rule iif 5 was stale (Tailscale was index 5 at boot, now index 87 after restarts). Reloaded /etc/nftables.conf to get iif "tailscale0" (name-based). Root cause: nftables resolves interface names to numeric indexes at load time. Need a systemd dependency or timer to reload after Tailscale starts. Discovered scope.

  3. Local archbox minion — updated from master: 127.0.0.1 to master: 100.110.151.59 to match master's new bind address.
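
The stale-index symptom from item 2 could be detected before it breaks connectivity again. The sketch below assumes that `nft list ruleset` prints a bare number after `iif` or `oif` when the stored index no longer maps to a live interface, which matches the `iif 5` seen here. The function name is illustrative. As a possible alternative to a reload hook, nftables also offers `iifname`, which compares the interface name per packet instead of a stored index; whether that fits this ruleset is untested.

```shell
#!/bin/sh
# Sketch only: print rules that reference an interface by bare numeric index
# (for example `iif 5`). Name-based forms such as `iif "tailscale0"` and
# `iifname "tailscale0"` are not matched. The function name is hypothetical.
stale_index_rules() {
  grep -E '(^|[[:space:]])[io]if[[:space:]]+[0-9]+([[:space:]]|$)'
}

# Usage on archbox (needs root):
#   nft list ruleset | stale_index_rules && echo "index-based rules found"
```

A non-empty result would be the trigger for reloading /etc/nftables.conf after Tailscale has started.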

Woodpecker agent connection issue

  • Tailscale funnel (PR #173) doesn't work for gRPC — funnels proxy as HTTP/1.1, gRPC needs HTTP/2
  • Correct approach: use MagicDNS (woodpecker.tail5b443a.ts.net:9000) directly over Tailscale, not through a funnel
  • MagicDNS resolution needs verification on Mac
  • Agent plist needs updating: WOODPECKER_SERVER=woodpecker.tail5b443a.ts.net:9000, WOODPECKER_GRPC_SECURE=false
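
Before the new value goes into the plist, its shape and reachability could be checked from the Mac. This is a rough sketch, assuming the agent expects a bare host:port for its gRPC address and that `nc` is available; the function names are illustrative.

```shell
#!/bin/sh
# Sketch only: sanity checks for the WOODPECKER_SERVER value.
# Function names are hypothetical, not part of the repo.

# Succeeds if the value is a bare host:port with no URL scheme or path.
is_grpc_addr() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9.-]+:[0-9]+$'
}

# Succeeds if a TCP connection to host:port opens within 5 seconds.
# This only proves the port is reachable, not that gRPC itself works.
grpc_port_open() {
  nc -z -w 5 "${1%:*}" "${1##*:}"
}

# Usage on the Mac:
#   addr="woodpecker.tail5b443a.ts.net:9000"
#   is_grpc_addr "$addr" && grpc_port_open "$addr"
```

A failure of the second check while the first passes would point at MagicDNS resolution or Tailscale routing rather than at the agent config.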

Key lessons

  • repo.saltproject.io is unreliable — use GitHub releases for .pkg downloads
  • nftables + Tailscale interface index drift — needs systemd reload-after-tailscale
  • Salt master binding must match minion config — when changing interface, update BOTH
  • Tailscale funnels are HTTP only — gRPC/TCP services use MagicDNS direct, not funnels

Remaining

  • Write Salt states for mac-agent (woodpecker, observability, homebrew)
  • Fix Woodpecker agent connection via MagicDNS
  • Add Promtail + node-exporter via Salt
  • Grafana dashboard for Mac agent
  • nftables reload-after-tailscale systemd fix (discovered scope)
Author
Owner

Scope Review: BLOCK (Phase 30 review)

Review note: review-287-2026-03-27

Phase 30 (Mac CI Agent) is well-scoped but hard-blocked by Apple Developer Program enrollment. Board item #287 correctly carries the blocked-by:apple-dev-enrollment label.

Key issues:

  • HARD BLOCK — 3 of 4 acceptance criteria require Apple Developer enrollment (Fastlane match, xcodebuild signing, TestFlight upload). Only agent registration can proceed without it.
  • Dual-tracking confusion — this issue (#174, board item #391 in next_up) and phase board item #287 (in todo) appear to describe the same work. One is "Mac build agent — Salt managed with observability", the other is "Phase 30: Mac CI Agent — iOS Build Infrastructure." Relationship needs clarification.
  • Salt scope gap — this issue's title says "Salt managed" but the phase note doesn't mention Salt at all. Meanwhile, untracked files salt/states/mac-agent/ and salt/pillar/mac-agent.sls already exist in the repo. The phase note should incorporate Salt config management.
  • Missing arch label — board item #287 should add arch:ci-pipeline for consistency with other CI items.