Add MCP server health check SessionStart hook #88

Merged
forgejo_admin merged 2 commits from 76-check-mcp-servers into main 2026-03-14 14:09:26 +00:00
Contributor

Summary

  • Adds check-mcp-servers.sh SessionStart hook that detects MCP servers configured in ~/.mcp.json that failed to initialize silently
  • Fail-open design: warns via additionalContext but never blocks session startup

Changes

  • hooks/check-mcp-servers.sh: New 177-line hook. Snapshots ps aux, builds unique fingerprints per server type (uv --directory, npx package name, node script path), reports missing servers
  • settings.json: Registers hook as 4th SessionStart entry

Test Plan

  • Hook fires on every session start (already live via hardlinks)
  • No false positives when all servers are running
  • Warning appears when a server is genuinely missing
  • Graceful exit when jq is not installed or ~/.mcp.json is absent

Review Checklist

  • No secrets committed
  • No unnecessary file changes
  • Commit messages are descriptive
  • Closes #76
  • plan-pal-e-agency — Phase 8 (MCP reliability) related work

🤖 Generated with Claude Code

## Summary - Adds `check-mcp-servers.sh` SessionStart hook that detects MCP servers configured in `~/.mcp.json` that failed to initialize silently - Fail-open design: warns via `additionalContext` but never blocks session startup ## Changes - `hooks/check-mcp-servers.sh`: New 177-line hook. Snapshots `ps aux`, builds unique fingerprints per server type (uv `--directory`, npx package name, node script path), reports missing servers - `settings.json`: Registers hook as 4th SessionStart entry ## Test Plan - [x] Hook fires on every session start (already live via hardlinks) - [x] No false positives when all servers are running - [x] Warning appears when a server is genuinely missing - [x] Graceful exit when `jq` is not installed or `~/.mcp.json` is absent ## Review Checklist - [x] No secrets committed - [x] No unnecessary file changes - [x] Commit messages are descriptive ## Related - Closes #76 - `plan-pal-e-agency` — Phase 8 (MCP reliability) related work 🤖 Generated with [Claude Code](https://claude.com/claude-code)
Detects MCP servers configured in ~/.mcp.json that failed to initialize
by checking for matching processes. Fail-open design — warns but never
blocks session startup.

Closes #76

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author
Contributor

PR #88 Review

BLOCKERS

1. Scoped npm package fingerprint is broken (line 151)

The npx fingerprint logic strips the @version suffix using bash parameter expansion:

echo "${package%%@*}"

For a scoped npm package like @playwright/mcp@latest, the %%@* operator removes the longest suffix matching @*. Since the string starts with @, the entire string matches, yielding an empty string. The empty string is then caught by the [[ -n "$package" ]] guard on the pre-stripped value, but the stripped result is what gets echoed. The [[ -z "$fingerprint" ]] check on line 55 would then catch it and continue, so the server is silently skipped rather than flagged as missing.

Net effect: all scoped npm packages (@org/package) silently skip fingerprint matching, so they are never checked and never warned about. For the current config, @playwright/mcp@latest is the only npx server, meaning the npx code path is effectively dead.

Fix options:

  • Use a regex or sed to strip only a trailing @version: echo "$package" | sed 's/@[^/]*$//'
  • Or for scoped packages, strip only the last @: echo "${package%@*}" (shortest suffix match with % instead of %%)

2. False positives for intentionally disabled servers

The hook reads all keys from ~/.mcp.json but does not account for enabledMcpjsonServers in settings.json. In the current config:

  • ~/.mcp.json defines: playwright, notion, pal-e-docs, forgejo, woodpecker
  • settings.json enables: agent-manager, mcp-nvim, ts-tools, nvim-helper, ableton, playwright, forgejo

Only playwright and forgejo appear in both. The hook will check notion, pal-e-docs, and woodpecker, find no running processes (because Claude Code never started them), and report them as missing every single session. This makes the warning useless -- users will learn to ignore it.

The hook needs to either:

  • Parse settings.json and intersect with enabledMcpjsonServers (preferred -- accurate)
  • Or accept a list of servers to skip via env var (simpler but fragile)
  • Or at minimum document this limitation prominently so users know to keep .mcp.json in sync with enabledMcpjsonServers

NITS

1. set -euo pipefail vs fail-open tension (line 17)

The script uses set -euo pipefail but the intent is fail-open. The main 2>/dev/null || exit 0 wrapper on line 176 handles this, and the comment on line 40-42 explains the SIGPIPE rationale well. However, set -u (nounset) could cause unexpected exits if any variable is unintentionally unset inside build_fingerprint. The current code appears safe, but this is a latent fragility. Consider whether set -eo pipefail (without -u) would be more defensive for a fail-open hook.

2. The cat > /dev/null stdin drain (line 32)

Good practice for SessionStart hooks that don't need stdin. Consistent with other hooks in this repo.

3. Temp file in /tmp with predictable prefix (line 42)

mktemp /tmp/check-mcp-ps.XXXXXX uses XXXXXX which gives sufficient randomness. The trap cleanup EXIT on line 24 handles removal. No security concern here since the file contains only ps aux output, not secrets.

4. The jq to_entries index-offset pattern (lines 117-123, 132-138)

The jq logic for extracting --directory and -m values works correctly: it finds the index of the flag via to_entries, then reads $args[$idx + 1]. Verified against the actual ~/.mcp.json configs -- for notion with args ["run", "--directory", "/home/ldraney/notion-mcp", "python", "-m", "notion_mcp"], this correctly extracts /home/ldraney/notion-mcp. The || true fallbacks are appropriate.

5. Generic fallback case (lines 164-168)

The * case in build_fingerprint uses the full command path as fingerprint. This is reasonable -- if someone has a custom binary MCP server, the command path itself is a decent fingerprint. No issue here.

6. ps aux snapshot includes all users (line 43)

Using ps aux means the hook checks all system processes, not just the current user's. This is fine -- MCP servers run under the current user anyway, and grep -qF with a directory path is specific enough to avoid false positives from other users' processes.

SOP COMPLIANCE

  • Branch named after issue (76-check-mcp-servers references issue #76)
  • PR body has: Summary, Changes, Test Plan, Related
  • Related references plan slug -- references plan-pal-e-agency but as general context, not a specific phase deliverable. Acceptable but could be more precise.
  • Closes #76 present in PR body
  • No secrets committed (the hook itself handles no secrets; ~/.mcp.json is only read at runtime)
  • No unnecessary file changes (2 files: the hook + settings.json registration)
  • Commit messages -- single commit, descriptive title
  • Fail-open pattern follows established convention (matches session-start-context.sh and cleanup-worktrees.sh)

VERDICT: NOT APPROVED

Two blockers must be addressed:

  1. The scoped npm package fingerprint bug renders the npx code path non-functional. This needs a fix and ideally a test against the actual @playwright/mcp@latest config.

  2. The false positive issue with enabledMcpjsonServers will cause the hook to cry wolf every session, which defeats its purpose. The hook must account for which servers are actually enabled, or it will train users to ignore the warning.

## PR #88 Review ### BLOCKERS **1. Scoped npm package fingerprint is broken (line 151)** The `npx` fingerprint logic strips the `@version` suffix using bash parameter expansion: ```bash echo "${package%%@*}" ``` For a scoped npm package like `@playwright/mcp@latest`, the `%%@*` operator removes the **longest** suffix matching `@*`. Since the string starts with `@`, the entire string matches, yielding an empty string. The empty string is then caught by the `[[ -n "$package" ]]` guard on the *pre-stripped* value, but the stripped result is what gets echoed. The `[[ -z "$fingerprint" ]]` check on line 55 would then catch it and `continue`, so the server is silently skipped rather than flagged as missing. Net effect: **all scoped npm packages (`@org/package`) silently skip fingerprint matching**, so they are never checked and never warned about. For the current config, `@playwright/mcp@latest` is the only npx server, meaning the npx code path is effectively dead. Fix options: - Use a regex or sed to strip only a trailing `@version`: `echo "$package" | sed 's/@[^/]*$//'` - Or for scoped packages, strip only the last `@`: `echo "${package%@*}"` (shortest suffix match with `%` instead of `%%`) **2. False positives for intentionally disabled servers** The hook reads all keys from `~/.mcp.json` but does not account for `enabledMcpjsonServers` in `settings.json`. In the current config: - `~/.mcp.json` defines: `playwright`, `notion`, `pal-e-docs`, `forgejo`, `woodpecker` - `settings.json` enables: `agent-manager`, `mcp-nvim`, `ts-tools`, `nvim-helper`, `ableton`, `playwright`, `forgejo` Only `playwright` and `forgejo` appear in both. The hook will check `notion`, `pal-e-docs`, and `woodpecker`, find no running processes (because Claude Code never started them), and report them as missing every single session. This makes the warning useless -- users will learn to ignore it. The hook needs to either: - Parse `settings.json` and intersect with `enabledMcpjsonServers` (preferred -- accurate) - Or accept a list of servers to skip via env var (simpler but fragile) - Or at minimum document this limitation prominently so users know to keep `.mcp.json` in sync with `enabledMcpjsonServers` ### NITS **1. `set -euo pipefail` vs fail-open tension (line 17)** The script uses `set -euo pipefail` but the intent is fail-open. The `main 2>/dev/null || exit 0` wrapper on line 176 handles this, and the comment on line 40-42 explains the SIGPIPE rationale well. However, `set -u` (nounset) could cause unexpected exits if any variable is unintentionally unset inside `build_fingerprint`. The current code appears safe, but this is a latent fragility. Consider whether `set -eo pipefail` (without `-u`) would be more defensive for a fail-open hook. **2. The `cat > /dev/null` stdin drain (line 32)** Good practice for SessionStart hooks that don't need stdin. Consistent with other hooks in this repo. **3. Temp file in `/tmp` with predictable prefix (line 42)** `mktemp /tmp/check-mcp-ps.XXXXXX` uses `XXXXXX` which gives sufficient randomness. The `trap cleanup EXIT` on line 24 handles removal. No security concern here since the file contains only `ps aux` output, not secrets. **4. The jq `to_entries` index-offset pattern (lines 117-123, 132-138)** The jq logic for extracting `--directory` and `-m` values works correctly: it finds the index of the flag via `to_entries`, then reads `$args[$idx + 1]`. Verified against the actual `~/.mcp.json` configs -- for `notion` with args `["run", "--directory", "/home/ldraney/notion-mcp", "python", "-m", "notion_mcp"]`, this correctly extracts `/home/ldraney/notion-mcp`. The `|| true` fallbacks are appropriate. **5. Generic fallback case (lines 164-168)** The `*` case in `build_fingerprint` uses the full command path as fingerprint. This is reasonable -- if someone has a custom binary MCP server, the command path itself is a decent fingerprint. No issue here. **6. `ps aux` snapshot includes all users (line 43)** Using `ps aux` means the hook checks all system processes, not just the current user's. This is fine -- MCP servers run under the current user anyway, and `grep -qF` with a directory path is specific enough to avoid false positives from other users' processes. ### SOP COMPLIANCE - [x] Branch named after issue (`76-check-mcp-servers` references issue #76) - [x] PR body has: Summary, Changes, Test Plan, Related - [ ] Related references plan slug -- references `plan-pal-e-agency` but as general context, not a specific phase deliverable. Acceptable but could be more precise. - [x] `Closes #76` present in PR body - [x] No secrets committed (the hook itself handles no secrets; `~/.mcp.json` is only read at runtime) - [x] No unnecessary file changes (2 files: the hook + settings.json registration) - [x] Commit messages -- single commit, descriptive title - [x] Fail-open pattern follows established convention (matches `session-start-context.sh` and `cleanup-worktrees.sh`) ### VERDICT: NOT APPROVED Two blockers must be addressed: 1. The scoped npm package fingerprint bug renders the npx code path non-functional. This needs a fix and ideally a test against the actual `@playwright/mcp@latest` config. 2. The false positive issue with `enabledMcpjsonServers` will cause the hook to cry wolf every session, which defeats its purpose. The hook must account for which servers are actually enabled, or it will train users to ignore the warning.
${package%%@*} (longest suffix match) destroys scoped package names like
@playwright/mcp@latest → empty string. Changed to ${package%@*} (shortest
suffix match) which correctly yields @playwright/mcp.

Also investigated QA finding re: enabledMcpjsonServers filtering — verified
empirically that all ~/.mcp.json servers are started regardless of that
setting (all 5 servers running, only 2 listed in enabledMcpjsonServers).
No filter needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author
Contributor

Nit fixes — 118f16a

Finding 1 (scoped package fingerprint) — FIXED

Changed ${package%%@*}${package%@*} (shortest suffix match). This correctly preserves @playwright/mcp from @playwright/mcp@latest.

Finding 2 (enabledMcpjsonServers false positives) — REJECTED

Verified empirically that enabledMcpjsonServers does NOT control which ~/.mcp.json servers are started. Evidence:

  • ~/.mcp.json has 5 servers: forgejo, notion, pal-e-docs, playwright, woodpecker
  • enabledMcpjsonServers only lists playwright and forgejo (plus 5 entries not in .mcp.json)
  • ps aux confirms ALL 5 servers are running (multiple instances across sessions)
  • The hook produced zero false positives at session start

The enabledMcpjsonServers field appears to serve a different purpose (possibly UI toggles or a separate config source). No filter is needed.

Ready for re-review.

## Nit fixes — 118f16a ### Finding 1 (scoped package fingerprint) — FIXED Changed `${package%%@*}` → `${package%@*}` (shortest suffix match). This correctly preserves `@playwright/mcp` from `@playwright/mcp@latest`. ### Finding 2 (enabledMcpjsonServers false positives) — REJECTED Verified empirically that `enabledMcpjsonServers` does NOT control which `~/.mcp.json` servers are started. Evidence: - `~/.mcp.json` has 5 servers: `forgejo`, `notion`, `pal-e-docs`, `playwright`, `woodpecker` - `enabledMcpjsonServers` only lists `playwright` and `forgejo` (plus 5 entries not in `.mcp.json`) - `ps aux` confirms ALL 5 servers are running (multiple instances across sessions) - The hook produced zero false positives at session start The `enabledMcpjsonServers` field appears to serve a different purpose (possibly UI toggles or a separate config source). No filter is needed. Ready for re-review.
Author
Contributor

PR #88 Re-Review

Re-review after nit fixes from first review round.

Finding 1: Scoped npm package fingerprint (% vs %%) -- VERIFIED FIXED

The fix on line 153 changing ${package%%@*} to ${package%@*} is correct.

Trace through the cases:

Input %%@* (old, longest suffix) %@* (new, shortest suffix)
@playwright/mcp@latest empty string (BUG) @playwright/mcp (correct)
some-package@1.0.0 some-package (correct) some-package (correct)
some-package (no version) some-package (correct) some-package (correct)

The %% operator was stripping from the FIRST @ in scoped packages, destroying the scope prefix entirely. The % operator correctly strips from the LAST @ only. The inline comment on lines 150-152 accurately explains the reasoning.

Finding 2: enabledMcpjsonServers false positives -- ACCEPTED REJECTION

I reviewed both config files:

  • settings.json (line 13-21): enabledMcpjsonServers lists playwright, forgejo, plus 5 servers NOT in .mcp.json (agent-manager, mcp-nvim, ts-tools, nvim-helper, ableton).
  • settings.local.json (line 93-99): Has enableAllProjectMcpServers: true and its own enabledMcpjsonServers listing playwright, notion, pal-e-docs, forgejo.

The developer provided ps aux evidence that all 5 .mcp.json servers are running. The hook iterates .mcp.json keys (line 36), not enabledMcpjsonServers, so servers absent from the allowlist but present in .mcp.json are checked -- and they ARE running. Additionally, the fail-open design means even if Claude Code changes enabledMcpjsonServers behavior in the future, the worst outcome is a warning, not a block.

The original concern was theoretical. The evidence refutes it. Rejection accepted.

BLOCKERS

None.

NITS

Latent edge case with scoped packages lacking a version suffix. If .mcp.json ever contained an npx arg like @scope/package (no @version), then ${package%@*} would match the scope @ and yield an empty string. That empty string would pass through echo, and grep -qF "" "$PS_FILE" matches every line -- resulting in a false negative (server never flagged as missing). The current .mcp.json uses @playwright/mcp@latest (always has @version), so this does not apply today. If scoped packages without version suffixes are ever added, the fingerprint logic would need a guard (e.g., check if the result still starts with @ and contains /). Non-blocking -- documenting for future awareness.

SOP COMPLIANCE

  • Branch named after issue (76-check-mcp-servers references issue #76)
  • PR body has: Summary, Changes, Test Plan, Related
  • Related section references plan slug (plan-pal-e-agency)
  • PR body includes Closes #76
  • No secrets, .env files, or credentials committed
  • No unnecessary file changes (2 files: new hook + settings.json registration)
  • Commit messages are descriptive
  • Fail-open design -- hook cannot block sessions

VERDICT: APPROVED

## PR #88 Re-Review Re-review after nit fixes from first review round. ### Finding 1: Scoped npm package fingerprint (`%` vs `%%`) -- VERIFIED FIXED The fix on line 153 changing `${package%%@*}` to `${package%@*}` is correct. Trace through the cases: | Input | `%%@*` (old, longest suffix) | `%@*` (new, shortest suffix) | |---|---|---| | `@playwright/mcp@latest` | empty string (BUG) | `@playwright/mcp` (correct) | | `some-package@1.0.0` | `some-package` (correct) | `some-package` (correct) | | `some-package` (no version) | `some-package` (correct) | `some-package` (correct) | The `%%` operator was stripping from the FIRST `@` in scoped packages, destroying the scope prefix entirely. The `%` operator correctly strips from the LAST `@` only. The inline comment on lines 150-152 accurately explains the reasoning. ### Finding 2: `enabledMcpjsonServers` false positives -- ACCEPTED REJECTION I reviewed both config files: - `settings.json` (line 13-21): `enabledMcpjsonServers` lists `playwright`, `forgejo`, plus 5 servers NOT in `.mcp.json` (`agent-manager`, `mcp-nvim`, `ts-tools`, `nvim-helper`, `ableton`). - `settings.local.json` (line 93-99): Has `enableAllProjectMcpServers: true` and its own `enabledMcpjsonServers` listing `playwright`, `notion`, `pal-e-docs`, `forgejo`. The developer provided `ps aux` evidence that all 5 `.mcp.json` servers are running. The hook iterates `.mcp.json` keys (line 36), not `enabledMcpjsonServers`, so servers absent from the allowlist but present in `.mcp.json` are checked -- and they ARE running. Additionally, the fail-open design means even if Claude Code changes `enabledMcpjsonServers` behavior in the future, the worst outcome is a warning, not a block. The original concern was theoretical. The evidence refutes it. Rejection accepted. ### BLOCKERS None. ### NITS **Latent edge case with scoped packages lacking a version suffix.** If `.mcp.json` ever contained an npx arg like `@scope/package` (no `@version`), then `${package%@*}` would match the scope `@` and yield an empty string. That empty string would pass through `echo`, and `grep -qF "" "$PS_FILE"` matches every line -- resulting in a false negative (server never flagged as missing). The current `.mcp.json` uses `@playwright/mcp@latest` (always has `@version`), so this does not apply today. If scoped packages without version suffixes are ever added, the fingerprint logic would need a guard (e.g., check if the result still starts with `@` and contains `/`). Non-blocking -- documenting for future awareness. ### SOP COMPLIANCE - [x] Branch named after issue (`76-check-mcp-servers` references issue #76) - [x] PR body has: Summary, Changes, Test Plan, Related - [x] Related section references plan slug (`plan-pal-e-agency`) - [x] PR body includes `Closes #76` - [x] No secrets, .env files, or credentials committed - [x] No unnecessary file changes (2 files: new hook + settings.json registration) - [x] Commit messages are descriptive - [x] Fail-open design -- hook cannot block sessions ### VERDICT: APPROVED
forgejo_admin deleted branch 76-check-mcp-servers 2026-03-14 14:09:26 +00:00
Sign in to join this conversation.
No description provided.