Spike: Monitoring stack MCP API surface -- Grafana, Prometheus, Alertmanager agent tools #214

Open

opened 2026-03-28 02:06:42 +00:00 by forgejo_admin · 0 comments

Type

Spike

Lineage

Standalone -- migrated from todo-monitoring-stack-mcp-api in pal-e-docs. Comprehensive API surface analysis already completed in the todo note; this spike scopes the actual implementation.

Repo

forgejo_admin/pal-e-platform (multiple repos if implementation proceeds)

Question

Which subset of Grafana, Prometheus, and Alertmanager API endpoints should an MCP server expose first, and what is the minimum viable implementation to enable agent self-diagnosis?

What to Explore

- Review the 8 high-value endpoints identified in the todo note:
  1. Prometheus /query -- run PromQL
  2. Prometheus /targets -- scrape health
  3. Prometheus /alerts -- active alerts
  4. Prometheus /rules -- alert/recording rules
  5. Grafana dashboard search -- find dashboards
  6. Grafana dashboard get -- get dashboard JSON
  7. Grafana datasource health -- verify datasource connectivity
  8. Grafana alerting -- check alert states
- Evaluate: is a custom MCP server needed, or can existing curl-based patterns in hooks suffice?
- Check if any existing MCP servers (community or Anthropic) already wrap these APIs
- Determine auth requirements (Grafana service account, Prometheus bearer token)
- Estimate implementation effort for the minimum 8-endpoint MCP server
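
To make the effort estimate concrete, here is a minimal sketch of how the 8 endpoints could map onto read-only HTTP requests. It is not a decided design. The tool names, the `ENDPOINTS` catalog, and `build_request` are hypothetical. The Prometheus paths follow its `/api/v1` HTTP API. The Grafana paths are assumptions to verify against the deployed Grafana version, and the Grafana alerting endpoint is left unset because choosing it is part of this spike. Bearer auth is assumed for both services (a Grafana service account token; a Prometheus token only if a proxy in front of it enforces one).

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical catalog: tool name -> (service, GET path template).
# Grafana paths are assumptions; confirm them against the running version.
ENDPOINTS = {
    "prometheus_query": ("prometheus", "/api/v1/query"),
    "prometheus_targets": ("prometheus", "/api/v1/targets"),
    "prometheus_alerts": ("prometheus", "/api/v1/alerts"),
    "prometheus_rules": ("prometheus", "/api/v1/rules"),
    "grafana_dashboard_search": ("grafana", "/api/search"),
    "grafana_dashboard_get": ("grafana", "/api/dashboards/uid/{uid}"),
    "grafana_datasource_health": ("grafana", "/api/datasources/uid/{uid}/health"),
    # Left open on purpose: the spike has not picked an alert-state endpoint.
    "grafana_alerting": ("grafana", None),
}


def build_request(tool, base_urls, token=None, params=None, **path_vars):
    """Build (but do not send) a GET request for one catalog entry."""
    service, template = ENDPOINTS[tool]
    if template is None:
        raise NotImplementedError(f"{tool}: endpoint not chosen yet")
    url = base_urls[service].rstrip("/") + template.format(**path_vars)
    if params:
        url += "?" + urlencode(params)
    headers = {"Accept": "application/json"}
    if token:
        # Assumed scheme: bearer token for both Grafana and Prometheus.
        headers["Authorization"] = f"Bearer {token}"
    return Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    # Placeholder base URLs; the real in-cluster addresses are not in this issue.
    bases = {"prometheus": "http://prometheus.example:9090",
             "grafana": "http://grafana.example:3000"}
    req = build_request("prometheus_query", bases, token="example-token",
                        params={"query": "up"})
    print(req.get_method(), req.full_url)
```

Because every entry is a plain GET with an optional bearer header, the same table could back either a small MCP server or the existing curl-based hooks, which keeps the build-versus-curl comparison on equal footing.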

Success Criteria

- [ ] Question answered: build custom MCP server vs use existing tools vs curl patterns
- [ ] Auth requirements documented
- [ ] Follow-up Feature issue created if custom MCP server is the answer
- [ ] Or: "no action" conclusion if curl patterns in hooks are sufficient

Time-box

1 session (2 hours max). If time-box expires, document findings and escalate.

Related

- pal-e-platform -- project
- plan-2026-02-25-platform-observability -- parent observability plan
- plan-2026-03-01-dora-metrics-dashboard -- DORA dashboard uses these APIs
