feat: split k8s manifests and add /metrics endpoint #5

Merged
forgejo_admin merged 1 commit from 3-align-k8s-manifests-add-metrics into main 2026-03-28 11:27:18 +00:00
Contributor

Summary

Split the multi-document deployment.yaml into one-resource-per-file convention and add a hand-rolled Prometheus /metrics endpoint so the existing ServiceMonitor stops hitting 404.

Changes

  • k8s/deployment.yaml -- removed inline Service and PVC documents, now contains only the Deployment resource
  • k8s/service.yaml -- new file, extracted Service (port 8000, selector app: linkedin-scheduler-remote)
  • k8s/pvc.yaml -- new file, extracted PVC (linkedin-mcp-data, 100Mi, local-path)
  • k8s/kustomization.yaml -- added service.yaml and pvc.yaml to resources list (now lists all four: deployment, service, pvc, servicemonitor)
  • server.py -- added /metrics GET endpoint returning Prometheus text exposition format (uptime_seconds gauge, http_requests_total counter) with request-counting middleware

Test Plan

  • ruff check . -- passes
  • ruff format --check . -- passes
  • kubectl kustomize k8s/ -- renders four separate resources (Deployment, Service, PVC, ServiceMonitor)
  • curl http://localhost:8000/metrics -- returns Prometheus text format with uptime_seconds and http_requests_total
  • After ArgoCD sync, ServiceMonitor scrapes /metrics successfully (no more 404)

Review Checklist

  • No unrelated changes
  • ruff lint and format pass
  • kustomize build renders four separate resources
  • No new external dependencies added (hand-rolled Prometheus format)
  • Stale PR #4 should be closed (superseded by this PR)
  • Forgejo issue: #3

Closes #3

## Summary Split the multi-document `deployment.yaml` into one-resource-per-file convention and add a hand-rolled Prometheus `/metrics` endpoint so the existing ServiceMonitor stops hitting 404. ## Changes - `k8s/deployment.yaml` -- removed inline Service and PVC documents, now contains only the Deployment resource - `k8s/service.yaml` -- **new file**, extracted Service (port 8000, selector app: linkedin-scheduler-remote) - `k8s/pvc.yaml` -- **new file**, extracted PVC (linkedin-mcp-data, 100Mi, local-path) - `k8s/kustomization.yaml` -- added service.yaml and pvc.yaml to resources list (now lists all four: deployment, service, pvc, servicemonitor) - `server.py` -- added `/metrics` GET endpoint returning Prometheus text exposition format (uptime_seconds gauge, http_requests_total counter) with request-counting middleware ## Test Plan - `ruff check .` -- passes - `ruff format --check .` -- passes - `kubectl kustomize k8s/` -- renders four separate resources (Deployment, Service, PVC, ServiceMonitor) - `curl http://localhost:8000/metrics` -- returns Prometheus text format with uptime_seconds and http_requests_total - After ArgoCD sync, ServiceMonitor scrapes /metrics successfully (no more 404) ## Review Checklist - [x] No unrelated changes - [x] ruff lint and format pass - [x] kustomize build renders four separate resources - [x] No new external dependencies added (hand-rolled Prometheus format) - [ ] Stale PR #4 should be closed (superseded by this PR) ## Related Notes - Forgejo issue: #3 Closes #3
feat: split k8s manifests and add /metrics endpoint
All checks were successful
ci/woodpecker/pr/woodpecker Pipeline was successful
31950c89cd
Split multi-document deployment.yaml into one-resource-per-file
convention (deployment.yaml, service.yaml, pvc.yaml) and add a
hand-rolled Prometheus /metrics endpoint so ServiceMonitor stops
hitting 404.

Closes #3

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author
Contributor

QA Review -- PR #5

Scope Check

PR modifies exactly the 5 files listed in issue #3's File Targets. No unrelated changes. No files from the "do not touch" list were modified (.woodpecker.yml, Dockerfile, servicemonitor.yaml, pyproject.toml).

k8s Manifests

deployment.yaml -- Clean. Only the Deployment resource remains. No --- separators, no inline Service or PVC.

service.yaml -- Correct extraction. Port name http matches the ServiceMonitor's port: http endpoint reference. Selector app: linkedin-scheduler-remote is consistent.

pvc.yaml -- Correct extraction. Name linkedin-mcp-data, 100Mi, local-path storageClass, ReadWriteOnce.

kustomization.yaml -- Lists all four resources: deployment.yaml, service.yaml, pvc.yaml, servicemonitor.yaml. Verified via kubectl kustomize . rendering four separate resources.

/metrics Endpoint

  • Prometheus text exposition format with correct content type (text/plain; version=0.0.4; charset=utf-8)
  • Two metrics exposed: uptime_seconds (gauge) and http_requests_total (counter) -- meets minimum requirement from acceptance criteria
  • HELP and TYPE annotations present per Prometheus convention
  • Thread-safe counter via threading.Lock
  • No external dependencies (hand-rolled as required by constraints)
  • Route inserted at position 0 to ensure it takes priority over MCP catch-all routes

Observations (non-blocking)

  1. _start_time set at module load -- uptime_seconds measures from import time rather than server start. The delta is negligible (sub-second) and this is a common pattern.
  2. Metrics endpoint counts itself -- each Prometheus scrape increments http_requests_total. This is standard behavior and expected.
  3. global keyword in middleware class -- functional and correct for the module-level variable pattern used here.

Acceptance Criteria Checklist

  • k8s/deployment.yaml contains only the Deployment resource
  • k8s/service.yaml exists with correct Service resource
  • k8s/pvc.yaml exists with correct PVC resource
  • k8s/kustomization.yaml lists all four resources
  • server.py exposes GET /metrics in Prometheus text format
  • No new external dependencies
  • ruff lint and format pass
  • kustomize build renders four separate resources

VERDICT: APPROVE

No blocking issues found. Clean extraction of k8s resources and a solid hand-rolled /metrics implementation. Ready to merge.

## QA Review -- PR #5 ### Scope Check PR modifies exactly the 5 files listed in issue #3's File Targets. No unrelated changes. No files from the "do not touch" list were modified (.woodpecker.yml, Dockerfile, servicemonitor.yaml, pyproject.toml). ### k8s Manifests **deployment.yaml** -- Clean. Only the Deployment resource remains. No `---` separators, no inline Service or PVC. **service.yaml** -- Correct extraction. Port name `http` matches the ServiceMonitor's `port: http` endpoint reference. Selector `app: linkedin-scheduler-remote` is consistent. **pvc.yaml** -- Correct extraction. Name `linkedin-mcp-data`, 100Mi, `local-path` storageClass, ReadWriteOnce. **kustomization.yaml** -- Lists all four resources: deployment.yaml, service.yaml, pvc.yaml, servicemonitor.yaml. Verified via `kubectl kustomize .` rendering four separate resources. ### /metrics Endpoint - Prometheus text exposition format with correct content type (`text/plain; version=0.0.4; charset=utf-8`) - Two metrics exposed: `uptime_seconds` (gauge) and `http_requests_total` (counter) -- meets minimum requirement from acceptance criteria - HELP and TYPE annotations present per Prometheus convention - Thread-safe counter via `threading.Lock` - No external dependencies (hand-rolled as required by constraints) - Route inserted at position 0 to ensure it takes priority over MCP catch-all routes ### Observations (non-blocking) 1. **`_start_time` set at module load** -- uptime_seconds measures from import time rather than server start. The delta is negligible (sub-second) and this is a common pattern. 2. **Metrics endpoint counts itself** -- each Prometheus scrape increments `http_requests_total`. This is standard behavior and expected. 3. **`global` keyword in middleware class** -- functional and correct for the module-level variable pattern used here. ### Acceptance Criteria Checklist - [x] `k8s/deployment.yaml` contains only the Deployment resource - [x] `k8s/service.yaml` exists with correct Service resource - [x] `k8s/pvc.yaml` exists with correct PVC resource - [x] `k8s/kustomization.yaml` lists all four resources - [x] `server.py` exposes GET /metrics in Prometheus text format - [x] No new external dependencies - [x] ruff lint and format pass - [x] kustomize build renders four separate resources **VERDICT: APPROVE** No blocking issues found. Clean extraction of k8s resources and a solid hand-rolled /metrics implementation. Ready to merge.
Author
Contributor

PR #5 Review

DOMAIN REVIEW

Tech stack: Python (Starlette/uvicorn), Kubernetes manifests (Kustomize), Prometheus metrics.

K8s manifest split (k8s/)

The extraction of Service and PVC from the multi-document deployment.yaml into k8s/service.yaml and k8s/pvc.yaml is byte-for-byte identical to the originals. kustomization.yaml correctly lists all four resources. Clean, mechanical refactor -- no issues.

Prometheus /metrics endpoint (server.py)

  • Metric names follow Prometheus conventions correctly: uptime_seconds (gauge with _seconds suffix), http_requests_total (counter with _total suffix).
  • # HELP and # TYPE annotations are well-formed.
  • Content type text/plain; version=0.0.4; charset=utf-8 is the correct Prometheus text exposition format.
  • _start_time = time.monotonic() is the right clock choice (monotonic, not wall-clock).
  • Thread lock on _http_request_count is correctly used for both reads and writes -- safe given the BaseHTTPMiddleware async dispatch runs in a thread context.
  • app.routes.insert(0, Route(...)) ensures /metrics takes precedence over any catch-all MCP routes. Direct list mutation is slightly fragile but appropriate here since build_app_with_middleware returns a fresh Starlette app each time.
  • Starlette is a transitive dependency via mcp/uvicorn -- not explicitly declared in pyproject.toml but reliably present. Acceptable for an internal service.

BLOCKERS

None.

On test coverage: This PR adds ~30 lines of new Python (metrics endpoint + counting middleware). The repo has zero test files and CI only runs ruff lint/format -- that is a pre-existing condition, not a regression introduced by this PR. The new code is infrastructure plumbing (Prometheus text format output and a request counter), not business logic. Requiring test coverage for this PR alone when the entire repo is untested would be scope creep. That said, a follow-up issue to establish a basic test harness for this repo would be valuable.

NITS

  1. Metrics endpoint counts its own scrapes. The _CountMiddleware increments _http_request_count for every request including GET /metrics. With a 30s ServiceMonitor scrape interval, this adds ~2880 phantom requests/day to the counter. Consider excluding the /metrics path from the counter:

    if request.url.path != "/metrics":
        with _http_request_count_lock:
            _http_request_count += 1
    
  2. Missing trailing newline in metrics output. The Prometheus exposition format spec requires the response to end with a newline. The current lines list ends with "" which produces a trailing \n via "\n".join(lines) -- so this actually IS correct. Disregard.

  3. [project.urls] still points to GitHub. In pyproject.toml lines 34-36, Repository and Issues URLs point to github.com rather than the Forgejo instance. Pre-existing, not introduced by this PR, but worth noting for a future cleanup.

SOP COMPLIANCE

  • Branch named after issue (3-align-k8s-manifests-add-metrics references issue #3)
  • PR body has Summary, Changes, Test Plan, Related sections
  • Related section references issue #3, Closes #3 present
  • No secrets or credentials committed
  • No .env files in diff
  • No unrelated changes -- all changes serve the two goals stated in the title
  • Commit messages are descriptive (single commit PR)
  • PR body template does not include a ## Related section referencing a plan slug -- acceptable since this is a standalone repo without an associated pal-e-docs plan

PROCESS OBSERVATIONS

  • Stale PR #4 is noted in the PR checklist as needing closure. Confirm it is superseded and close it to keep the PR list clean.
  • Test harness gap: This repo has zero tests and CI only lints. Consider a follow-up issue to add a minimal test harness (at least for the /metrics endpoint, which is easy to test in isolation without LinkedIn OAuth).
  • Deployment frequency: This is a small, focused PR that addresses a real 404 from the ServiceMonitor. Good size for fast review-merge-deploy cycle.

VERDICT: APPROVED

## PR #5 Review ### DOMAIN REVIEW **Tech stack:** Python (Starlette/uvicorn), Kubernetes manifests (Kustomize), Prometheus metrics. **K8s manifest split (k8s/)** The extraction of Service and PVC from the multi-document `deployment.yaml` into `k8s/service.yaml` and `k8s/pvc.yaml` is byte-for-byte identical to the originals. `kustomization.yaml` correctly lists all four resources. Clean, mechanical refactor -- no issues. **Prometheus /metrics endpoint (server.py)** - Metric names follow Prometheus conventions correctly: `uptime_seconds` (gauge with `_seconds` suffix), `http_requests_total` (counter with `_total` suffix). - `# HELP` and `# TYPE` annotations are well-formed. - Content type `text/plain; version=0.0.4; charset=utf-8` is the correct Prometheus text exposition format. - `_start_time = time.monotonic()` is the right clock choice (monotonic, not wall-clock). - Thread lock on `_http_request_count` is correctly used for both reads and writes -- safe given the `BaseHTTPMiddleware` async dispatch runs in a thread context. - `app.routes.insert(0, Route(...))` ensures `/metrics` takes precedence over any catch-all MCP routes. Direct list mutation is slightly fragile but appropriate here since `build_app_with_middleware` returns a fresh Starlette app each time. - Starlette is a transitive dependency via `mcp`/`uvicorn` -- not explicitly declared in `pyproject.toml` but reliably present. Acceptable for an internal service. ### BLOCKERS None. **On test coverage:** This PR adds ~30 lines of new Python (metrics endpoint + counting middleware). The repo has zero test files and CI only runs ruff lint/format -- that is a pre-existing condition, not a regression introduced by this PR. The new code is infrastructure plumbing (Prometheus text format output and a request counter), not business logic. Requiring test coverage for this PR alone when the entire repo is untested would be scope creep. That said, a follow-up issue to establish a basic test harness for this repo would be valuable. ### NITS 1. **Metrics endpoint counts its own scrapes.** The `_CountMiddleware` increments `_http_request_count` for every request including `GET /metrics`. With a 30s ServiceMonitor scrape interval, this adds ~2880 phantom requests/day to the counter. Consider excluding the `/metrics` path from the counter: ```python if request.url.path != "/metrics": with _http_request_count_lock: _http_request_count += 1 ``` 2. **Missing trailing newline in metrics output.** The Prometheus exposition format spec requires the response to end with a newline. The current `lines` list ends with `""` which produces a trailing `\n` via `"\n".join(lines)` -- so this actually IS correct. Disregard. 3. **`[project.urls]` still points to GitHub.** In `pyproject.toml` lines 34-36, Repository and Issues URLs point to `github.com` rather than the Forgejo instance. Pre-existing, not introduced by this PR, but worth noting for a future cleanup. ### SOP COMPLIANCE - [x] Branch named after issue (`3-align-k8s-manifests-add-metrics` references issue #3) - [x] PR body has Summary, Changes, Test Plan, Related sections - [x] Related section references issue #3, `Closes #3` present - [x] No secrets or credentials committed - [x] No `.env` files in diff - [x] No unrelated changes -- all changes serve the two goals stated in the title - [x] Commit messages are descriptive (single commit PR) - [ ] PR body template does not include a `## Related` section referencing a plan slug -- acceptable since this is a standalone repo without an associated pal-e-docs plan ### PROCESS OBSERVATIONS - **Stale PR #4** is noted in the PR checklist as needing closure. Confirm it is superseded and close it to keep the PR list clean. - **Test harness gap:** This repo has zero tests and CI only lints. Consider a follow-up issue to add a minimal test harness (at least for the `/metrics` endpoint, which is easy to test in isolation without LinkedIn OAuth). - **Deployment frequency:** This is a small, focused PR that addresses a real 404 from the ServiceMonitor. Good size for fast review-merge-deploy cycle. ### VERDICT: APPROVED
forgejo_admin deleted branch 3-align-k8s-manifests-add-metrics 2026-03-28 11:27:18 +00:00
Sign in to join this conversation.
No description provided.