Add observability: client-side errors, structured logs, error subscriber (#39)

ldraney commented

2026-05-29 11:21:16 +00:00

Author

Owner

PR #40 Review

DOMAIN REVIEW

Tech stack: Rails 8, Stimulus/importmap, RSpec. This is a Ruby/Rails review with JS (vanilla Stimulus, no TypeScript).

ClientErrorsController (app/controllers/client_errors_controller.rb)

skip_forgery_protection is justified -- the client-side JS sends the CSRF token via X-CSRF-Token header, and the controller reads raw body. This is a standard pattern for JSON API endpoints in Rails. No issue.
Body size limit (4KB) is good -- prevents log-flooding abuse. The request.body.read(4096) approach is correct.
No rate limiting -- This endpoint is unauthenticated (no auth in ApplicationController) and CSRF-protected only by a meta tag token. An attacker with a valid CSRF token (which is in every page's HTML) could spam this endpoint to flood logs. This is a low-severity concern for an internal landscaping tool, but worth noting. Not a blocker given the app has no authentication at all.
data['stack']&.truncate(500) -- truncate is an ActiveSupport method on String. If data['stack'] is present but not a String (e.g., attacker sends "stack": 123), this will raise NoMethodError. The JSON::ParserError rescue won't catch it. This would result in a 500 error, which is noisy but not harmful. Minor robustness gap.
No sanitization of logged values -- data['message'], data['source'], etc. are interpolated directly into log output. If logs are consumed by a log aggregator (Loki, as mentioned), injected newlines or control characters could corrupt log parsing. Low risk for a structured logging pipeline but worth a note.

Error subscriber (config/initializers/error_subscriber.rb)

Clean implementation. The report method signature matches the Rails 7+ ErrorReporter::Subscriber interface correctly (error, handled:, severity:, context:, source:). Uses keyword args properly.

Lograge config (config/environments/production.rb)

Lograge config is production-only, which is correct. The request_id custom option aligns with the existing config.log_tags = [ :request_id ] on line 36. Good correlation setup.

Client-side JS (app/javascript/application.js)

The reportClientError function has .catch(() => {}) -- a silent catch on the error-reporting fetch itself. This is actually correct here (you don't want error-reporting failures to themselves trigger error reporting in an infinite loop). Good defensive pattern.
window.onerror returns false, which is correct -- it allows the error to still propagate to the browser console.
CSRF token lookup uses optional chaining (?.content), and the function returns early if no token is found. This handles SSR/test environments cleanly.

Stimulus controller fixes (location_controller.js, add_location_controller.js)

All four .catch() blocks now log to console.error and display actual error messages to the user. This is the right fix -- errors were previously swallowed with generic "Failed" messages. The user now sees e.message || "fallback" which is a good pattern.
Minor UX concern: Raw e.message values from fetch/network errors may not be user-friendly (e.g., "Failed to fetch", "NetworkError when attempting to fetch resource"). These are browser-generated strings and are acceptable for a dev/internal tool.

Test coverage (spec/requests/client_errors_spec.rb)

Two tests: valid payload (204 + log assertion) and invalid JSON (422). Both are correct and meaningful. The log assertion uses expect(Rails.logger).to receive(:error).with(/CLIENT_ERROR.*TypeError/) which verifies the log format contract.
Missing test: oversized payload -- The controller reads only 4096 bytes. There is no test verifying behavior when a payload exceeds that limit. What happens: body is truncated, JSON.parse likely raises JSON::ParserError on incomplete JSON, returning 422. This is acceptable behavior but documenting it with a test would strengthen the contract.
Missing test: non-string stack field -- As noted in point 4, sending "stack": 123 would raise NoMethodError. A test would expose this.

BLOCKERS

None. The new endpoint has test coverage (happy path + error case). No secrets committed. No unvalidated user input reaching SQL or HTML rendering -- data goes to logs only. No DRY violations in auth paths (there is no auth).

NITS

Robustness: Consider adding .to_s before .truncate(500) on the stack field to handle non-string values gracefully:
```
"stack=#{data['stack'].to_s.truncate(500)}"
```
Test gap: An oversized-payload test and a malformed-types test (e.g., "stack": 123) would harden the contract, but not blocking.
Log injection: If the Loki pipeline parses logs by newline, consider stripping \n from interpolated fields, or switching to structured JSON logging for the [CLIENT_ERROR] line itself (consistent with the lograge JSON formatter already configured).
ErrorSubscriber context.inspect: In production, context could contain large objects. Consider context.slice(:key1, :key2).inspect or a size guard. Low risk for now.

SOP COMPLIANCE

Branch named after issue: 39-add-observability matches issue #39
PR body follows template: Summary, Changes, Test Plan, Related sections all present
Related references plan slug: PR body references project-landscaping-assistant and observability-audit-2026-02-25 but no plan slug was provided for this task (noted as "No plan slug" in the review request -- acceptable)
No secrets committed: Verified -- no API keys, passwords, or .env files in diff
No unnecessary file changes: All 10 changed files are directly related to the observability work described in #39
Commit messages are descriptive: Single commit e807db6 on this branch

PROCESS OBSERVATIONS

Deployment risk: Low. The lograge gem and error subscriber are additive. The client error endpoint is new but isolated. The Stimulus controller changes are bugfix-quality (replacing swallowed errors with logged ones).
Change failure risk: Low. The only production-config change is lograge enablement, which is well-tested in the Rails ecosystem. No database migrations.
Documentation: PR body clearly scopes this as "in-repo portion" of #39, with infra-side work (Blackbox probe, Grafana dashboard, k8s probes) to follow in other repos. Good scope management.
Test suite: 54 examples, 0 failures reported. New tests cover the new endpoint. Existing tests unaffected.

VERDICT: APPROVED

## PR #40 Review ### DOMAIN REVIEW **Tech stack:** Rails 8, Stimulus/importmap, RSpec. This is a Ruby/Rails review with JS (vanilla Stimulus, no TypeScript). **ClientErrorsController (`app/controllers/client_errors_controller.rb`)** 1. **`skip_forgery_protection` is justified** -- the client-side JS sends the CSRF token via `X-CSRF-Token` header, and the controller reads raw body. This is a standard pattern for JSON API endpoints in Rails. No issue. 2. **Body size limit (4KB) is good** -- prevents log-flooding abuse. The `request.body.read(4096)` approach is correct. 3. **No rate limiting** -- This endpoint is unauthenticated (no auth in `ApplicationController`) and CSRF-protected only by a meta tag token. An attacker with a valid CSRF token (which is in every page's HTML) could spam this endpoint to flood logs. This is a low-severity concern for an internal landscaping tool, but worth noting. Not a blocker given the app has no authentication at all. 4. **`data['stack']&.truncate(500)`** -- `truncate` is an ActiveSupport method on String. If `data['stack']` is present but not a String (e.g., attacker sends `"stack": 123`), this will raise `NoMethodError`. The `JSON::ParserError` rescue won't catch it. This would result in a 500 error, which is noisy but not harmful. Minor robustness gap. 5. **No sanitization of logged values** -- `data['message']`, `data['source']`, etc. are interpolated directly into log output. If logs are consumed by a log aggregator (Loki, as mentioned), injected newlines or control characters could corrupt log parsing. Low risk for a structured logging pipeline but worth a note. **Error subscriber (`config/initializers/error_subscriber.rb`)** 6. Clean implementation. The `report` method signature matches the Rails 7+ `ErrorReporter::Subscriber` interface correctly (`error, handled:, severity:, context:, source:`). Uses keyword args properly. **Lograge config (`config/environments/production.rb`)** 7. Lograge config is production-only, which is correct. The `request_id` custom option aligns with the existing `config.log_tags = [ :request_id ]` on line 36. Good correlation setup. **Client-side JS (`app/javascript/application.js`)** 8. The `reportClientError` function has `.catch(() => {})` -- a silent catch on the error-reporting fetch itself. This is actually correct here (you don't want error-reporting failures to themselves trigger error reporting in an infinite loop). Good defensive pattern. 9. `window.onerror` returns `false`, which is correct -- it allows the error to still propagate to the browser console. 10. CSRF token lookup uses optional chaining (`?.content`), and the function returns early if no token is found. This handles SSR/test environments cleanly. **Stimulus controller fixes (`location_controller.js`, `add_location_controller.js`)** 11. All four `.catch()` blocks now log to `console.error` and display actual error messages to the user. This is the right fix -- errors were previously swallowed with generic "Failed" messages. The user now sees `e.message || "fallback"` which is a good pattern. 12. **Minor UX concern**: Raw `e.message` values from fetch/network errors may not be user-friendly (e.g., `"Failed to fetch"`, `"NetworkError when attempting to fetch resource"`). These are browser-generated strings and are acceptable for a dev/internal tool. **Test coverage (`spec/requests/client_errors_spec.rb`)** 13. Two tests: valid payload (204 + log assertion) and invalid JSON (422). Both are correct and meaningful. The log assertion uses `expect(Rails.logger).to receive(:error).with(/CLIENT_ERROR.*TypeError/)` which verifies the log format contract. 14. **Missing test: oversized payload** -- The controller reads only 4096 bytes. There is no test verifying behavior when a payload exceeds that limit. What happens: `body` is truncated, `JSON.parse` likely raises `JSON::ParserError` on incomplete JSON, returning 422. This is acceptable behavior but documenting it with a test would strengthen the contract. 15. **Missing test: non-string stack field** -- As noted in point 4, sending `"stack": 123` would raise `NoMethodError`. A test would expose this. ### BLOCKERS None. The new endpoint has test coverage (happy path + error case). No secrets committed. No unvalidated user input reaching SQL or HTML rendering -- data goes to logs only. No DRY violations in auth paths (there is no auth). ### NITS 1. **Robustness**: Consider adding `.to_s` before `.truncate(500)` on the stack field to handle non-string values gracefully: ```ruby "stack=#{data['stack'].to_s.truncate(500)}" ``` 2. **Test gap**: An oversized-payload test and a malformed-types test (e.g., `"stack": 123`) would harden the contract, but not blocking. 3. **Log injection**: If the Loki pipeline parses logs by newline, consider stripping `\n` from interpolated fields, or switching to structured JSON logging for the `[CLIENT_ERROR]` line itself (consistent with the lograge JSON formatter already configured). 4. **ErrorSubscriber `context.inspect`**: In production, `context` could contain large objects. Consider `context.slice(:key1, :key2).inspect` or a size guard. Low risk for now. ### SOP COMPLIANCE - [x] Branch named after issue: `39-add-observability` matches issue #39 - [x] PR body follows template: Summary, Changes, Test Plan, Related sections all present - [ ] Related references plan slug: PR body references `project-landscaping-assistant` and `observability-audit-2026-02-25` but no plan slug was provided for this task (noted as "No plan slug" in the review request -- acceptable) - [x] No secrets committed: Verified -- no API keys, passwords, or `.env` files in diff - [x] No unnecessary file changes: All 10 changed files are directly related to the observability work described in #39 - [x] Commit messages are descriptive: Single commit `e807db6` on this branch ### PROCESS OBSERVATIONS - **Deployment risk**: Low. The lograge gem and error subscriber are additive. The client error endpoint is new but isolated. The Stimulus controller changes are bugfix-quality (replacing swallowed errors with logged ones). - **Change failure risk**: Low. The only production-config change is lograge enablement, which is well-tested in the Rails ecosystem. No database migrations. - **Documentation**: PR body clearly scopes this as "in-repo portion" of #39, with infra-side work (Blackbox probe, Grafana dashboard, k8s probes) to follow in other repos. Good scope management. - **Test suite**: 54 examples, 0 failures reported. New tests cover the new endpoint. Existing tests unaffected. ### VERDICT: APPROVED

2026-05-29 11:19:38 +00:00

Summary

Client-side JS errors were completely invisible — all catch blocks swallowed errors silently. This adds global error handlers, a server-side reporting endpoint, structured logging, and a Rails error subscriber.
Covers the in-repo portion of #39; infra-side work (Blackbox probe, Grafana dashboard, k8s probes) follows in pal-e-services, pal-e-platform, and pal-e-deployments.

Changes

app/javascript/application.js: Global window.onerror + unhandledrejection handlers that POST to /errors/client
app/controllers/client_errors_controller.rb: New endpoint — logs client errors as [CLIENT_ERROR] lines, 4KB body limit, JSON validation
app/javascript/controllers/location_controller.js: Fix 2 silent .catch() blocks — now logs to console and shows actual error message
app/javascript/controllers/add_location_controller.js: Fix 2 silent .catch() blocks — same treatment
config/initializers/error_subscriber.rb: Rails error subscriber for structured server-side exception logging
config/routes.rb: Add POST /errors/client route
Gemfile + Gemfile.lock: Add lograge gem
config/environments/production.rb: Enable lograge with JSON formatter and request_id
spec/requests/client_errors_spec.rb: Request spec for valid and invalid payloads

Test Plan

Tests pass locally (54 examples, 0 failures)
Rubocop clean (0 offenses)
Manual verification: POST /errors/client logs [CLIENT_ERROR] line to STDOUT
No regressions in existing request specs

Review Checklist

Passed automated review-fix loop
No secrets committed
No unnecessary file changes
Commit messages are descriptive

project-landscaping-assistant — project this work belongs to
observability-audit-2026-02-25 — platform observability baseline that informed the approach

Closes #39

## Summary - Client-side JS errors were completely invisible — all catch blocks swallowed errors silently. This adds global error handlers, a server-side reporting endpoint, structured logging, and a Rails error subscriber. - Covers the in-repo portion of #39; infra-side work (Blackbox probe, Grafana dashboard, k8s probes) follows in pal-e-services, pal-e-platform, and pal-e-deployments. ## Changes - `app/javascript/application.js`: Global `window.onerror` + `unhandledrejection` handlers that POST to `/errors/client` - `app/controllers/client_errors_controller.rb`: New endpoint — logs client errors as `[CLIENT_ERROR]` lines, 4KB body limit, JSON validation - `app/javascript/controllers/location_controller.js`: Fix 2 silent `.catch()` blocks — now logs to console and shows actual error message - `app/javascript/controllers/add_location_controller.js`: Fix 2 silent `.catch()` blocks — same treatment - `config/initializers/error_subscriber.rb`: Rails error subscriber for structured server-side exception logging - `config/routes.rb`: Add `POST /errors/client` route - `Gemfile` + `Gemfile.lock`: Add `lograge` gem - `config/environments/production.rb`: Enable lograge with JSON formatter and request_id - `spec/requests/client_errors_spec.rb`: Request spec for valid and invalid payloads ## Test Plan - [x] Tests pass locally (54 examples, 0 failures) - [x] Rubocop clean (0 offenses) - [x] Manual verification: `POST /errors/client` logs `[CLIENT_ERROR]` line to STDOUT - [x] No regressions in existing request specs ## Review Checklist - [ ] Passed automated review-fix loop - [ ] No secrets committed - [ ] No unnecessary file changes - [ ] Commit messages are descriptive ## Related Notes - `project-landscaping-assistant` — project this work belongs to - `observability-audit-2026-02-25` — platform observability baseline that informed the approach Closes #39

ldraney added 1 commit

Add client-side error reporting, structured logging, and error subscriber (#39 )

ci/woodpecker/push/woodpecker Pipeline was successful

Details

ci/woodpecker/pr/woodpecker Pipeline was successful

CI / scan_ruby (pull_request) Has been cancelled

CI / scan_js (pull_request) Has been cancelled

CI / lint (pull_request) Has been cancelled

1522aa6c31

Client-side JS errors were completely invisible — all catch blocks
swallowed errors silently. This adds a global error handler that POSTs
to a new /errors/client endpoint, fixes all silent catch blocks to log
actual errors, adds a Rails error subscriber for structured server-side
exception logging, and enables lograge for single-line JSON request logs
in production.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

ldraney merged commit d83da4a371 into main

2026-05-29 11:53:14 +00:00

ldraney deleted branch 39-add-observability

ldraney referenced this pull request from a commit

Add observability: client-side errors, structured logs, error subscriber (#39) (#40)

Rows
Columns

Add observability: client-side errors, structured logs, error subscriber (#39) #40