Add observability: client-side errors, structured logs, error subscriber (#39) #40

Merged
ldraney merged 1 commit from 39-add-observability into main 2026-05-29 11:53:14 +00:00
Owner

Summary

  • Client-side JS errors were completely invisible — all catch blocks swallowed errors silently. This adds global error handlers, a server-side reporting endpoint, structured logging, and a Rails error subscriber.
  • Covers the in-repo portion of #39; infra-side work (Blackbox probe, Grafana dashboard, k8s probes) follows in pal-e-services, pal-e-platform, and pal-e-deployments.

Changes

  • app/javascript/application.js: Global window.onerror + unhandledrejection handlers that POST to /errors/client
  • app/controllers/client_errors_controller.rb: New endpoint — logs client errors as [CLIENT_ERROR] lines, 4KB body limit, JSON validation
  • app/javascript/controllers/location_controller.js: Fix 2 silent .catch() blocks — now logs to console and shows actual error message
  • app/javascript/controllers/add_location_controller.js: Fix 2 silent .catch() blocks — same treatment
  • config/initializers/error_subscriber.rb: Rails error subscriber for structured server-side exception logging
  • config/routes.rb: Add POST /errors/client route
  • Gemfile + Gemfile.lock: Add lograge gem
  • config/environments/production.rb: Enable lograge with JSON formatter and request_id
  • spec/requests/client_errors_spec.rb: Request spec for valid and invalid payloads

Test Plan

  • Tests pass locally (54 examples, 0 failures)
  • Rubocop clean (0 offenses)
  • Manual verification: POST /errors/client logs [CLIENT_ERROR] line to STDOUT
  • No regressions in existing request specs

Review Checklist

  • Passed automated review-fix loop
  • No secrets committed
  • No unnecessary file changes
  • Commit messages are descriptive
  • project-landscaping-assistant — project this work belongs to
  • observability-audit-2026-02-25 — platform observability baseline that informed the approach

Closes #39

## Summary - Client-side JS errors were completely invisible — all catch blocks swallowed errors silently. This adds global error handlers, a server-side reporting endpoint, structured logging, and a Rails error subscriber. - Covers the in-repo portion of #39; infra-side work (Blackbox probe, Grafana dashboard, k8s probes) follows in pal-e-services, pal-e-platform, and pal-e-deployments. ## Changes - `app/javascript/application.js`: Global `window.onerror` + `unhandledrejection` handlers that POST to `/errors/client` - `app/controllers/client_errors_controller.rb`: New endpoint — logs client errors as `[CLIENT_ERROR]` lines, 4KB body limit, JSON validation - `app/javascript/controllers/location_controller.js`: Fix 2 silent `.catch()` blocks — now logs to console and shows actual error message - `app/javascript/controllers/add_location_controller.js`: Fix 2 silent `.catch()` blocks — same treatment - `config/initializers/error_subscriber.rb`: Rails error subscriber for structured server-side exception logging - `config/routes.rb`: Add `POST /errors/client` route - `Gemfile` + `Gemfile.lock`: Add `lograge` gem - `config/environments/production.rb`: Enable lograge with JSON formatter and request_id - `spec/requests/client_errors_spec.rb`: Request spec for valid and invalid payloads ## Test Plan - [x] Tests pass locally (54 examples, 0 failures) - [x] Rubocop clean (0 offenses) - [x] Manual verification: `POST /errors/client` logs `[CLIENT_ERROR]` line to STDOUT - [x] No regressions in existing request specs ## Review Checklist - [ ] Passed automated review-fix loop - [ ] No secrets committed - [ ] No unnecessary file changes - [ ] Commit messages are descriptive ## Related Notes - `project-landscaping-assistant` — project this work belongs to - `observability-audit-2026-02-25` — platform observability baseline that informed the approach Closes #39
Add client-side error reporting, structured logging, and error subscriber (#39)
Some checks failed
ci/woodpecker/push/woodpecker Pipeline was successful
ci/woodpecker/pr/woodpecker Pipeline was successful
CI / scan_ruby (pull_request) Has been cancelled
CI / scan_js (pull_request) Has been cancelled
CI / lint (pull_request) Has been cancelled
1522aa6c31
Client-side JS errors were completely invisible — all catch blocks
swallowed errors silently. This adds a global error handler that POSTs
to a new /errors/client endpoint, fixes all silent catch blocks to log
actual errors, adds a Rails error subscriber for structured server-side
exception logging, and enables lograge for single-line JSON request logs
in production.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author
Owner

PR #40 Review

DOMAIN REVIEW

Tech stack: Rails 8, Stimulus/importmap, RSpec. This is a Ruby/Rails review with JS (vanilla Stimulus, no TypeScript).

ClientErrorsController (app/controllers/client_errors_controller.rb)

  1. skip_forgery_protection is justified -- the client-side JS sends the CSRF token via X-CSRF-Token header, and the controller reads raw body. This is a standard pattern for JSON API endpoints in Rails. No issue.

  2. Body size limit (4KB) is good -- prevents log-flooding abuse. The request.body.read(4096) approach is correct.

  3. No rate limiting -- This endpoint is unauthenticated (no auth in ApplicationController) and CSRF-protected only by a meta tag token. An attacker with a valid CSRF token (which is in every page's HTML) could spam this endpoint to flood logs. This is a low-severity concern for an internal landscaping tool, but worth noting. Not a blocker given the app has no authentication at all.

  4. data['stack']&.truncate(500) -- truncate is an ActiveSupport method on String. If data['stack'] is present but not a String (e.g., attacker sends "stack": 123), this will raise NoMethodError. The JSON::ParserError rescue won't catch it. This would result in a 500 error, which is noisy but not harmful. Minor robustness gap.

  5. No sanitization of logged values -- data['message'], data['source'], etc. are interpolated directly into log output. If logs are consumed by a log aggregator (Loki, as mentioned), injected newlines or control characters could corrupt log parsing. Low risk for a structured logging pipeline but worth a note.

Error subscriber (config/initializers/error_subscriber.rb)

  1. Clean implementation. The report method signature matches the Rails 7+ ErrorReporter::Subscriber interface correctly (error, handled:, severity:, context:, source:). Uses keyword args properly.

Lograge config (config/environments/production.rb)

  1. Lograge config is production-only, which is correct. The request_id custom option aligns with the existing config.log_tags = [ :request_id ] on line 36. Good correlation setup.

Client-side JS (app/javascript/application.js)

  1. The reportClientError function has .catch(() => {}) -- a silent catch on the error-reporting fetch itself. This is actually correct here (you don't want error-reporting failures to themselves trigger error reporting in an infinite loop). Good defensive pattern.

  2. window.onerror returns false, which is correct -- it allows the error to still propagate to the browser console.

  3. CSRF token lookup uses optional chaining (?.content), and the function returns early if no token is found. This handles SSR/test environments cleanly.

Stimulus controller fixes (location_controller.js, add_location_controller.js)

  1. All four .catch() blocks now log to console.error and display actual error messages to the user. This is the right fix -- errors were previously swallowed with generic "Failed" messages. The user now sees e.message || "fallback" which is a good pattern.

  2. Minor UX concern: Raw e.message values from fetch/network errors may not be user-friendly (e.g., "Failed to fetch", "NetworkError when attempting to fetch resource"). These are browser-generated strings and are acceptable for a dev/internal tool.

Test coverage (spec/requests/client_errors_spec.rb)

  1. Two tests: valid payload (204 + log assertion) and invalid JSON (422). Both are correct and meaningful. The log assertion uses expect(Rails.logger).to receive(:error).with(/CLIENT_ERROR.*TypeError/) which verifies the log format contract.

  2. Missing test: oversized payload -- The controller reads only 4096 bytes. There is no test verifying behavior when a payload exceeds that limit. What happens: body is truncated, JSON.parse likely raises JSON::ParserError on incomplete JSON, returning 422. This is acceptable behavior but documenting it with a test would strengthen the contract.

  3. Missing test: non-string stack field -- As noted in point 4, sending "stack": 123 would raise NoMethodError. A test would expose this.

BLOCKERS

None. The new endpoint has test coverage (happy path + error case). No secrets committed. No unvalidated user input reaching SQL or HTML rendering -- data goes to logs only. No DRY violations in auth paths (there is no auth).

NITS

  1. Robustness: Consider adding .to_s before .truncate(500) on the stack field to handle non-string values gracefully:

    "stack=#{data['stack'].to_s.truncate(500)}"
    
  2. Test gap: An oversized-payload test and a malformed-types test (e.g., "stack": 123) would harden the contract, but not blocking.

  3. Log injection: If the Loki pipeline parses logs by newline, consider stripping \n from interpolated fields, or switching to structured JSON logging for the [CLIENT_ERROR] line itself (consistent with the lograge JSON formatter already configured).

  4. ErrorSubscriber context.inspect: In production, context could contain large objects. Consider context.slice(:key1, :key2).inspect or a size guard. Low risk for now.

SOP COMPLIANCE

  • Branch named after issue: 39-add-observability matches issue #39
  • PR body follows template: Summary, Changes, Test Plan, Related sections all present
  • Related references plan slug: PR body references project-landscaping-assistant and observability-audit-2026-02-25 but no plan slug was provided for this task (noted as "No plan slug" in the review request -- acceptable)
  • No secrets committed: Verified -- no API keys, passwords, or .env files in diff
  • No unnecessary file changes: All 10 changed files are directly related to the observability work described in #39
  • Commit messages are descriptive: Single commit e807db6 on this branch

PROCESS OBSERVATIONS

  • Deployment risk: Low. The lograge gem and error subscriber are additive. The client error endpoint is new but isolated. The Stimulus controller changes are bugfix-quality (replacing swallowed errors with logged ones).
  • Change failure risk: Low. The only production-config change is lograge enablement, which is well-tested in the Rails ecosystem. No database migrations.
  • Documentation: PR body clearly scopes this as "in-repo portion" of #39, with infra-side work (Blackbox probe, Grafana dashboard, k8s probes) to follow in other repos. Good scope management.
  • Test suite: 54 examples, 0 failures reported. New tests cover the new endpoint. Existing tests unaffected.

VERDICT: APPROVED

## PR #40 Review ### DOMAIN REVIEW **Tech stack:** Rails 8, Stimulus/importmap, RSpec. This is a Ruby/Rails review with JS (vanilla Stimulus, no TypeScript). **ClientErrorsController (`app/controllers/client_errors_controller.rb`)** 1. **`skip_forgery_protection` is justified** -- the client-side JS sends the CSRF token via `X-CSRF-Token` header, and the controller reads raw body. This is a standard pattern for JSON API endpoints in Rails. No issue. 2. **Body size limit (4KB) is good** -- prevents log-flooding abuse. The `request.body.read(4096)` approach is correct. 3. **No rate limiting** -- This endpoint is unauthenticated (no auth in `ApplicationController`) and CSRF-protected only by a meta tag token. An attacker with a valid CSRF token (which is in every page's HTML) could spam this endpoint to flood logs. This is a low-severity concern for an internal landscaping tool, but worth noting. Not a blocker given the app has no authentication at all. 4. **`data['stack']&.truncate(500)`** -- `truncate` is an ActiveSupport method on String. If `data['stack']` is present but not a String (e.g., attacker sends `"stack": 123`), this will raise `NoMethodError`. The `JSON::ParserError` rescue won't catch it. This would result in a 500 error, which is noisy but not harmful. Minor robustness gap. 5. **No sanitization of logged values** -- `data['message']`, `data['source']`, etc. are interpolated directly into log output. If logs are consumed by a log aggregator (Loki, as mentioned), injected newlines or control characters could corrupt log parsing. Low risk for a structured logging pipeline but worth a note. **Error subscriber (`config/initializers/error_subscriber.rb`)** 6. Clean implementation. The `report` method signature matches the Rails 7+ `ErrorReporter::Subscriber` interface correctly (`error, handled:, severity:, context:, source:`). Uses keyword args properly. **Lograge config (`config/environments/production.rb`)** 7. Lograge config is production-only, which is correct. The `request_id` custom option aligns with the existing `config.log_tags = [ :request_id ]` on line 36. Good correlation setup. **Client-side JS (`app/javascript/application.js`)** 8. The `reportClientError` function has `.catch(() => {})` -- a silent catch on the error-reporting fetch itself. This is actually correct here (you don't want error-reporting failures to themselves trigger error reporting in an infinite loop). Good defensive pattern. 9. `window.onerror` returns `false`, which is correct -- it allows the error to still propagate to the browser console. 10. CSRF token lookup uses optional chaining (`?.content`), and the function returns early if no token is found. This handles SSR/test environments cleanly. **Stimulus controller fixes (`location_controller.js`, `add_location_controller.js`)** 11. All four `.catch()` blocks now log to `console.error` and display actual error messages to the user. This is the right fix -- errors were previously swallowed with generic "Failed" messages. The user now sees `e.message || "fallback"` which is a good pattern. 12. **Minor UX concern**: Raw `e.message` values from fetch/network errors may not be user-friendly (e.g., `"Failed to fetch"`, `"NetworkError when attempting to fetch resource"`). These are browser-generated strings and are acceptable for a dev/internal tool. **Test coverage (`spec/requests/client_errors_spec.rb`)** 13. Two tests: valid payload (204 + log assertion) and invalid JSON (422). Both are correct and meaningful. The log assertion uses `expect(Rails.logger).to receive(:error).with(/CLIENT_ERROR.*TypeError/)` which verifies the log format contract. 14. **Missing test: oversized payload** -- The controller reads only 4096 bytes. There is no test verifying behavior when a payload exceeds that limit. What happens: `body` is truncated, `JSON.parse` likely raises `JSON::ParserError` on incomplete JSON, returning 422. This is acceptable behavior but documenting it with a test would strengthen the contract. 15. **Missing test: non-string stack field** -- As noted in point 4, sending `"stack": 123` would raise `NoMethodError`. A test would expose this. ### BLOCKERS None. The new endpoint has test coverage (happy path + error case). No secrets committed. No unvalidated user input reaching SQL or HTML rendering -- data goes to logs only. No DRY violations in auth paths (there is no auth). ### NITS 1. **Robustness**: Consider adding `.to_s` before `.truncate(500)` on the stack field to handle non-string values gracefully: ```ruby "stack=#{data['stack'].to_s.truncate(500)}" ``` 2. **Test gap**: An oversized-payload test and a malformed-types test (e.g., `"stack": 123`) would harden the contract, but not blocking. 3. **Log injection**: If the Loki pipeline parses logs by newline, consider stripping `\n` from interpolated fields, or switching to structured JSON logging for the `[CLIENT_ERROR]` line itself (consistent with the lograge JSON formatter already configured). 4. **ErrorSubscriber `context.inspect`**: In production, `context` could contain large objects. Consider `context.slice(:key1, :key2).inspect` or a size guard. Low risk for now. ### SOP COMPLIANCE - [x] Branch named after issue: `39-add-observability` matches issue #39 - [x] PR body follows template: Summary, Changes, Test Plan, Related sections all present - [ ] Related references plan slug: PR body references `project-landscaping-assistant` and `observability-audit-2026-02-25` but no plan slug was provided for this task (noted as "No plan slug" in the review request -- acceptable) - [x] No secrets committed: Verified -- no API keys, passwords, or `.env` files in diff - [x] No unnecessary file changes: All 10 changed files are directly related to the observability work described in #39 - [x] Commit messages are descriptive: Single commit `e807db6` on this branch ### PROCESS OBSERVATIONS - **Deployment risk**: Low. The lograge gem and error subscriber are additive. The client error endpoint is new but isolated. The Stimulus controller changes are bugfix-quality (replacing swallowed errors with logged ones). - **Change failure risk**: Low. The only production-config change is lograge enablement, which is well-tested in the Rails ecosystem. No database migrations. - **Documentation**: PR body clearly scopes this as "in-repo portion" of #39, with infra-side work (Blackbox probe, Grafana dashboard, k8s probes) to follow in other repos. Good scope management. - **Test suite**: 54 examples, 0 failures reported. New tests cover the new endpoint. Existing tests unaffected. ### VERDICT: APPROVED
ldraney deleted branch 39-add-observability 2026-05-29 11:53:14 +00:00
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
ldraney/landscaping-assistant!40
No description provided.