Seed full US McDonald's + cache-on-miss for uncovered areas #22

Closed
opened 2026-03-18 17:25:27 +00:00 by forgejo_admin · 0 comments
Contributor

Lineage

todo-mcd-smart-proximity (lightweight work, no plan phase)

Repo

forgejo_admin/mcd-tracker-api

User Story

As a user anywhere in the US tapping "Near Me"
I want to see nearby McDonald's instantly from a pre-seeded database
So that I never depend on a flaky external API at runtime

Context

PR #21 merged the mcdonalds_locations table, seed script, and refactored /locations/nearby to query Postgres instead of Overpass. However, the seed script defaults to Denver metro (50km radius). The user is NOT in Denver.

Two changes needed:

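  1. Full US seed: Change the seed script default to query ALL US McDonald's (~13k rows) with one Overpass query. Note that the shorthand node["brand"="McDonald's"](area:US); is not valid Overpass QL, because area: expects a numeric area ID; resolve the country area first, e.g. area["ISO3166-1"="US"][admin_level=2]->.us; node["brand"="McDonald's"](area.us);. This is a trivial dataset for Postgres.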
  2. Cache-on-miss: When /locations/nearby finds no cached locations within the query radius, fall back to querying Overpass, cache the results in mcdonalds_locations, and serve them. This handles edge cases (international users, brand-new locations not yet in the monthly refresh).
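The full-US query from item 1 could be assembled roughly as follows. This is a minimal sketch: build_country_query and its parameters are hypothetical names chosen for illustration, not the seed script's actual interface. Only the Overpass QL constructs themselves (the [out:json] and [timeout:...] settings, the area lookup by ISO3166-1 tag, and the (area.name) filter) are standard.

```python
def build_country_query(country_code: str = "US", timeout_s: int = 120) -> str:
    """Build an Overpass QL query for every McDonald's node in one country.

    Hypothetical helper for illustration; the real seed script may differ.
    """
    code = country_code.strip().upper()
    if len(code) != 2 or not code.isalpha():
        raise ValueError(
            f"expected a two-letter ISO 3166-1 code, got {country_code!r}"
        )
    return (
        f"[out:json][timeout:{timeout_s}];\n"
        # Resolve the country boundary to a named area first.
        f'area["ISO3166-1"="{code}"][admin_level=2]->.country;\n'
        # Then select brand-tagged nodes inside that area.
        'node["brand"="McDonald\'s"](area.country);\n'
        "out body;"
    )
```

A unit test for the "builds correct Overpass QL" expectation can then assert on substrings of the returned query rather than on a live Overpass response.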

The cache-on-miss makes the Overpass call a first-time-only fallback, not the primary path. After the first query in any area, all subsequent queries are instant from Postgres.
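The flow described above (serve from the cache, fall back to Overpass only on a miss, store what comes back, degrade to an empty list on failure) can be sketched as below. Everything here is an assumption made for illustration: the cache is a plain dict standing in for the mcdonalds_locations table, and nearby_with_cache_on_miss, query_cache, fetch_overpass, and the 10 second timeout are hypothetical, not names or values from routes/nearby.py or services/overpass.py.

```python
import asyncio
from typing import Awaitable, Callable

Location = dict  # assumed shape: {"osm_id": int, "name": str, "lat": float, "lng": float}
CacheLookup = Callable[[dict, float, float, float], list]
Fetcher = Callable[[float, float, float], Awaitable[list]]


async def nearby_with_cache_on_miss(
    cache: dict,
    query_cache: CacheLookup,
    fetch_overpass: Fetcher,
    lat: float,
    lng: float,
    radius_km: float,
    timeout_s: float = 10.0,
) -> list:
    """Return cached locations, calling the fetcher only when the cache is empty."""
    hits = query_cache(cache, lat, lng, radius_km)
    if hits:
        return hits  # primary path: no external call

    try:
        # Bound the one-time latency instead of waiting indefinitely.
        fetched = await asyncio.wait_for(
            fetch_overpass(lat, lng, radius_km), timeout_s
        )
    except Exception:
        # Graceful degradation: timeouts and upstream errors yield an
        # empty list. Real code would likely catch narrower exceptions.
        return []

    for loc in fetched:
        cache[loc["osm_id"]] = loc  # upsert keyed on osm_id
    return fetched
```

Injecting the fetcher as a parameter is a design choice for the sketch: it lets the integration tests substitute a stub that counts calls or raises, without patching module globals.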

Design decisions:

  • Full US pre-seed covers 99% of queries — cache-on-miss is the safety net
  • Cache TTL: check created_at on cached rows. If the oldest row in the area is more than 30 days old, refresh from Overpass in the background
  • The seed script is also the monthly refresh mechanism — run via cron with the same full-US query
  • No Redis needed: 13k rows with a lat/lng index is a trivially small lookup for Postgres
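The 30-day freshness rule from the design decisions could be expressed as a small pure function. This is a sketch under assumptions: needs_refresh and CACHE_TTL are hypothetical names, and created_at values are assumed to be timezone-aware datetimes.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

CACHE_TTL = timedelta(days=30)  # threshold taken from the design decision above


def needs_refresh(
    created_ats: Iterable[datetime],
    now: Optional[datetime] = None,
    ttl: timedelta = CACHE_TTL,
) -> bool:
    """Return True when the oldest cached row in an area is older than ttl."""
    timestamps = list(created_ats)
    if not timestamps:
        # Nothing cached is a miss, not a stale hit; cache-on-miss handles it.
        return False
    current = now if now is not None else datetime.now(timezone.utc)
    return current - min(timestamps) > ttl
```

Passing now explicitly keeps the check deterministic in tests; the route would call it with the created_at values of the rows it just read and, if it returns True, schedule the background refresh.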

File Targets

Files to modify:

  • src/mcd_tracker_api/scripts/seed_locations.py: change default from Denver 50km to full US. Add --country flag or use area-based Overpass query. Raise the timeout only if the existing SEED_TIMEOUT = 120.0 proves too short for the larger query.
  • src/mcd_tracker_api/routes/nearby.py — add cache-on-miss logic: if Postgres returns 0 results for the query area, call Overpass, upsert results, return them. Make this async since it may call Overpass.
  • src/mcd_tracker_api/models.py — consider adding region_queried_at or using created_at for cache freshness

Files NOT to touch:

  • src/mcd_tracker_api/schemas.py — response schema stays the same
  • src/mcd_tracker_api/services/overpass.py — already works, used by both seed script and cache-on-miss
  • src/mcd_tracker_api/routes/locations.py — saved locations are separate

Acceptance Criteria

  • Seed script defaults to full US (~13k McDonald's) instead of Denver 50km
  • Running python -m mcd_tracker_api.scripts.seed_locations seeds all US locations
  • Seed script remains idempotent (upsert on osm_id)
  • /locations/nearby returns cached results when available (no Overpass call)
  • /locations/nearby falls back to Overpass when no cached locations exist for the area, caches the results
  • Cache-on-miss upserts into mcdonalds_locations so subsequent queries are instant
  • If Overpass fails during cache-on-miss, return empty list gracefully (not 503)
  • Response schema unchanged (NearbyResponse)
  • All existing tests pass
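The idempotency criterion (upsert on osm_id) is what lets both the seed script and cache-on-miss write the same rows repeatedly without creating duplicates. The sketch below illustrates the idea with Python's built-in sqlite3 as a stand-in for Postgres; the INSERT ... ON CONFLICT ... DO UPDATE form is accepted by both. The column set (name, lat, lng) and the upsert_locations helper are assumptions, since the issue names only osm_id and created_at.

```python
import sqlite3

# Assumed minimal schema; the real model in models.py likely has more columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS mcdonalds_locations (
    osm_id INTEGER PRIMARY KEY,
    name   TEXT,
    lat    REAL NOT NULL,
    lng    REAL NOT NULL
)
"""

UPSERT = """
INSERT INTO mcdonalds_locations (osm_id, name, lat, lng)
VALUES (:osm_id, :name, :lat, :lng)
ON CONFLICT (osm_id) DO UPDATE SET
    name = excluded.name,
    lat  = excluded.lat,
    lng  = excluded.lng
"""


def upsert_locations(conn: sqlite3.Connection, rows: list) -> None:
    """Insert new rows and update existing ones, keyed on osm_id."""
    conn.executemany(UPSERT, rows)
    conn.commit()


def count_locations(conn: sqlite3.Connection) -> int:
    """Return the number of stored locations."""
    return conn.execute("SELECT COUNT(*) FROM mcdonalds_locations").fetchone()[0]
```

Running upsert_locations twice with the same rows leaves the row count unchanged, which is the property the monthly cron refresh relies on.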

Test Expectations

  • Integration test: nearby returns results from pre-seeded data (existing tests still pass)
  • Integration test: cache-on-miss calls Overpass when DB is empty for area, caches results
  • Integration test: cache-on-miss Overpass failure returns empty list (graceful degradation)
  • Integration test: subsequent query after cache-on-miss serves from DB (no Overpass call)
  • Unit test: seed script full-US query builds correct Overpass QL
  • Run command: pytest tests/ -v

Constraints

  • Keep the seed script simple — one Overpass query for the full US, not per-state
  • Overpass timeout for full-US query may need 120s+ (the script already has SEED_TIMEOUT = 120.0)
  • Cache-on-miss should NOT block the response if Overpass is slow — consider returning empty immediately and seeding in background, or accept the one-time latency with a reasonable timeout
  • Do not add PostGIS — haversine in Python is fine for this scale
  • Match existing code patterns in routes/nearby.py and scripts/seed_locations.py
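For the "haversine in Python" constraint, the distance filter amounts to the following. This is a generic sketch of the haversine formula, not the code in routes/nearby.py; haversine_km, within_radius, and the lat/lng dict keys are illustrative names.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def within_radius(rows: list, lat: float, lng: float, radius_km: float) -> list:
    """Return rows within radius_km of (lat, lng), nearest first."""
    scored = [(haversine_km(lat, lng, r["lat"], r["lng"]), r) for r in rows]
    scored.sort(key=lambda pair: pair[0])
    return [row for dist, row in scored if dist <= radius_km]
```

At roughly 13k rows a full scan like this is cheap, and a coarse lat/lng bounding-box filter in the SQL query can shrink the candidate set further before the Python pass.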

Checklist

  • PR opened
  • Tests pass
  • No unrelated changes

Related

  • project-mcd-tracker — project this affects
  • todo-mcd-smart-proximity — pal-e-docs TODO
  • PR #21 — foundational work (table, model, initial seed script, refactored endpoint)
Reference
ldraney/mcd-tracker-api#22