feat: seed full US McDonald's + cache-on-miss for uncovered areas #23
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "22-seed-full-us-mcdonald-s-cache-on-miss-fo"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Change seed script default from Denver 50km to all US McDonald's (~13k locations) using an area-based Overpass query. Add cache-on-miss to
/locations/nearby: when Postgres has no seeded data for a queried area, the endpoint falls back to Overpass, caches the results, and serves them. Subsequent queries are instant from Postgres. Overpass failures during cache-on-miss return an empty list gracefully (not 503).Changes
src/mcd_tracker_api/scripts/seed_locations.py-- Replaced Denver 50km default with full-US area query usingarea["ISO3166-1"="US"]. Added_build_area_query(),_build_radius_query(),fetch_locations_by_area(), and_parse_overpass_response(). Script now defaults to area mode; radius mode available via--lat/--lng/--radiusflags. Increased timeout to 180s for full-US query. Added--countryflag for other countries.src/mcd_tracker_api/routes/nearby.py-- Added cache-on-miss logic: extracted_query_cached_locations()(returns coverage flag + results),_cache_overpass_results()(upserts Overpass data intomcdonalds_locations), and_cache_on_miss()(async Overpass fallback with graceful error handling). Made endpointasyncto support the async Overpass client. Cache-on-miss only fires when the bounding box query finds zero rows (no coverage), not when rows exist but none pass the haversine filter.tests/test_seed_locations.py-- AddedTestBuildQueries(area query ISO3166-1, radius query around filter) andTestFetchLocationsByArea(parses area response, empty response). Existing radius-mode tests preserved.tests/test_nearby.py-- AddedTestNearbyCacheOnMisswith 7 integration tests: Overpass fallback when DB empty, Overpass failure returns empty, subsequent query serves from cache (no Overpass call), cache-on-miss does not fire when DB has data, upsert on existing osm_id, multiple locations, empty Overpass response. Updated 2 existing tests to mock Overpass where cache-on-miss would fire.Test Plan
pytest tests/ -v-- 160 passed, 1 pre-existing flaky test (test_code_at_exact_30_day_boundary_not_active-- timing race, unrelated)python -m mcd_tracker_api.scripts.seed_locationsseeds all US locations (run against real Overpass)/locations/nearbyreturns cached results instantly, falls back to Overpass for uncovered areasReview Checklist
/locations/nearbyreturns cached results when available (no Overpass call)/locations/nearbyfalls back to Overpass when no cached locations exist for the areamcdonalds_locationsso subsequent queries are instantNearbyResponse)Related
Self-Review
Reviewed the full diff (4 files, +456/-43). No blocking issues found.
Verified
ISO3166-1). Radius mode preserved via--lat/--lng/--radiusflags. Timeout increased to 180s. Parsing logic extracted to shared_parse_overpass_response()-- no duplication.OverpassErroris caught and returns empty list (200, not 503).OverpassTimeoutErroris a subclass ofOverpassError, so timeouts are also handled._cache_overpass_resultsqueries byosm_idand updates existing rows, matching the seed script's upsert pattern.deftoasync defto supportawait query_nearby(). FastAPI handles this correctly._cache_overpass_resultscommits on the request's DB session, same pattern used elsewhere.db.refresh()after commit ensures generated IDs are available.Nit (non-blocking)
_cache_overpass_results(db: Session, osm_locations: list)-- thelisttype hint could belist[OsmLocation]for clarity. Not worth a fix cycle.Test results
PR #23 Review
DOMAIN REVIEW
Tech stack: Python / FastAPI / SQLAlchemy / httpx / pytest. Domain checklists applied: PEP compliance, OWASP, SQLAlchemy patterns, FastAPI specifics.
Async/sync mixing in the endpoint (
/home/ldraney/mcd-tracker-api/src/mcd_tracker_api/routes/nearby.py, line 209):The endpoint was changed from
deftoasync defso it canawait _cache_on_miss()->await query_nearby(). This is correct for the async Overpass call. However, all SQLAlchemy operations in this endpoint (_query_cached_locations(),_cache_overpass_results(), and the saved-location queries) use synchronousdb.query()calls. In anasync defFastAPI endpoint, synchronous I/O blocks the main asyncio event loop, whereasdefendpoints run in a threadpool automatically.For a personal tracker app under light load, this has no practical impact. Under concurrent requests, the synchronous DB calls would serialize on the event loop. The proper fix would be
run_in_executorfor sync DB calls or migrating to async SQLAlchemy. Noting as a nit, not a blocker.Type hint gap (
/home/ldraney/mcd-tracker-api/src/mcd_tracker_api/routes/nearby.py, line 128):The
osm_locationsparameter uses barelistinstead oflist[OsmLocation]. PEP 484 type hints should be specific. This makes the function's contract unclear and defeats type-checker validation.DRY observation:
_cache_overpass_results()innearby.pyandupsert_locations()inseed_locations.pyboth implement osm_id-based upsert logic. They operate on different input shapes (OsmLocation dataclass vs dict), so the duplication is tolerable. If the upsert logic changes (e.g., adding a new field to McDonaldsLocation), both need updating. Worth noting but not blocking.Seed script
_parse_overpass_responseextraction (/home/ldraney/mcd-tracker-api/src/mcd_tracker_api/scripts/seed_locations.py, line 88): Good refactor -- the parsing logic was correctly extracted fromfetch_locations()and shared withfetch_locations_by_area(). DRY within the script is clean.Overpass query construction (
/home/ldraney/mcd-tracker-api/src/mcd_tracker_api/scripts/seed_locations.py, line 41): Thecountry_codeparameter is interpolated directly into the Overpass QL query string via f-string. This is only called frommain()via CLI argparse, so the input is operator-controlled, not user-facing. Not an injection risk in this context. If this were ever exposed to web input, it would need validation.CLI partial-args edge case (
/home/ldraney/mcd-tracker-api/src/mcd_tracker_api/scripts/seed_locations.py, line 183): If a user provides--latand--lngbut forgets--radius, the script silently falls through to area mode instead of radius mode. This could be surprising. A validation warning would be friendlier.Graceful degradation: The
_cache_on_miss()function properly catchesOverpassErrorand returns an empty list. Thehas_coverageflag logic is well-designed -- it distinguishes between "no data for this area" (trigger cache-on-miss) and "data exists but none within exact radius" (area is covered, just sparse). This is correct.BLOCKERS
None.
Query()withge/leconstraints.NITS
Bare
listtype hint --_cache_overpass_results(db: Session, osm_locations: list)should belist[OsmLocation]for PEP 484 compliance. (nearby.pyline 128)Async endpoint with sync DB -- The
async def get_nearby_locations()endpoint blocks the event loop during synchronous SQLAlchemy queries. Consider wrapping sync DB calls inasyncio.to_thread()or documenting why this is acceptable for the current scale. Not a blocker for a personal app.CLI partial-args silent fallthrough -- Providing
--lat 39.7and--lng -104.9without--radiussilently falls through to area mode. Consider logging a warning or requiring all three when any are provided. (seed_locations.pyline 183)Test query verification fragility -- In
test_fetches_all_us_locations, lines 84-89 oftest_seed_locations.pyuse a fragile chain of.get()/call_argsunpacking to verify the query string. A simpler assertion onmock_post.call_args.kwargs["data"]["data"]would be more readable, or just assertmock_post.calledsince the query string is already tested inTestBuildQueries.SOP COMPLIANCE
22-seed-full-us-mcdonald-s-cache-on-miss-foreferences issue #22)PROCESS OBSERVATIONS
mcdonalds_locationstable was added in PR #21. Ready to deploy.VERDICT: APPROVED