feat: AI engine core with Anthropic tool_use and GroupMe posting #12
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "6-ai-engine-core"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Implements the Anthropic Claude tool_use integration for processing GroupMe messages as basketball program operations. Read tools execute immediately via basketball.py; write tools return confirmation indicators (not executed) for ticket #8 to handle the confirmation UX.
Changes
app/ai.py(new) — AI engine core: system prompt, 14 tool definitions (7 reads + 7 writes with metadata tags), in-memory conversation history (20 messages per group),process_message()entry point, read tool dispatch via BasketballClient, write tool confirmation signals, GroupMe bot response posting with 1000-char truncationapp/groupme.py(modified) — Wired callback toai.process_message(), added empty-text guard, wrapped AI call in try/except for resiliencetests/test_ai.py(new) — 38 tests covering tool definitions, conversation history cap, read tool dispatch, write tool non-execution, truncation, GroupMe posting, full process_message integration, and text extractionTest Plan
pytest tests/ -v— 68 tests pass (38 new + 30 existing)pending_confirmationdict, basketball client NOT calledReview Checklist
Related
Closes #6
Related Notes
project-westside-ai-assistant— parent projectarch-dataflow-westside-ai-assistant— full message flowstory-westside-ai-assistant-read-ops— read operations storyQA Review — PR #12
Acceptance Criteria Check
Code Quality
app/ai.py
operation: read/write) enables clean dispatch logicpending_confirmationdicthttpx.AsyncClienttohttps://api.groupme.com/v3/bots/posttext[:997] + "..."when over 1000 charsapp/groupme.py
process_message()-> GroupMe posttests/test_ai.py
mock_basketball.update_player.assert_not_called()Nits
Dual history stores:
_conversation_history(simple strings) and_raw_conversation_history(structured tool blocks) serve different purposes but could drift. Acceptable for now since #8 will rework conversation state for confirmation flows.Synchronous Anthropic client: Uses
anthropic.Anthropic(sync) inside async functions. Works because the sync client blocks the event loop briefly. Consider migrating toanthropic.AsyncAnthropicin a future ticket for better concurrency under load.No
asyncio_modein pytest config: Tests use@pytest.mark.asyncioper-test. Apyproject.tomlwithasyncio_mode = "auto"would reduce boilerplate, but this matches the existing test style intest_basketball.py.None of these are blockers.
Test Results
68/68 tests pass. No regressions.
VERDICT: APPROVED
PR #12 Review
DOMAIN REVIEW
Tech stack: Python / FastAPI / Anthropic SDK / httpx / Pydantic / pytest
This PR implements the AI engine core for the westside-ai-assistant: Anthropic Claude tool_use integration with 14 tool definitions (7 read, 7 write), in-memory conversation history, read-tool dispatch to BasketballClient, write-tool confirmation signals (deferred to #8), and GroupMe bot response posting.
Architecture quality: Clean separation of concerns. The tool_use loop pattern is well-structured: read tools execute immediately via BasketballClient, write tools return
pending_confirmationdicts. The dual history system (simple deque for capping + raw structured history for API calls) solves the Anthropic API's requirement for structured tool_use/tool_result blocks while maintaining a clean capped history.Python/FastAPI observations:
Synchronous Anthropic client in async context (medium concern):
process_message()isasyncbut creates a synchronousanthropic.Anthropic()client and callsmessages.create()synchronously. This blocks the event loop during API calls. The Anthropic SDK providesanthropic.AsyncAnthropicfor async contexts. This will cause latency under concurrent GroupMe messages since the event loop is blocked during each Anthropic API call. Not a blocker for initial deployment (single-group, low concurrency), but should be tracked.Anthropic client instantiated per request (
app/ai.pyline ~299): A newanthropic.Anthropic()client is created on everyprocess_message()call. This means a new HTTP connection pool per request. Consider creating the client once at module level or as a cached dependency. Not a blocker but a performance concern.BasketballClient also instantiated per request (
app/ai.pyline ~315): Same pattern -- new client per message. Thetry/finallywithawait basketball_client.close()is correct for cleanup, but the client could be reused.No rate limiting on Anthropic API calls: A burst of GroupMe messages could exhaust API quota. Acceptable for initial deployment but worth noting for #7 (go-live).
_execute_read_toolexception logging leaks internal details (app/ai.py): Theexcept Exceptionblock returnsf"Error: {exc}"which gets fed back to Claude as tool_result content. Claude could then relay internal exception messages to GroupMe users. The system prompt says "never expose internal tool names or API details" but exception strings could contain URLs, stack info, etc. Consider returning a generic error to Claude and logging the detail server-side only.PEP 484 type hints: Good coverage overall.
_handle_responseusesAnyfor the response type which is acceptable given the SDK's return type. Thedict | strunion return type onprocess_messageis clear.Hardcoded tenant slug: The WebFetch summary of
basketball.pymentions a hardcoded tenant slug "westside-kings-queens". This is pre-existing (from #5), not introduced in this PR -- no action needed here.Test quality (38 new tests):
_clear_historyautouse fixture).Missing test coverage for groupme.py changes:
test_groupme.pydoes not test the new empty-text guard (if not text.strip()) added in this PR.test_groupme.pydoes not test thatprocess_messageis actually called when a valid message arrives (it only tests HTTP 200 returns). Thetest_valid_message_parsedtest predates the AI integration and does not mock/verify theprocess_messagecall.test_ai.pyintegration tests, but the groupme.py modifications themselves lack direct test coverage for the new behavior.BLOCKERS
None. All blocker criteria pass:
app/ai.pymodule has thorough test coverage.NITS
Sync client in async context: As noted above,
anthropic.Anthropicis synchronous. Switching toanthropic.AsyncAnthropicwithawait client.messages.create()would avoid blocking the event loop. Low urgency for single-group usage but should be a follow-up ticket.Per-request client instantiation: Both
Anthropic()andBasketballClient()are created fresh per message. A module-level or cached client would reduce connection overhead.Error message leakage to Claude:
_execute_read_toolreturns raw exception strings as tool results. Considerreturn "Error: could not retrieve data"with the full exception logged server-side only, to prevent internal details from reaching GroupMe via Claude's response.Missing tests for groupme.py changes: The empty-text guard and process_message wiring in
groupme.pylack direct unit tests. The existingtest_groupme.pyshould be updated to verify: (a) empty text returns early without callingprocess_message, (b) valid text callsprocess_message._truncateoff-by-one clarity: The implementationtext[: MAX_RESPONSE_LENGTH - 3] + "..."is correct but the constant relationship could be documented: the "..." consumes 3 of the 1000 chars, so content is 997 chars max. Tests confirm this works correctly.Dual history divergence risk: Simple history (
_conversation_history) and raw history (_raw_conversation_history) can drift since they are appended independently. For example, if the simple history evicts old messages but raw history still has them, the Anthropic API gets different context than what the deque cap suggests. This is acceptable for the current in-memory-only design but worth noting for any future persistence work.SOP COMPLIANCE
6-ai-engine-corefollows{issue-number}-{kebab-case-purpose}conventionfeat: AI engine core with Anthropic tool_use and GroupMe response postingproject-westside-ai-assistant). Issue #6 does reference it. Minor gap.PROCESS OBSERVATIONS
VERDICT: APPROVED