feat: confirmation state machine for write operations #13
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "8-confirmation-flow"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Write operations now require explicit "yes" confirmation before executing. When the AI engine identifies a write tool_use, it stores the action as pending, posts a confirmation prompt to GroupMe, and waits for the user's reply. "yes" executes; anything else cancels. Pending actions expire after 5 minutes.
Changes
app/confirmation.py(new) -- Confirmation state machine:store_pending,check_pending,expire_stale,has_pending,get_pending,clear_all. Dict keyed by group_id, one pending action per group, 5-minute TTL.app/ai.py(modified) -- Integrated confirmation module intoprocess_message: expires stale on every message, intercepts replies when pending action exists, stores pending on write tool_use, posts deterministic confirmation prompts. Added_execute_write_tool(dispatches all 7 write tools to basketball.py),_describe_action(human-readable action descriptions), and_ACTION_DESCRIPTIONSlookup.tests/test_confirmation.py(new) -- 41 tests: store/retrieve, yes executes, no/arbitrary cancels, case-insensitive yes, expiry, one-per-group replacement, action descriptions, write tool dispatch, and 9 end-to-end integration tests throughprocess_message.tests/test_ai.py(modified) -- Added_clear_pending()to the autouse fixture so confirmation state doesn't leak between test classes.Test Plan
pytest tests/ -v-- 109 tests pass (41 new + 68 existing)ruff checkandruff formatpass on all changed filesReview Checklist
Related Notes
Related
Closes #8
QA Review -- PR #13
Scope Verification
app/confirmation.pycreated per spec -- state machine with store, check, expire, has_pending, clear_allapp/ai.pymodified per spec -- confirmation flow integrated into process_message, write execution added, deterministic promptstests/test_confirmation.pycreated -- 41 tests covering all acceptance criteriatests/test_ai.pymodified minimally -- only fixture cleanup for confirmation state leakageapp/basketball.py,app/groupme.py,app/config.pyuntouched per specAcceptance Criteria
Message Formats
I'll [action description]. Reply 'yes' to confirm.-- matches specDone. [description].-- matches specCanceled.-- matches specCode Quality
_execute_write_toolmirrors_execute_read_toolstructure -- consistent dispatch pattern_ACTION_DESCRIPTIONSlambda lookup provides human-readable prompts for all 7 write toolsexpire_stale()called at the top of everyprocess_messageinvocation -- no background timer needed_pendingdict is appropriate for single-process in-memory stateTest Coverage
Nits
None found. Implementation is clean and matches the issue spec precisely.
Tests
VERDICT: APPROVED
PR #13 Review
DOMAIN REVIEW
Tech stack: Python, FastAPI, Anthropic API (tool_use), GroupMe bot integration.
Architecture. The confirmation state machine (
app/confirmation.py) is cleanly separated from the AI engine (app/ai.py). The module-level_pendingdict holds at most one action per group_id with a 5-minute TTL. This is appropriate for a single-process GroupMe bot. The flow is: write tool_use detected -> store pending + post prompt -> user replies "yes" -> execute via basketball.py -> post result. Non-"yes" cancels. Stale actions are expired on every incoming message.Confirmation module (83 lines). Clean API surface:
store_pending,get_pending,check_pending,expire_stale,has_pending,clear_all. Good use oftime.time()for TTL,dict.pop()for atomic consume-on-check. Docstrings on every public function. Type hints present.AI integration.
process_messagenow callsexpire_stale()first, then checkshas_pending(group_id)to short-circuit into confirmation handling before normal AI flow. This ordering is correct -- expired actions are cleaned before the pending check, preventing stale actions from hijacking replies. The_handle_confirmationfunction properly creates and closes its ownBasketballClientin a try/finally block.Write tool dispatch.
_execute_write_toolcovers all 7 write tools with a clean if/elif chain and returnsjson.dumps(result)or"OK"for None results. Unknown tools return an error string. Exceptions are caught, logged, and returned asf"Error: {exc}".Action descriptions.
_ACTION_DESCRIPTIONSprovides human-readable descriptions for all 7 tools with a graceful fallback in_describe_action. The confirmation prompt format is deterministic: "I'll [description]. Reply 'yes' to confirm."Test coverage (41 new tests). Exceptional coverage across all code paths:
process_messagecovering the full confirmation lifecycle including expiry, replacement, error posting, and case-insensitive "yes"Previous write confirmation stub removed. The old code made a second Anthropic API call to let Claude describe the write. This is replaced with deterministic
_describe_actiondescriptions -- a good simplification that removes an unnecessary API call and makes prompts predictable/testable.BLOCKERS
None.
NITS
_ACTION_DESCRIPTIONStype hint. Currentlydict[str, Any]-- a more precise type would bedict[str, Callable[[dict], str]](with appropriate import). TheAnyobscures the contract that values are callables returning strings.Error detection by string prefix. In
_handle_confirmation,result.startswith("Error")is used to detect write failures. The_execute_write_toolfunction can also return"Unknown write tool: {tool_name}"which would NOT be caught as an error by this check. While this path is unlikely in practice (tool_name comes from a stored confirmation dict which was originally produced by Claude's constrained tool_use), the detection pattern is fragile. Consider returning a structured result (tuple or dataclass) from_execute_write_toolinstead of relying on string prefix matching, or at minimum catch the "Unknown write tool" case as well.In-memory state caveat. The
_pendingdict is process-local. If the bot is ever scaled to multiple workers (e.g.,uvicorn --workers N), pending actions stored in one worker would be invisible to another. This is fine for the current single-process architecture but worth a brief comment in the module docstring noting the assumption.SOP COMPLIANCE
8-confirmation-flowmatches issue #8PROCESS OBSERVATIONS
test_ai.pyprevents cross-test state leakage.VERDICT: APPROVED