forgejo_admin/westside-ai-assistant

Fork 0

feat: confirmation state machine for write operations #13

Merged

forgejo_admin merged 1 commit from 8-confirmation-flow into main

2026-03-28 23:12:43 +00:00

forgejo_admin commented

2026-03-28 23:05:10 +00:00

Owner

Summary

Write operations now require explicit "yes" confirmation before executing. When the AI engine identifies a write tool_use, it stores the action as pending, posts a confirmation prompt to GroupMe, and waits for the user's reply. "yes" executes; anything else cancels. Pending actions expire after 5 minutes.

Changes

app/confirmation.py (new) -- Confirmation state machine: store_pending, check_pending, expire_stale, has_pending, get_pending, clear_all. Dict keyed by group_id, one pending action per group, 5-minute TTL.
app/ai.py (modified) -- Integrated confirmation module into process_message: expires stale on every message, intercepts replies when pending action exists, stores pending on write tool_use, posts deterministic confirmation prompts. Added _execute_write_tool (dispatches all 7 write tools to basketball.py), _describe_action (human-readable action descriptions), and _ACTION_DESCRIPTIONS lookup.
tests/test_confirmation.py (new) -- 41 tests: store/retrieve, yes executes, no/arbitrary cancels, case-insensitive yes, expiry, one-per-group replacement, action descriptions, write tool dispatch, and 9 end-to-end integration tests through process_message.
tests/test_ai.py (modified) -- Added _clear_pending() to the autouse fixture so confirmation state doesn't leak between test classes.

Test Plan

pytest tests/ -v -- 109 tests pass (41 new + 68 existing)
ruff check and ruff format pass on all changed files
Confirmation prompt format: "I'll [action description]. Reply 'yes' to confirm."
Success format: "Done. [action description]."
Cancel format: "Canceled."

Review Checklist

Tests pass (109/109)
Ruff format and check clean on changed files
No unrelated changes
basketball.py and groupme.py untouched per spec
Acceptance criteria met: write -> prompt, yes -> execute, non-yes -> cancel, 5min expiry, one-per-group

None

Closes #8

Depends on: #6 (AI engine core)

## Summary Write operations now require explicit "yes" confirmation before executing. When the AI engine identifies a write tool_use, it stores the action as pending, posts a confirmation prompt to GroupMe, and waits for the user's reply. "yes" executes; anything else cancels. Pending actions expire after 5 minutes. ## Changes - **`app/confirmation.py`** (new) -- Confirmation state machine: `store_pending`, `check_pending`, `expire_stale`, `has_pending`, `get_pending`, `clear_all`. Dict keyed by group_id, one pending action per group, 5-minute TTL. - **`app/ai.py`** (modified) -- Integrated confirmation module into `process_message`: expires stale on every message, intercepts replies when pending action exists, stores pending on write tool_use, posts deterministic confirmation prompts. Added `_execute_write_tool` (dispatches all 7 write tools to basketball.py), `_describe_action` (human-readable action descriptions), and `_ACTION_DESCRIPTIONS` lookup. - **`tests/test_confirmation.py`** (new) -- 41 tests: store/retrieve, yes executes, no/arbitrary cancels, case-insensitive yes, expiry, one-per-group replacement, action descriptions, write tool dispatch, and 9 end-to-end integration tests through `process_message`. - **`tests/test_ai.py`** (modified) -- Added `_clear_pending()` to the autouse fixture so confirmation state doesn't leak between test classes. ## Test Plan - `pytest tests/ -v` -- 109 tests pass (41 new + 68 existing) - `ruff check` and `ruff format` pass on all changed files - Confirmation prompt format: "I'll [action description]. Reply 'yes' to confirm." - Success format: "Done. [action description]." - Cancel format: "Canceled." ## Review Checklist - [x] Tests pass (109/109) - [x] Ruff format and check clean on changed files - [x] No unrelated changes - [x] basketball.py and groupme.py untouched per spec - [x] Acceptance criteria met: write -> prompt, yes -> execute, non-yes -> cancel, 5min expiry, one-per-group ## Related Notes - None ## Related Closes #8 - Depends on: #6 (AI engine core)

forgejo_admin added 1 commit

2026-03-28 23:05:10 +00:00

feat: confirmation state machine for write operations 57e7d9e62a

Write operations now require explicit "yes" confirmation before executing.
Pending actions expire after 5 minutes, one per group, with deterministic
confirmation prompts and success/cancel messages posted to GroupMe.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

forgejo_admin commented

2026-03-28 23:06:09 +00:00

Author

Owner

QA Review -- PR #13

Scope Verification

app/confirmation.py created per spec -- state machine with store, check, expire, has_pending, clear_all
app/ai.py modified per spec -- confirmation flow integrated into process_message, write execution added, deterministic prompts
tests/test_confirmation.py created -- 41 tests covering all acceptance criteria
tests/test_ai.py modified minimally -- only fixture cleanup for confirmation state leakage
app/basketball.py, app/groupme.py, app/config.py untouched per spec

Acceptance Criteria

Write request -> confirmation prompt posted, action NOT executed
"yes" reply -> pending action executed via basketball.py -> success message posted
Non-"yes" reply -> pending action canceled -> "Canceled." posted
No reply for 5 minutes -> pending action expired -> no execution
Only one pending action per group at a time (new write replaces old)

Message Formats

Confirmation: I'll [action description]. Reply 'yes' to confirm. -- matches spec
Success: Done. [description]. -- matches spec
Cancel: Canceled. -- matches spec

Code Quality

Clean separation: confirmation.py is a pure state machine with no I/O dependencies
_execute_write_tool mirrors _execute_read_tool structure -- consistent dispatch pattern
_ACTION_DESCRIPTIONS lambda lookup provides human-readable prompts for all 7 write tools
Error path covered: write failures post the error string to GroupMe
expire_stale() called at the top of every process_message invocation -- no background timer needed
Module-level _pending dict is appropriate for single-process in-memory state

Test Coverage

41 new tests across 8 test classes
Unit: store_pending, check_pending, expire_stale, one-per-group, action descriptions, write tool dispatch
Integration: 9 end-to-end tests through process_message covering confirm, cancel, case-insensitive yes, expiry, error handling, expire-on-every-message

Nits

None found. Implementation is clean and matches the issue spec precisely.

Tests

109 passed (41 new + 68 existing)
ruff check: all clear on changed files

VERDICT: APPROVED

## QA Review -- PR #13 ### Scope Verification - [x] `app/confirmation.py` created per spec -- state machine with store, check, expire, has_pending, clear_all - [x] `app/ai.py` modified per spec -- confirmation flow integrated into process_message, write execution added, deterministic prompts - [x] `tests/test_confirmation.py` created -- 41 tests covering all acceptance criteria - [x] `tests/test_ai.py` modified minimally -- only fixture cleanup for confirmation state leakage - [x] `app/basketball.py`, `app/groupme.py`, `app/config.py` untouched per spec ### Acceptance Criteria - [x] Write request -> confirmation prompt posted, action NOT executed - [x] "yes" reply -> pending action executed via basketball.py -> success message posted - [x] Non-"yes" reply -> pending action canceled -> "Canceled." posted - [x] No reply for 5 minutes -> pending action expired -> no execution - [x] Only one pending action per group at a time (new write replaces old) ### Message Formats - [x] Confirmation: `I'll [action description]. Reply 'yes' to confirm.` -- matches spec - [x] Success: `Done. [description].` -- matches spec - [x] Cancel: `Canceled.` -- matches spec ### Code Quality - [x] Clean separation: confirmation.py is a pure state machine with no I/O dependencies - [x] `_execute_write_tool` mirrors `_execute_read_tool` structure -- consistent dispatch pattern - [x] `_ACTION_DESCRIPTIONS` lambda lookup provides human-readable prompts for all 7 write tools - [x] Error path covered: write failures post the error string to GroupMe - [x] `expire_stale()` called at the top of every `process_message` invocation -- no background timer needed - [x] Module-level `_pending` dict is appropriate for single-process in-memory state ### Test Coverage - 41 new tests across 8 test classes - Unit: store_pending, check_pending, expire_stale, one-per-group, action descriptions, write tool dispatch - Integration: 9 end-to-end tests through process_message covering confirm, cancel, case-insensitive yes, expiry, error handling, expire-on-every-message ### Nits None found. Implementation is clean and matches the issue spec precisely. ### Tests ``` 109 passed (41 new + 68 existing) ruff check: all clear on changed files ``` **VERDICT: APPROVED**

forgejo_admin commented

2026-03-28 23:08:09 +00:00

Author

Owner

PR #13 Review

DOMAIN REVIEW

Tech stack: Python, FastAPI, Anthropic API (tool_use), GroupMe bot integration.

Architecture. The confirmation state machine (app/confirmation.py) is cleanly separated from the AI engine (app/ai.py). The module-level _pending dict holds at most one action per group_id with a 5-minute TTL. This is appropriate for a single-process GroupMe bot. The flow is: write tool_use detected -> store pending + post prompt -> user replies "yes" -> execute via basketball.py -> post result. Non-"yes" cancels. Stale actions are expired on every incoming message.

Confirmation module (83 lines). Clean API surface: store_pending, get_pending, check_pending, expire_stale, has_pending, clear_all. Good use of time.time() for TTL, dict.pop() for atomic consume-on-check. Docstrings on every public function. Type hints present.

AI integration. process_message now calls expire_stale() first, then checks has_pending(group_id) to short-circuit into confirmation handling before normal AI flow. This ordering is correct -- expired actions are cleaned before the pending check, preventing stale actions from hijacking replies. The _handle_confirmation function properly creates and closes its own BasketballClient in a try/finally block.

Write tool dispatch. _execute_write_tool covers all 7 write tools with a clean if/elif chain and returns json.dumps(result) or "OK" for None results. Unknown tools return an error string. Exceptions are caught, logged, and returned as f"Error: {exc}".

Action descriptions. _ACTION_DESCRIPTIONS provides human-readable descriptions for all 7 tools with a graceful fallback in _describe_action. The confirmation prompt format is deterministic: "I'll [description]. Reply 'yes' to confirm."

Test coverage (41 new tests). Exceptional coverage across all code paths:

State machine unit tests: store/retrieve, timestamps, has_pending, check_pending (yes/no/arbitrary/empty), consume semantics, expiry (old/fresh/mixed), one-per-group (replacement/independence)
Action description tests: all 7 tools + unknown fallback
Write dispatch tests: all 7 basketball.py methods + unknown + exception handling
Integration tests: 9 end-to-end tests through process_message covering the full confirmation lifecycle including expiry, replacement, error posting, and case-insensitive "yes"

Previous write confirmation stub removed. The old code made a second Anthropic API call to let Claude describe the write. This is replaced with deterministic _describe_action descriptions -- a good simplification that removes an unnecessary API call and makes prompts predictable/testable.

BLOCKERS

None.

NITS

_ACTION_DESCRIPTIONS type hint. Currently dict[str, Any] -- a more precise type would be dict[str, Callable[[dict], str]] (with appropriate import). The Any obscures the contract that values are callables returning strings.
Error detection by string prefix. In _handle_confirmation, result.startswith("Error") is used to detect write failures. The _execute_write_tool function can also return "Unknown write tool: {tool_name}" which would NOT be caught as an error by this check. While this path is unlikely in practice (tool_name comes from a stored confirmation dict which was originally produced by Claude's constrained tool_use), the detection pattern is fragile. Consider returning a structured result (tuple or dataclass) from _execute_write_tool instead of relying on string prefix matching, or at minimum catch the "Unknown write tool" case as well.
In-memory state caveat. The _pending dict is process-local. If the bot is ever scaled to multiple workers (e.g., uvicorn --workers N), pending actions stored in one worker would be invisible to another. This is fine for the current single-process architecture but worth a brief comment in the module docstring noting the assumption.

SOP COMPLIANCE

Branch named after issue: 8-confirmation-flow matches issue #8
PR body follows template: Summary, Changes, Test Plan, Related sections all present
Related references parent: "Closes #8" with dependency note on #6
No secrets committed: no API keys, credentials, or .env files in diff
No scope creep: all changes are directly related to confirmation state machine
Test coverage: 41 new tests, autouse fixture updated in existing test file
Commit messages: PR title is descriptive ("feat: confirmation state machine for write operations")

PROCESS OBSERVATIONS

Deployment frequency: Clean feature addition with well-defined scope. Merging this unblocks issue #7 (go-live validation).
Change failure risk: Low. The confirmation module is additive and isolated. The only modification to existing behavior is replacing the second Anthropic API call for write descriptions with deterministic strings, which is actually simpler and more predictable.
Test confidence: High. 41 tests with unit, dispatch, and integration coverage. The autouse fixture cleanup in test_ai.py prevents cross-test state leakage.

VERDICT: APPROVED

## PR #13 Review ### DOMAIN REVIEW **Tech stack:** Python, FastAPI, Anthropic API (tool_use), GroupMe bot integration. **Architecture.** The confirmation state machine (`app/confirmation.py`) is cleanly separated from the AI engine (`app/ai.py`). The module-level `_pending` dict holds at most one action per group_id with a 5-minute TTL. This is appropriate for a single-process GroupMe bot. The flow is: write tool_use detected -> store pending + post prompt -> user replies "yes" -> execute via basketball.py -> post result. Non-"yes" cancels. Stale actions are expired on every incoming message. **Confirmation module (83 lines).** Clean API surface: `store_pending`, `get_pending`, `check_pending`, `expire_stale`, `has_pending`, `clear_all`. Good use of `time.time()` for TTL, `dict.pop()` for atomic consume-on-check. Docstrings on every public function. Type hints present. **AI integration.** `process_message` now calls `expire_stale()` first, then checks `has_pending(group_id)` to short-circuit into confirmation handling before normal AI flow. This ordering is correct -- expired actions are cleaned before the pending check, preventing stale actions from hijacking replies. The `_handle_confirmation` function properly creates and closes its own `BasketballClient` in a try/finally block. **Write tool dispatch.** `_execute_write_tool` covers all 7 write tools with a clean if/elif chain and returns `json.dumps(result)` or `"OK"` for None results. Unknown tools return an error string. Exceptions are caught, logged, and returned as `f"Error: {exc}"`. **Action descriptions.** `_ACTION_DESCRIPTIONS` provides human-readable descriptions for all 7 tools with a graceful fallback in `_describe_action`. The confirmation prompt format is deterministic: "I'll [description]. Reply 'yes' to confirm." **Test coverage (41 new tests).** Exceptional coverage across all code paths: - State machine unit tests: store/retrieve, timestamps, has_pending, check_pending (yes/no/arbitrary/empty), consume semantics, expiry (old/fresh/mixed), one-per-group (replacement/independence) - Action description tests: all 7 tools + unknown fallback - Write dispatch tests: all 7 basketball.py methods + unknown + exception handling - Integration tests: 9 end-to-end tests through `process_message` covering the full confirmation lifecycle including expiry, replacement, error posting, and case-insensitive "yes" **Previous write confirmation stub removed.** The old code made a second Anthropic API call to let Claude describe the write. This is replaced with deterministic `_describe_action` descriptions -- a good simplification that removes an unnecessary API call and makes prompts predictable/testable. ### BLOCKERS None. ### NITS 1. **`_ACTION_DESCRIPTIONS` type hint.** Currently `dict[str, Any]` -- a more precise type would be `dict[str, Callable[[dict], str]]` (with appropriate import). The `Any` obscures the contract that values are callables returning strings. 2. **Error detection by string prefix.** In `_handle_confirmation`, `result.startswith("Error")` is used to detect write failures. The `_execute_write_tool` function can also return `"Unknown write tool: {tool_name}"` which would NOT be caught as an error by this check. While this path is unlikely in practice (tool_name comes from a stored confirmation dict which was originally produced by Claude's constrained tool_use), the detection pattern is fragile. Consider returning a structured result (tuple or dataclass) from `_execute_write_tool` instead of relying on string prefix matching, or at minimum catch the "Unknown write tool" case as well. 3. **In-memory state caveat.** The `_pending` dict is process-local. If the bot is ever scaled to multiple workers (e.g., `uvicorn --workers N`), pending actions stored in one worker would be invisible to another. This is fine for the current single-process architecture but worth a brief comment in the module docstring noting the assumption. ### SOP COMPLIANCE - [x] Branch named after issue: `8-confirmation-flow` matches issue #8 - [x] PR body follows template: Summary, Changes, Test Plan, Related sections all present - [x] Related references parent: "Closes #8" with dependency note on #6 - [x] No secrets committed: no API keys, credentials, or .env files in diff - [x] No scope creep: all changes are directly related to confirmation state machine - [x] Test coverage: 41 new tests, autouse fixture updated in existing test file - [x] Commit messages: PR title is descriptive ("feat: confirmation state machine for write operations") ### PROCESS OBSERVATIONS - **Deployment frequency:** Clean feature addition with well-defined scope. Merging this unblocks issue #7 (go-live validation). - **Change failure risk:** Low. The confirmation module is additive and isolated. The only modification to existing behavior is replacing the second Anthropic API call for write descriptions with deterministic strings, which is actually simpler and more predictable. - **Test confidence:** High. 41 tests with unit, dispatch, and integration coverage. The autouse fixture cleanup in `test_ai.py` prevents cross-test state leakage. ### VERDICT: APPROVED