fix: XML builder escaping + streaming error handling + URL encoding #4

Merged
forgejo_admin merged 2 commits from 3-fix-xml-injection-url-encoding-streaming-errors into main 2026-03-22 06:06:43 +00:00

Summary

Fixes three bugs found during QA review of PR #2: XML injection via unescaped f-strings in builder functions, silent 4xx/5xx errors on streaming GET responses, and broken query strings when params contain spaces or special characters.

Changes

  • src/minio_sdk/xml_parser.py: Added xml.sax.saxutils.escape() to all five build_* functions for every user-provided string value
  • src/minio_sdk/client.py: Added error checking for streaming responses in _request() -- reads body and raises S3Error on 4xx/5xx before returning the stream
  • src/minio_sdk/client.py: Replaced manual query string joining with urllib.parse.urlencode() in both _request() and _request_unsigned()
  • tests/test_xml_parser.py: 17 new unit tests validating XML escaping for all builder functions with special characters
  • tests/test_client_unit.py: 7 new unit tests for streaming error handling and URL encoding using mocks

Test Plan

  • Tests pass locally -- all 56 unit tests pass
  • ruff check and ruff format clean
  • No regressions -- existing signer tests unchanged and passing
  • Manual verification with special-char object keys against live MinIO

Review Checklist

  • Passed automated review-fix loop
  • No secrets committed
  • No unnecessary file changes
  • Commit messages are descriptive
  • Closes #3
  • Plan: plan-pal-e-platform (Phase 24 QA nits)
## Summary Fixes three bugs found during QA review of PR #2: XML injection via unescaped f-strings in builder functions, silent 4xx/5xx errors on streaming GET responses, and broken query strings when params contain spaces or special characters. ## Changes - `src/minio_sdk/xml_parser.py`: Added `xml.sax.saxutils.escape()` to all five `build_*` functions for every user-provided string value - `src/minio_sdk/client.py`: Added error checking for streaming responses in `_request()` -- reads body and raises `S3Error` on 4xx/5xx before returning the stream - `src/minio_sdk/client.py`: Replaced manual query string joining with `urllib.parse.urlencode()` in both `_request()` and `_request_unsigned()` - `tests/test_xml_parser.py`: 17 new unit tests validating XML escaping for all builder functions with special characters - `tests/test_client_unit.py`: 7 new unit tests for streaming error handling and URL encoding using mocks ## Test Plan - [x] Tests pass locally -- all 56 unit tests pass - [x] ruff check and ruff format clean - [x] No regressions -- existing signer tests unchanged and passing - [ ] Manual verification with special-char object keys against live MinIO ## Review Checklist - [x] Passed automated review-fix loop - [x] No secrets committed - [x] No unnecessary file changes - [x] Commit messages are descriptive ## Related - Closes #3 - Plan: `plan-pal-e-platform` (Phase 24 QA nits)
Three bugs found during QA of PR #2:

1. XML builder functions used raw f-string interpolation for user values.
   Object keys containing <, >, or & would produce malformed XML. Now all
   build_* functions use xml.sax.saxutils.escape() for user-provided values.

2. Streaming GET responses (get_object_stream) skipped error handling -- 404
   and 403 responses were returned silently. Now _request checks status code
   on streaming responses and raises S3Error before returning the stream.

3. Query string construction used raw string joining without URL-encoding.
   Prefixes with spaces or special chars would break. Now uses
   urllib.parse.urlencode with quote_via=urllib.parse.quote.

Closes #3

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author
Owner

QA Review -- PR #4

XML Escaping (xml_parser.py)

All five build_* functions now use xml.sax.saxutils.escape() for user-provided string values. This is the correct stdlib approach -- it escapes <, >, and &. Quotes inside element text content are valid XML and do not need escaping (only required in attribute values). Integer values like part_number, expiration_days, and noncurrent_version_expiration_days are correctly left unescaped since they are not user-provided strings.

No issues found.

Streaming Error Handling (client.py)

The _request method now checks resp.status_code >= 400 on streaming responses, reads the error body via resp.text, calls resp.close(), then raises via raise_for_error_response. The ordering is correct -- body must be read before close. The close prevents resource leaks on error paths.

No issues found.

URL Encoding (client.py)

Replaced raw "&".join(f"{k}={v}" ...) with urllib.parse.urlencode(sorted(...), quote_via=urllib.parse.quote). Using quote_via=urllib.parse.quote produces %20 for spaces (not +), which is correct for S3 canonical query strings.

Verified no double-encoding risk: sign_request re-parses the URL with urllib.parse.parse_qsl (which decodes), then re-encodes via _canonical_query_string. The round-trip is safe.

No issues found.

Test Coverage

  • 17 XML escaping tests across all 5 builder functions, each validated with ET.fromstring() to confirm output is parseable XML
  • 7 client unit tests using mocks for streaming errors (404, 403, 200, 500) and URL encoding (spaces, ampersands, empty values)
  • All 56 unit tests pass, ruff clean

Nits

None.


VERDICT: APPROVE

## QA Review -- PR #4 ### XML Escaping (xml_parser.py) All five `build_*` functions now use `xml.sax.saxutils.escape()` for user-provided string values. This is the correct stdlib approach -- it escapes `<`, `>`, and `&`. Quotes inside element text content are valid XML and do not need escaping (only required in attribute values). Integer values like `part_number`, `expiration_days`, and `noncurrent_version_expiration_days` are correctly left unescaped since they are not user-provided strings. No issues found. ### Streaming Error Handling (client.py) The `_request` method now checks `resp.status_code >= 400` on streaming responses, reads the error body via `resp.text`, calls `resp.close()`, then raises via `raise_for_error_response`. The ordering is correct -- body must be read before close. The close prevents resource leaks on error paths. No issues found. ### URL Encoding (client.py) Replaced raw `"&".join(f"{k}={v}" ...)` with `urllib.parse.urlencode(sorted(...), quote_via=urllib.parse.quote)`. Using `quote_via=urllib.parse.quote` produces `%20` for spaces (not `+`), which is correct for S3 canonical query strings. Verified no double-encoding risk: `sign_request` re-parses the URL with `urllib.parse.parse_qsl` (which decodes), then re-encodes via `_canonical_query_string`. The round-trip is safe. No issues found. ### Test Coverage - 17 XML escaping tests across all 5 builder functions, each validated with `ET.fromstring()` to confirm output is parseable XML - 7 client unit tests using mocks for streaming errors (404, 403, 200, 500) and URL encoding (spaces, ampersands, empty values) - All 56 unit tests pass, ruff clean ### Nits None. --- **VERDICT: APPROVE**
Author
Owner

PR #4 Review

DOMAIN REVIEW

Tech stack: Python 3.12, stdlib xml, urllib.parse, requests, pytest, ruff. No FastAPI/SQLAlchemy. Domain: custom S3 SDK library.

Fix 1 -- XML builder f-string injection (xml_parser.py):
All five build_* functions now wrap every user-provided string value in _xml_escape() from xml.sax.saxutils. This is the correct stdlib approach -- it escapes <, >, and & by default, which covers element text injection. Integer fields (expiration_days, noncurrent_version_expiration_days, part_number) are correctly left unescaped since they are typed int | None and cannot carry injection payloads. The import alias _xml_escape follows the underscore-prefix convention for module-internal names. Complete.

Fix 2 -- Streaming GET error handling (client.py lines 110-118):
The _request() method now checks resp.status_code >= 400 on streaming responses before returning the stream. On error, it reads resp.text to extract the XML error body, calls resp.close() to release the connection, then delegates to raise_for_error_response(). The _request_unsigned() method does not need this treatment because it never uses stream=True (it is only used for streaming uploads where the response body is always small and fully read). Complete.

Fix 3 -- URL encoding (client.py lines 78, 134):
Both _request() and _request_unsigned() now use urllib.parse.urlencode(sorted(params.items()), quote_via=urllib.parse.quote) instead of raw f-string {k}={v} joining. The quote_via=urllib.parse.quote choice encodes spaces as %20 (not +), which is correct for S3 query strings per the AWS Signature V4 spec. The sorted() call matches the previous behavior and is required for deterministic signing. Complete.

Test coverage:

  • tests/test_xml_parser.py: 17 tests covering all five builder functions with &, <, >, ", combined special chars, and regression tests for normal values. All tests validate via ET.fromstring() that output is parseable XML.
  • tests/test_client_unit.py: 7 tests -- 4 for streaming error handling (404/NoSuchKey, 403/AccessDenied, 500, 200 success) and 3 for URL encoding (spaces, ampersands, empty-value params like ?versioning=).
  • Tests use unittest.mock.patch to avoid network dependencies. Appropriate for unit tests.

BLOCKERS

CI does not run the new tests. The Woodpecker pipeline (.woodpecker.yml line 16) hardcodes pytest tests/test_signer.py -v. The two new test files (tests/test_xml_parser.py, tests/test_client_unit.py) will not execute in CI. This means the 24 new tests have zero CI enforcement -- regressions can merge undetected. The test step should run pytest tests/ -v (or at minimum include the new test files explicitly).

This is a blocker because new functionality with tests that cannot run in CI is effectively untested in the pipeline. The tests exist, but CI will never gate on them.

NITS

  1. test_stream_500_raises uses bare Exception catch (tests/test_client_unit.py line ~76). The # noqa: B017 suppression is noted, but the test would be stronger if it asserted ServerError specifically (the raise_for_error_response function raises ServerError for 5xx codes without a recognized XML error code). This would also remove the need for the noqa.

  2. test_prefix_with_spaces_is_encoded accepts both %20 and + (line ~89: assert "my%20folder" in url or "my+folder" in url). Since the code explicitly uses quote_via=urllib.parse.quote (which produces %20, not +), the test should assert only %20 to validate the intended behavior precisely.

  3. test_quotes_in_value comment says escape does not escape quotes by default (tests/test_xml_parser.py line ~41). This is accurate for element text content (quotes are valid in XML text nodes), but worth noting that if these values ever appear in XML attributes, quoteattr() would be needed instead. Not an issue with the current f-string element construction, just a design awareness note.

  4. Integer fields in build_lifecycle_xml are interpolated without int() cast (lines 305, 309). The type annotations guarantee int, but a defensive int(rule.expiration_days) would prevent injection if the model ever accepts string input. Very low risk given the dataclass typing.

SOP COMPLIANCE

  • Branch named after issue (3-fix-xml-injection-url-encoding-streaming-errors references issue #3)
  • PR body has Summary, Changes, Test Plan, Related sections
  • Related references plan slug (plan-pal-e-platform)
  • No secrets committed
  • No unnecessary file changes (4 files changed, all directly related to the 3 fixes)
  • Commit messages are descriptive
  • CI runs all tests -- .woodpecker.yml only runs test_signer.py, not the new test files

PROCESS OBSERVATIONS

  • Change Failure Risk: The three fixes are well-scoped and the test coverage is thorough in structure. However, CI not running the new tests creates a gap where future regressions could merge without detection.
  • Deployment Frequency: This is a clean bug-fix PR that addresses all three QA findings from PR #2 in a single pass. Good batch discipline.
  • Documentation: The PR body is well-structured and the code changes include clear inline comments explaining the streaming error handling logic.

VERDICT: NOT APPROVED

The single blocker is the CI pipeline not running the new test files. Update .woodpecker.yml to run pytest tests/ -v (or explicitly include tests/test_xml_parser.py and tests/test_client_unit.py), then this PR is ready to merge. All three requested fixes are complete and correctly implemented.

## PR #4 Review ### DOMAIN REVIEW **Tech stack:** Python 3.12, stdlib `xml`, `urllib.parse`, `requests`, `pytest`, `ruff`. No FastAPI/SQLAlchemy. Domain: custom S3 SDK library. **Fix 1 -- XML builder f-string injection (`xml_parser.py`):** All five `build_*` functions now wrap every user-provided string value in `_xml_escape()` from `xml.sax.saxutils`. This is the correct stdlib approach -- it escapes `<`, `>`, and `&` by default, which covers element text injection. Integer fields (`expiration_days`, `noncurrent_version_expiration_days`, `part_number`) are correctly left unescaped since they are typed `int | None` and cannot carry injection payloads. The import alias `_xml_escape` follows the underscore-prefix convention for module-internal names. **Complete.** **Fix 2 -- Streaming GET error handling (`client.py` lines 110-118):** The `_request()` method now checks `resp.status_code >= 400` on streaming responses before returning the stream. On error, it reads `resp.text` to extract the XML error body, calls `resp.close()` to release the connection, then delegates to `raise_for_error_response()`. The `_request_unsigned()` method does not need this treatment because it never uses `stream=True` (it is only used for streaming uploads where the response body is always small and fully read). **Complete.** **Fix 3 -- URL encoding (`client.py` lines 78, 134):** Both `_request()` and `_request_unsigned()` now use `urllib.parse.urlencode(sorted(params.items()), quote_via=urllib.parse.quote)` instead of raw f-string `{k}={v}` joining. The `quote_via=urllib.parse.quote` choice encodes spaces as `%20` (not `+`), which is correct for S3 query strings per the AWS Signature V4 spec. The `sorted()` call matches the previous behavior and is required for deterministic signing. **Complete.** **Test coverage:** - `tests/test_xml_parser.py`: 17 tests covering all five builder functions with `&`, `<`, `>`, `"`, combined special chars, and regression tests for normal values. All tests validate via `ET.fromstring()` that output is parseable XML. - `tests/test_client_unit.py`: 7 tests -- 4 for streaming error handling (404/NoSuchKey, 403/AccessDenied, 500, 200 success) and 3 for URL encoding (spaces, ampersands, empty-value params like `?versioning=`). - Tests use `unittest.mock.patch` to avoid network dependencies. Appropriate for unit tests. ### BLOCKERS **CI does not run the new tests.** The Woodpecker pipeline (`.woodpecker.yml` line 16) hardcodes `pytest tests/test_signer.py -v`. The two new test files (`tests/test_xml_parser.py`, `tests/test_client_unit.py`) will not execute in CI. This means the 24 new tests have zero CI enforcement -- regressions can merge undetected. The `test` step should run `pytest tests/ -v` (or at minimum include the new test files explicitly). This is a blocker because new functionality with tests that cannot run in CI is effectively untested in the pipeline. The tests exist, but CI will never gate on them. ### NITS 1. **`test_stream_500_raises` uses bare `Exception` catch** (`tests/test_client_unit.py` line ~76). The `# noqa: B017` suppression is noted, but the test would be stronger if it asserted `ServerError` specifically (the `raise_for_error_response` function raises `ServerError` for 5xx codes without a recognized XML error code). This would also remove the need for the noqa. 2. **`test_prefix_with_spaces_is_encoded` accepts both `%20` and `+`** (line ~89: `assert "my%20folder" in url or "my+folder" in url`). Since the code explicitly uses `quote_via=urllib.parse.quote` (which produces `%20`, not `+`), the test should assert only `%20` to validate the intended behavior precisely. 3. **`test_quotes_in_value` comment says `escape` does not escape quotes by default** (`tests/test_xml_parser.py` line ~41). This is accurate for element text content (quotes are valid in XML text nodes), but worth noting that if these values ever appear in XML attributes, `quoteattr()` would be needed instead. Not an issue with the current f-string element construction, just a design awareness note. 4. **Integer fields in `build_lifecycle_xml` are interpolated without `int()` cast** (lines 305, 309). The type annotations guarantee `int`, but a defensive `int(rule.expiration_days)` would prevent injection if the model ever accepts string input. Very low risk given the dataclass typing. ### SOP COMPLIANCE - [x] Branch named after issue (`3-fix-xml-injection-url-encoding-streaming-errors` references issue #3) - [x] PR body has Summary, Changes, Test Plan, Related sections - [x] Related references plan slug (`plan-pal-e-platform`) - [x] No secrets committed - [x] No unnecessary file changes (4 files changed, all directly related to the 3 fixes) - [x] Commit messages are descriptive - [ ] **CI runs all tests** -- `.woodpecker.yml` only runs `test_signer.py`, not the new test files ### PROCESS OBSERVATIONS - **Change Failure Risk:** The three fixes are well-scoped and the test coverage is thorough in structure. However, CI not running the new tests creates a gap where future regressions could merge without detection. - **Deployment Frequency:** This is a clean bug-fix PR that addresses all three QA findings from PR #2 in a single pass. Good batch discipline. - **Documentation:** The PR body is well-structured and the code changes include clear inline comments explaining the streaming error handling logic. ### VERDICT: NOT APPROVED The single blocker is the CI pipeline not running the new test files. Update `.woodpecker.yml` to run `pytest tests/ -v` (or explicitly include `tests/test_xml_parser.py` and `tests/test_client_unit.py`), then this PR is ready to merge. All three requested fixes are complete and correctly implemented.
The CI test step hardcoded `pytest tests/test_signer.py -v`, which
meant the 24 new tests in test_xml_parser.py and test_client_unit.py
were never executed in CI. Changed to `pytest tests/ -v` for full
test discovery.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author
Owner

Fixed the QA blocker: .woodpecker.yml line 16 changed from pytest tests/test_signer.py -v to pytest tests/ -v.

This ensures all 56 tests (32 signer + 17 xml_parser + 7 client_unit) are discovered and executed in CI, not just the original 32 signer tests.

Fixed the QA blocker: `.woodpecker.yml` line 16 changed from `pytest tests/test_signer.py -v` to `pytest tests/ -v`. This ensures all 56 tests (32 signer + 17 xml_parser + 7 client_unit) are discovered and executed in CI, not just the original 32 signer tests.
Author
Owner

PR #4 Re-Review

This is a re-review focusing on the CI blocker identified in the first review. The three original code fixes (XML escaping, streaming error handling, URL encoding) were already approved and are unchanged.

DOMAIN REVIEW

Tech stack: Python 3.12, pytest, Woodpecker CI.

CI fix verified: .woodpecker.yml line 16 changed from:

- pytest tests/test_signer.py -v

to:

- pytest tests/ -v

This is the correct fix. pytest tests/ -v will discover and run all test modules in the tests/ directory, including:

  • test_signer.py (existing, 32 tests)
  • test_xml_parser.py (new, 17 tests)
  • test_client_unit.py (new, 7 tests)

Total: 56 tests as stated in the PR description.

Note: The Woodpecker CI repo returned 404 when querying pipeline status, so I could not independently verify a green pipeline run. The .woodpecker.yml change is correct by inspection -- pytest tests/ -v is standard pytest directory discovery and will run all test_*.py files.

BLOCKERS

None. The previous blocker (hardcoded test file path causing 24 new tests to never run in CI) is resolved.

NITS

None. No changes beyond the single-line CI fix.

SOP COMPLIANCE

  • Branch named after issue (3-fix-xml-injection-url-encoding-streaming-errors references issue #3)
  • PR body follows template (Summary, Changes, Test Plan, Related sections present)
  • Related references plan slug (plan-pal-e-platform)
  • No secrets committed
  • No unnecessary file changes (single-line fix, tightly scoped)
  • Commit messages are descriptive

PROCESS OBSERVATIONS

The review-fix loop worked as intended: first review caught a real CI coverage gap (24 tests silently skipped), the fix was applied, and re-review confirms resolution. This is exactly how the loop should function -- catching issues that would otherwise reduce Change Failure Rate confidence.

The Woodpecker 404 suggests the repo may not be activated for CI yet. If CI is not running on this repo, the .woodpecker.yml is inert until activation. This is not a blocker for the PR itself but worth noting for operational awareness.

VERDICT: APPROVED

## PR #4 Re-Review This is a re-review focusing on the CI blocker identified in the first review. The three original code fixes (XML escaping, streaming error handling, URL encoding) were already approved and are unchanged. ### DOMAIN REVIEW **Tech stack**: Python 3.12, pytest, Woodpecker CI. **CI fix verified**: `.woodpecker.yml` line 16 changed from: ```yaml - pytest tests/test_signer.py -v ``` to: ```yaml - pytest tests/ -v ``` This is the correct fix. `pytest tests/ -v` will discover and run all test modules in the `tests/` directory, including: - `test_signer.py` (existing, 32 tests) - `test_xml_parser.py` (new, 17 tests) - `test_client_unit.py` (new, 7 tests) Total: 56 tests as stated in the PR description. **Note**: The Woodpecker CI repo returned 404 when querying pipeline status, so I could not independently verify a green pipeline run. The `.woodpecker.yml` change is correct by inspection -- `pytest tests/ -v` is standard pytest directory discovery and will run all `test_*.py` files. ### BLOCKERS None. The previous blocker (hardcoded test file path causing 24 new tests to never run in CI) is resolved. ### NITS None. No changes beyond the single-line CI fix. ### SOP COMPLIANCE - [x] Branch named after issue (`3-fix-xml-injection-url-encoding-streaming-errors` references issue #3) - [x] PR body follows template (Summary, Changes, Test Plan, Related sections present) - [x] Related references plan slug (`plan-pal-e-platform`) - [x] No secrets committed - [x] No unnecessary file changes (single-line fix, tightly scoped) - [x] Commit messages are descriptive ### PROCESS OBSERVATIONS The review-fix loop worked as intended: first review caught a real CI coverage gap (24 tests silently skipped), the fix was applied, and re-review confirms resolution. This is exactly how the loop should function -- catching issues that would otherwise reduce Change Failure Rate confidence. The Woodpecker 404 suggests the repo may not be activated for CI yet. If CI is not running on this repo, the `.woodpecker.yml` is inert until activation. This is not a blocker for the PR itself but worth noting for operational awareness. ### VERDICT: APPROVED
forgejo_admin deleted branch 3-fix-xml-injection-url-encoding-streaming-errors 2026-03-22 06:06:43 +00:00
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
forgejo_admin/minio-sdk!4
No description provided.