fix: XML builder escaping + streaming error handling + URL encoding #4
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "3-fix-xml-injection-url-encoding-streaming-errors"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Fixes three bugs found during QA review of PR #2: XML injection via unescaped f-strings in builder functions, silent 4xx/5xx errors on streaming GET responses, and broken query strings when params contain spaces or special characters.
Changes
src/minio_sdk/xml_parser.py: Addedxml.sax.saxutils.escape()to all fivebuild_*functions for every user-provided string valuesrc/minio_sdk/client.py: Added error checking for streaming responses in_request()-- reads body and raisesS3Erroron 4xx/5xx before returning the streamsrc/minio_sdk/client.py: Replaced manual query string joining withurllib.parse.urlencode()in both_request()and_request_unsigned()tests/test_xml_parser.py: 17 new unit tests validating XML escaping for all builder functions with special characterstests/test_client_unit.py: 7 new unit tests for streaming error handling and URL encoding using mocksTest Plan
Review Checklist
Related
plan-pal-e-platform(Phase 24 QA nits)QA Review -- PR #4
XML Escaping (xml_parser.py)
All five
build_*functions now usexml.sax.saxutils.escape()for user-provided string values. This is the correct stdlib approach -- it escapes<,>, and&. Quotes inside element text content are valid XML and do not need escaping (only required in attribute values). Integer values likepart_number,expiration_days, andnoncurrent_version_expiration_daysare correctly left unescaped since they are not user-provided strings.No issues found.
Streaming Error Handling (client.py)
The
_requestmethod now checksresp.status_code >= 400on streaming responses, reads the error body viaresp.text, callsresp.close(), then raises viaraise_for_error_response. The ordering is correct -- body must be read before close. The close prevents resource leaks on error paths.No issues found.
URL Encoding (client.py)
Replaced raw
"&".join(f"{k}={v}" ...)withurllib.parse.urlencode(sorted(...), quote_via=urllib.parse.quote). Usingquote_via=urllib.parse.quoteproduces%20for spaces (not+), which is correct for S3 canonical query strings.Verified no double-encoding risk:
sign_requestre-parses the URL withurllib.parse.parse_qsl(which decodes), then re-encodes via_canonical_query_string. The round-trip is safe.No issues found.
Test Coverage
ET.fromstring()to confirm output is parseable XMLNits
None.
VERDICT: APPROVE
PR #4 Review
DOMAIN REVIEW
Tech stack: Python 3.12, stdlib
xml,urllib.parse,requests,pytest,ruff. No FastAPI/SQLAlchemy. Domain: custom S3 SDK library.Fix 1 -- XML builder f-string injection (
xml_parser.py):All five
build_*functions now wrap every user-provided string value in_xml_escape()fromxml.sax.saxutils. This is the correct stdlib approach -- it escapes<,>, and&by default, which covers element text injection. Integer fields (expiration_days,noncurrent_version_expiration_days,part_number) are correctly left unescaped since they are typedint | Noneand cannot carry injection payloads. The import alias_xml_escapefollows the underscore-prefix convention for module-internal names. Complete.Fix 2 -- Streaming GET error handling (
client.pylines 110-118):The
_request()method now checksresp.status_code >= 400on streaming responses before returning the stream. On error, it readsresp.textto extract the XML error body, callsresp.close()to release the connection, then delegates toraise_for_error_response(). The_request_unsigned()method does not need this treatment because it never usesstream=True(it is only used for streaming uploads where the response body is always small and fully read). Complete.Fix 3 -- URL encoding (
client.pylines 78, 134):Both
_request()and_request_unsigned()now useurllib.parse.urlencode(sorted(params.items()), quote_via=urllib.parse.quote)instead of raw f-string{k}={v}joining. Thequote_via=urllib.parse.quotechoice encodes spaces as%20(not+), which is correct for S3 query strings per the AWS Signature V4 spec. Thesorted()call matches the previous behavior and is required for deterministic signing. Complete.Test coverage:
tests/test_xml_parser.py: 17 tests covering all five builder functions with&,<,>,", combined special chars, and regression tests for normal values. All tests validate viaET.fromstring()that output is parseable XML.tests/test_client_unit.py: 7 tests -- 4 for streaming error handling (404/NoSuchKey, 403/AccessDenied, 500, 200 success) and 3 for URL encoding (spaces, ampersands, empty-value params like?versioning=).unittest.mock.patchto avoid network dependencies. Appropriate for unit tests.BLOCKERS
CI does not run the new tests. The Woodpecker pipeline (
.woodpecker.ymlline 16) hardcodespytest tests/test_signer.py -v. The two new test files (tests/test_xml_parser.py,tests/test_client_unit.py) will not execute in CI. This means the 24 new tests have zero CI enforcement -- regressions can merge undetected. Theteststep should runpytest tests/ -v(or at minimum include the new test files explicitly).This is a blocker because new functionality with tests that cannot run in CI is effectively untested in the pipeline. The tests exist, but CI will never gate on them.
NITS
test_stream_500_raisesuses bareExceptioncatch (tests/test_client_unit.pyline ~76). The# noqa: B017suppression is noted, but the test would be stronger if it assertedServerErrorspecifically (theraise_for_error_responsefunction raisesServerErrorfor 5xx codes without a recognized XML error code). This would also remove the need for the noqa.test_prefix_with_spaces_is_encodedaccepts both%20and+(line ~89:assert "my%20folder" in url or "my+folder" in url). Since the code explicitly usesquote_via=urllib.parse.quote(which produces%20, not+), the test should assert only%20to validate the intended behavior precisely.test_quotes_in_valuecomment saysescapedoes not escape quotes by default (tests/test_xml_parser.pyline ~41). This is accurate for element text content (quotes are valid in XML text nodes), but worth noting that if these values ever appear in XML attributes,quoteattr()would be needed instead. Not an issue with the current f-string element construction, just a design awareness note.Integer fields in
build_lifecycle_xmlare interpolated withoutint()cast (lines 305, 309). The type annotations guaranteeint, but a defensiveint(rule.expiration_days)would prevent injection if the model ever accepts string input. Very low risk given the dataclass typing.SOP COMPLIANCE
3-fix-xml-injection-url-encoding-streaming-errorsreferences issue #3)plan-pal-e-platform).woodpecker.ymlonly runstest_signer.py, not the new test filesPROCESS OBSERVATIONS
VERDICT: NOT APPROVED
The single blocker is the CI pipeline not running the new test files. Update
.woodpecker.ymlto runpytest tests/ -v(or explicitly includetests/test_xml_parser.pyandtests/test_client_unit.py), then this PR is ready to merge. All three requested fixes are complete and correctly implemented.Fixed the QA blocker:
.woodpecker.ymlline 16 changed frompytest tests/test_signer.py -vtopytest tests/ -v.This ensures all 56 tests (32 signer + 17 xml_parser + 7 client_unit) are discovered and executed in CI, not just the original 32 signer tests.
PR #4 Re-Review
This is a re-review focusing on the CI blocker identified in the first review. The three original code fixes (XML escaping, streaming error handling, URL encoding) were already approved and are unchanged.
DOMAIN REVIEW
Tech stack: Python 3.12, pytest, Woodpecker CI.
CI fix verified:
.woodpecker.ymlline 16 changed from:to:
This is the correct fix.
pytest tests/ -vwill discover and run all test modules in thetests/directory, including:test_signer.py(existing, 32 tests)test_xml_parser.py(new, 17 tests)test_client_unit.py(new, 7 tests)Total: 56 tests as stated in the PR description.
Note: The Woodpecker CI repo returned 404 when querying pipeline status, so I could not independently verify a green pipeline run. The
.woodpecker.ymlchange is correct by inspection --pytest tests/ -vis standard pytest directory discovery and will run alltest_*.pyfiles.BLOCKERS
None. The previous blocker (hardcoded test file path causing 24 new tests to never run in CI) is resolved.
NITS
None. No changes beyond the single-line CI fix.
SOP COMPLIANCE
3-fix-xml-injection-url-encoding-streaming-errorsreferences issue #3)plan-pal-e-platform)PROCESS OBSERVATIONS
The review-fix loop worked as intended: first review caught a real CI coverage gap (24 tests silently skipped), the fix was applied, and re-review confirms resolution. This is exactly how the loop should function -- catching issues that would otherwise reduce Change Failure Rate confidence.
The Woodpecker 404 suggests the repo may not be activated for CI yet. If CI is not running on this repo, the
.woodpecker.ymlis inert until activation. This is not a blocker for the PR itself but worth noting for operational awareness.VERDICT: APPROVED