Skip to content

Escape XML special chars in _build_gdal_metadata_xml (#1614)#1617

Merged
brendancol merged 1 commit into
mainfrom
deep-sweep-security-geotiff-2026-05-11-c
May 11, 2026
Merged

Escape XML special chars in _build_gdal_metadata_xml (#1614)#1617
brendancol merged 1 commit into
mainfrom
deep-sweep-security-geotiff-2026-05-11-c

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

  • Fix XML injection / silent metadata loss in _build_gdal_metadata_xml (xrspatial/geotiff/_geotags.py). The helper interpolated caller-supplied keys and values via plain f-strings, so & < > " ' in the metadata dict produced malformed XML that _parse_gdal_metadata swallowed as {} on read, and a crafted key like foo" malicious="bar injected attributes into the <Item> element.
  • Route every text slot through xml.sax.saxutils.escape and every attribute slot through quoteattr, mirroring the helpers added in _vrt.py by write_vrt: XML special characters in crs_wkt and source filenames are emitted unescaped #1607.
  • Add xrspatial/geotiff/tests/test_gdal_metadata_xml_escape_1614.py covering every XML special character, attribute injection, element injection, unicode pass-through, and an end-to-end to_geotiff round trip.

Found by the geotiff security sweep (Cat 5, MEDIUM). Same bug class as #1607, on a separate code path the earlier fix did not cover. One fix per PR per the security-sweep policy.

Closes #1614.

Test plan

  • pytest xrspatial/geotiff/tests/test_gdal_metadata_xml_escape_1614.py (15 new tests pass)
  • pytest xrspatial/geotiff/tests/test_security.py xrspatial/geotiff/tests/test_geotags.py xrspatial/geotiff/tests/test_metadata_round_trip_1484.py xrspatial/geotiff/tests/test_vrt_xml_escape_1607.py xrspatial/geotiff/tests/test_features.py::TestGDALMetadata (90 passed, 2 skipped, no regressions)
  • Existing test_features.TestGDALMetadata::test_build_gdal_metadata_xml still passes (the legacy <Item name="DataType">Generic</Item> shape is preserved because quoteattr picks double quotes for plain strings).

`_build_gdal_metadata_xml` embedded caller-supplied dict keys and
values into the GDALMetadata XML payload using plain f-strings, with
no XML escaping. Any string containing `& < > " '` corrupted the
document and round-tripped through `_parse_gdal_metadata` as `{}`;
a crafted key like `foo" malicious="bar` injected extra attributes
into the `<Item>` element.

Route every text slot through `xml.sax.saxutils.escape` and every
attribute slot through `quoteattr`, mirroring the helpers added in
`_vrt.py` by #1607. Sample indices stay int-cast literals.

Reachable from `to_geotiff`, `_write_vrt_tiled`, and
`write_geotiff_gpu` whenever the user supplies
`attrs['gdal_metadata']` as a dict.

Found by the geotiff security sweep (Cat 5, MEDIUM). Same bug class
as #1607 on a separate code path. Existing
`test_features.TestGDALMetadata::test_build_gdal_metadata_xml`
still passes; new tests pin escape behaviour for every XML special
character and guard against attribute / element injection.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 11, 2026
@brendancol brendancol requested a review from Copilot May 11, 2026 20:23
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens GeoTIFF GDALMetadata XML serialization to prevent malformed XML, silent metadata loss on round-trip, and XML attribute/element injection when caller-supplied metadata contains XML special characters.

Changes:

  • Escape GDALMetadata XML element text and quote/escape XML attribute values in _build_gdal_metadata_xml (xrspatial/geotiff/_geotags.py).
  • Add regression tests covering all XML special characters, injection attempts, unicode handling, and an end-to-end to_geotiff round trip (xrspatial/geotiff/tests/test_gdal_metadata_xml_escape_1614.py).
  • Update the security sweep tracking state to record the fix for issue #1614 (.claude/sweep-security-state.csv).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
xrspatial/geotiff/_geotags.py Escapes/quotes all caller-controlled GDALMetadata XML fields to prevent malformed XML and injection.
xrspatial/geotiff/tests/test_gdal_metadata_xml_escape_1614.py Adds focused regression + round-trip tests for XML escaping and injection resistance.
.claude/sweep-security-state.csv Records the #1614 geotiff sweep finding as addressed.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@brendancol brendancol merged commit 4c8a5d2 into main May 11, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

_build_gdal_metadata_xml does not escape XML special chars (corrupted round-trip)

2 participants