Implement memory handling correctness fixes#2016
Implement memory handling correctness fixes#2016alcaeus wants to merge 13 commits intomongodb:v2.3from
Conversation
Add an early return in phongo_field_path_pop when the path has no elements to decrement. The function previously assumed callers maintained push/pop balance, which makes the helper fragile under future error-recovery edits.
ZEND_SL and ZEND_SR previously delegated directly to the C shift operators without bounding the shift amount. Match the behaviour of PHP's native integer shift handlers: throw ArithmeticError on negative shift counts, return 0 (or -1 for arithmetic right-shift of a negative value) when the count is at or above the operand width, and use an unsigned cast for the left-shift to avoid relying on signed shift semantics.
Have the helper return whether the conversion populated the destination bson_value_t and forward that result through phongo_zval_to_bson_value. The IS_ARRAY/IS_OBJECT branch previously returned true unconditionally, which is misleading when an exception was thrown inside the inner encoder.
The document and array visitors call into userland bsonUnserialize() but did not consult EG(exception) before inserting the resulting object into the parent container. When bsonUnserialize() throws, abort the traversal cleanly and propagate the exception instead of attaching a partially-constructed object to the parent.
The cache key produced by phongo_manager_make_client_hash previously contained the raw connection string. Replace it with a SHA-1 digest of the URI so the key remains a stable per-URI identifier without embedding the connection string verbatim into long-lived process storage.
When all entries in field_path->elements are NULL, the loop appends nothing and the trailing '.' overwrite would land before the start of the allocated buffer. Branch on whether the loop wrote anything before trimming the trailing separator.
The condition sparsity > INT64_MAX is always false because sparsity is declared as int64_t. Remove the dead branch and keep the meaningful sparsity < 0 guard.
Change phongo_bson_data_to_zval and phongo_bson_data_to_zval_ex to accept size_t lengths instead of int, matching the unsigned width expected by libbson's reader API. Existing callers pass uint32_t from bson_iter_document, so the widening is implicit.
Bring phongo_regex_t in line with the other string-carrying structs in this file, which already use size_t for length fields. Removes a silent narrowing of the size_t parameter accepted by phongo_regex_init.
phongo_regex_new (used when decoding a Regex from BSON) previously copied the pattern and flags directly without sorting the flags alphabetically, while phongo_regex_init does. This caused two Regex instances representing the same pattern and flags to compare unequal depending on which path constructed them. Defer to phongo_regex_init to keep the canonicalisation in one place.
The IS_STRING branch wrote the PHP string verbatim into the bson_value_t without checking that it was valid UTF-8, while the phongo_bson_append document-encoding path already validates. Match the existing pattern so that invalid UTF-8 is rejected with an exception in both code paths.
When phongo_zval_to_bson throws while encoding the scope, release the already-allocated code buffer and the partially-written scope BSON, and report failure to the caller. Previously the function returned true even though intern was left in an inconsistent state.
phongo_binary_init, phongo_regex_init, and phongo_javascript_init overwrote heap-owned struct members without freeing what was already there. Re-init the slots cleanly so a second call (e.g. from a subclass that invokes parent::__unserialize twice) does not leak the prior allocation. As part of this, hoist the regex flags null-byte check above the pattern allocation so a flag rejection can no longer leave a stray pattern buffer behind.
There was a problem hiding this comment.
Pull request overview
This PR hardens several low-level correctness paths in the PHP MongoDB extension, mainly around BSON decoding/encoding, BSON type initialization, and manager client hashing. It fits into the extension’s core runtime layer by tightening error propagation, reducing unsafe reinitialization behavior, and normalizing a few internal type/length handling paths.
Changes:
- Reworks BSON traversal/encoding paths to better propagate failures, validate UTF-8 consistently, and fix field-path edge cases.
- Adjusts BSON value object initialization for
Binary,Regex,Javascript, andInt64to improve reinit behavior and operator correctness. - Changes manager client hashing to use a SHA-256 digest of the URI and applies a few internal type cleanups.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/phongo_structs.h |
Updates internal Regex length fields to size_t. |
src/phongo_client.c |
Switches manager client hash input from raw URI text to a SHA-256 digest. |
src/phongo_bson.h |
Changes BSON data decode length parameters from int to size_t. |
src/phongo_bson.c |
Fixes field-path edge cases and stops nested BSON traversal cleanly on bsonUnserialize exceptions. |
src/phongo_bson_encode.c |
Makes BSON value encoding return failures and validates UTF-8 for scalar strings. |
src/MongoDB/ClientEncryption.c |
Removes a redundant sparsity upper-bound check. |
src/BSON/Regex.c |
Reuses common Regex init logic and frees prior buffers before reinit. |
src/BSON/Javascript.c |
Frees prior code/scope before reinit and propagates scope-encoding failures. |
src/BSON/Int64.c |
Aligns shift operator behavior with PHP’s arithmetic/width rules. |
src/BSON/Binary.c |
Frees prior binary data before reinitialization. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (EG(exception)) { | ||
| efree(intern->code); | ||
| intern->code = NULL; | ||
| bson_destroy(intern->scope); | ||
| intern->scope = NULL; | ||
| return false; |
|
|
||
| #include <php.h> | ||
| #include <ext/hash/php_hash.h> | ||
| #include <ext/hash/php_hash_sha.h> |
| if (value2 < 0) { | ||
| zend_throw_exception(zend_ce_arithmetic_error, "Bit shift by negative number", 0); | ||
| return FAILURE; | ||
| } | ||
| OPERATION_RESULT_INT64(value2 >= 64 ? 0 : (int64_t) ((uint64_t) value1 << value2)); |
| if (!bson_utf8_validate(Z_STRVAL_P(data), Z_STRLEN_P(data), true)) { | ||
| phongo_throw_exception(PHONGO_ERROR_UNEXPECTED_VALUE, "Detected invalid UTF-8 in string value"); | ||
| return false; | ||
| } |
| if (field_path->size == 0) { | ||
| return false; | ||
| } |
| PHONGO_GET_INT64(value2, op2); | ||
| OPERATION_RESULT_INT64(value1 << value2); | ||
| if (value2 < 0) { | ||
| zend_throw_exception(zend_ce_arithmetic_error, "Bit shift by negative number", 0); |
There was a problem hiding this comment.
This exception can also be tested.
|
|
||
| zval args; | ||
|
|
||
| PHP_SHA256Init(&sha_ctx); |
There was a problem hiding this comment.
I confirm sh256 is the best algo for this use-case: crypto-safe for credentials and fast in the execution hot path (Hardware Acceleration).
| PHP_SHA256Init(&sha_ctx); | ||
| PHP_SHA256Update(&sha_ctx, (const unsigned char*) uri_string, strlen(uri_string)); | ||
| PHP_SHA256Final(sha_digest, &sha_ctx); | ||
| php_hash_bin2hex(sha_hex, sha_digest, sizeof(sha_digest)); |
There was a problem hiding this comment.
Why do you need to convert to hexadecimal? Is it displayed?
This PR introduces a bunch of defensive hardening and consistency fixes across the extension. Each commit is scoped to a single change to keep review and bisecting straightforward.
ArithmeticErroron negative shift counts and return0/-1at or above the operand width.bsonUnserializethrows, instead of inserting a partially-constructed object into the parent.phongo_zval_to_bson_value_exand validate UTF-8 in the scalar path so it matchesphongo_bson_append.phongo_field_path_popagainst an empty path and fix the trailing-separator overwrite inphongo_field_path_as_stringwhen all entries are NULL.phongo_javascript_init, routephongo_regex_newthroughphongo_regex_initso BSON-decoded Regex instances pick up the same flag canonicalisation as PHP-constructed ones.size_tfor BSON data lengths and forphongo_regex_tlength fields; drop a tautologicalsparsity > INT64_MAXcheck in ClientEncryption.