Hi Fabian!
Follow-up to the issue I opened earlier ("Several build-compat + taint/path-sensitivity fixes") — same fork, same toolchain, no new context needed. After that batch landed we pointed the analysis at a gnarlier real activity (an obstacle/polyline extractor — a vtable-bearing class whose members sit behind the vtable pointer, a confidence value built by OR-ing two fields together, and a whole-struct message copy). Three more real flows went missing on it, all in IDEExtendedTaintAnalysis, all false negatives. We've fixed all three on the same branch and I wanted to write them up while they were fresh.
Same as last time: each commit message is a standalone write-up, and I've pulled a minimal fixture out of our suite for each. All three fixtures are clean (no FP, no FN) on the combo we actually ship — --solver=ide-xtaint --alias-analysis=union-find. Happy to split into separate PRs whenever you want them.
The three
-
Taint on this->member.<field> is dropped at the call edge when the member is at a non-zero offset (false negative). A free function (or any callee) that takes &this->member by reference/pointer — the obstacles::FanToPolylines(const Fan& in, Polylines& out) shape — doesn't receive the taint when member sits at a non-zero byte offset inside the object. The fact about this->member.field arrives on the callee's formal still carrying the member's displacement, so the callee reads its parameter at the wrong byte and the flow vanishes. The reason it hid for so long: if the member happens to be at offset 0 (first field, no vtable), the displacement is 0 and the transfer is correct — so every small reproducer that put the struct first "worked", and only a class with a vtable pointer (member at byte 8) or a leading scalar exposed it. Cause is in AbstractMemoryLocationFactory::withTransferToImpl: when it rebases a caller fact onto the callee formal it wasn't subtracting the &this->member displacement. Fix subtracts it (base-match + single leading member offset; falls back to the existing leaf-clamp otherwise).
Severity: recall (missed real flow through a member-pointer converter).
Fix → https://gitlab.com/exida-oss/stig/phasar/-/commit/3ce82bf32eb169fe95146c97490b0a337611273c
Triggers → activity_member_struct_external_kernel_offset.cpp (a leading long pushes the converted members off offset 0 — same effect as a vtable pointer, without needing one)
-
Taint doesn't flow through value-combining instructions (false negative). When a tainted value is combined with another through a binary op / icmp / select and the result is stored, the result loses the taint. Ours was a confidence byte: out.hoo = (uint8_t)classification_type | (uint8_t)position_x; — classification_type is tainted, so out.hoo should be too, but it came out clean. The factory that builds access-path facts in getNormalFlowFunction only follows single-operand derivations (load / cast / GEP — there's one access path to chase). A binary op / compare / select has two value operands and no single access path, so the result was treated as a fresh opaque base and never inherited operand taint. PHINode is already special-cased to union its incoming facts; this is the same idea for the value-combiners. Fix generates a fact for the instruction result when any operand matches the incoming source. One wrinkle worth calling out: we exclude bit-shifts. Struct returns of two i32s get packed into an i64 and unpacked with trunc(lshr(packed, 32)), and if shifts propagate taint that packing bleeds one returned field onto its sibling (a fresh FP). Shifts relocate bits rather than combine two tainted inputs, so dropping them is principled, not a fudge.
Severity: recall (any tainted value merged arithmetically/bitwise before a sink).
Fix → https://gitlab.com/exida-oss/stig/phasar/-/commit/4a021fe490a31475ee85ccacfb7633edcef19ef4
Triggers → value_combine_or.cpp (the operands are loaded struct fields, which is the case the access-path rebasing reaches and the shape the real activity has — a pure call-result merge is a separate, value-representation issue and we left that one as a known FN)
-
Whole-struct assignment lowered to llvm.memcpy drops per-field taint (false negative). dst = src for a struct big enough that the compiler emits a single llvm.memcpy instead of scalar stores — a field tainted in src doesn't reach the same field of dst. This one was painful in practice: it's why our activity had a hand-expanded fan_ = message.data; into 14 explicit per-field stores as a workaround. The MemTransferInst handling in getSummaryFlowFunction routed the copy through the generic store path, which appended the source field's offset path onto the destination buffer pointer and produced a non-canonical, "lumpy" offset vector (something like {this,[0,4,0]}) that a later flattened load of the same field ({this,[4,0]}) couldn't match — so the copy silently lost field taint. Fix rebases each source-buffer field onto the destination with withTransferFrom, which yields the canonical flattened offset and matches the load; base-match works at any nesting depth.
Severity: recall (whole-struct / message copies — extremely common in our domain).
Fix → https://gitlab.com/exida-oss/stig/phasar/-/commit/8f9d431392ab94e3f751f0b75770dbc2349189e1
Triggers → struct_copy_memcpy.cpp (the struct carries borders[256] so the assignment stays a memcpy at -O2 rather than being scalarised — drop the array and it gets optimised into per-field stores and the bug hides)
One instrumentation commit, optional
Same as the XSTORE dump in the first issue: while chasing #10/#11 we added an env-gated dump of the call/return boundary facts (STIG_CALL_DEBUG=1 prints XCALL / XRET lines). It changes nothing when the env var is unset and we found it handy enough to keep — take it or leave it.
On the combos
I only assert these fixtures on ide-xtaint × union-find, since that's the configuration we focus on (although svf-vta is also a contender). They're field-sensitivity / access-path problems, so the field-insensitive solvers (ifds-taint) and the field-insensitive alias backend over- or under-approximate them in the usual ways regardless of these patches — we're not claiming to fix those, just flagging that if you run the fixtures across the full solver/alias matrix you'll see the expected approximation on the other cells, not a regression.
Reproducing
Same as before — the three .cpp files are linked inline above, in our public repo next to the first issue's fixtures.
Regards,
Piotr
==
Piotr Serwa
Principal Safety Engineer
exida
Hi Fabian!
Follow-up to the issue I opened earlier ("Several build-compat + taint/path-sensitivity fixes") — same fork, same toolchain, no new context needed. After that batch landed we pointed the analysis at a gnarlier real activity (an obstacle/polyline extractor — a vtable-bearing class whose members sit behind the vtable pointer, a confidence value built by OR-ing two fields together, and a whole-struct message copy). Three more real flows went missing on it, all in
IDEExtendedTaintAnalysis, all false negatives. We've fixed all three on the same branch and I wanted to write them up while they were fresh.187fc5d55,f9cc5e1e6,0ddc8de64— the store-gate work). This one covers the four after those. No overlap.clang++-18 -std=c++14 -O2 -g. The-O2matters for dlopen, dlclose: Initialization and finalization functions #3 below (it's what turns the assignment into amemcpy), but LLVM IR code coverage and OS call taint #1 and Make explicit that this is the MIT license #2 reproduce at-O0too.Same as last time: each commit message is a standalone write-up, and I've pulled a minimal fixture out of our suite for each. All three fixtures are clean (no FP, no FN) on the combo we actually ship —
--solver=ide-xtaint --alias-analysis=union-find. Happy to split into separate PRs whenever you want them.The three
Taint on
this->member.<field>is dropped at the call edge when the member is at a non-zero offset (false negative). A free function (or any callee) that takes&this->memberby reference/pointer — theobstacles::FanToPolylines(const Fan& in, Polylines& out)shape — doesn't receive the taint whenmembersits at a non-zero byte offset inside the object. The fact aboutthis->member.fieldarrives on the callee's formal still carrying the member's displacement, so the callee reads its parameter at the wrong byte and the flow vanishes. The reason it hid for so long: if the member happens to be at offset 0 (first field, no vtable), the displacement is 0 and the transfer is correct — so every small reproducer that put the struct first "worked", and only a class with a vtable pointer (member at byte 8) or a leading scalar exposed it. Cause is inAbstractMemoryLocationFactory::withTransferToImpl: when it rebases a caller fact onto the callee formal it wasn't subtracting the&this->memberdisplacement. Fix subtracts it (base-match + single leading member offset; falls back to the existing leaf-clamp otherwise).Severity: recall (missed real flow through a member-pointer converter).
Fix → https://gitlab.com/exida-oss/stig/phasar/-/commit/3ce82bf32eb169fe95146c97490b0a337611273c
Triggers →
activity_member_struct_external_kernel_offset.cpp(a leadinglongpushes the converted members off offset 0 — same effect as a vtable pointer, without needing one)Taint doesn't flow through value-combining instructions (false negative). When a tainted value is combined with another through a binary op /
icmp/selectand the result is stored, the result loses the taint. Ours was a confidence byte:out.hoo = (uint8_t)classification_type | (uint8_t)position_x;—classification_typeis tainted, soout.hooshould be too, but it came out clean. The factory that builds access-path facts ingetNormalFlowFunctiononly follows single-operand derivations (load / cast / GEP — there's one access path to chase). A binary op / compare / select has two value operands and no single access path, so the result was treated as a fresh opaque base and never inherited operand taint.PHINodeis already special-cased to union its incoming facts; this is the same idea for the value-combiners. Fix generates a fact for the instruction result when any operand matches the incoming source. One wrinkle worth calling out: we exclude bit-shifts. Struct returns of twoi32s get packed into ani64and unpacked withtrunc(lshr(packed, 32)), and if shifts propagate taint that packing bleeds one returned field onto its sibling (a fresh FP). Shifts relocate bits rather than combine two tainted inputs, so dropping them is principled, not a fudge.Severity: recall (any tainted value merged arithmetically/bitwise before a sink).
Fix → https://gitlab.com/exida-oss/stig/phasar/-/commit/4a021fe490a31475ee85ccacfb7633edcef19ef4
Triggers →
value_combine_or.cpp(the operands are loaded struct fields, which is the case the access-path rebasing reaches and the shape the real activity has — a pure call-result merge is a separate, value-representation issue and we left that one as a known FN)Whole-struct assignment lowered to
llvm.memcpydrops per-field taint (false negative).dst = srcfor a struct big enough that the compiler emits a singlellvm.memcpyinstead of scalar stores — a field tainted insrcdoesn't reach the same field ofdst. This one was painful in practice: it's why our activity had a hand-expandedfan_ = message.data;into 14 explicit per-field stores as a workaround. TheMemTransferInsthandling ingetSummaryFlowFunctionrouted the copy through the generic store path, which appended the source field's offset path onto the destination buffer pointer and produced a non-canonical, "lumpy" offset vector (something like{this,[0,4,0]}) that a later flattened load of the same field ({this,[4,0]}) couldn't match — so the copy silently lost field taint. Fix rebases each source-buffer field onto the destination withwithTransferFrom, which yields the canonical flattened offset and matches the load; base-match works at any nesting depth.Severity: recall (whole-struct / message copies — extremely common in our domain).
Fix → https://gitlab.com/exida-oss/stig/phasar/-/commit/8f9d431392ab94e3f751f0b75770dbc2349189e1
Triggers →
struct_copy_memcpy.cpp(the struct carriesborders[256]so the assignment stays amemcpyat-O2rather than being scalarised — drop the array and it gets optimised into per-field stores and the bug hides)One instrumentation commit, optional
Same as the
XSTOREdump in the first issue: while chasing #10/#11 we added an env-gated dump of the call/return boundary facts (STIG_CALL_DEBUG=1printsXCALL/XRETlines). It changes nothing when the env var is unset and we found it handy enough to keep — take it or leave it.On the combos
I only assert these fixtures on
ide-xtaint × union-find, since that's the configuration we focus on (although svf-vta is also a contender). They're field-sensitivity / access-path problems, so the field-insensitive solvers (ifds-taint) and the field-insensitive alias backend over- or under-approximate them in the usual ways regardless of these patches — we're not claiming to fix those, just flagging that if you run the fixtures across the full solver/alias matrix you'll see the expected approximation on the other cells, not a regression.Reproducing
Same as before — the three
.cppfiles are linked inline above, in our public repo next to the first issue's fixtures.Regards,
Piotr
==
Piotr Serwa
Principal Safety Engineer
exida