Skip to content

Bug reports and bug fixes - 2nd slack - 3 more IDEExtendedTaintAnalysis false-negatives #846

@pserwa-exida

Description

@pserwa-exida

Hi Fabian!

Follow-up to the issue I opened earlier ("Several build-compat + taint/path-sensitivity fixes") — same fork, same toolchain, no new context needed. After that batch landed we pointed the analysis at a gnarlier real activity (an obstacle/polyline extractor — a vtable-bearing class whose members sit behind the vtable pointer, a confidence value built by OR-ing two fields together, and a whole-struct message copy). Three more real flows went missing on it, all in IDEExtendedTaintAnalysis, all false negatives. We've fixed all three on the same branch and I wanted to write them up while they were fresh.

Same as last time: each commit message is a standalone write-up, and I've pulled a minimal fixture out of our suite for each. All three fixtures are clean (no FP, no FN) on the combo we actually ship — --solver=ide-xtaint --alias-analysis=union-find. Happy to split into separate PRs whenever you want them.

The three

  1. Taint on this->member.<field> is dropped at the call edge when the member is at a non-zero offset (false negative). A free function (or any callee) that takes &this->member by reference/pointer — the obstacles::FanToPolylines(const Fan& in, Polylines& out) shape — doesn't receive the taint when member sits at a non-zero byte offset inside the object. The fact about this->member.field arrives on the callee's formal still carrying the member's displacement, so the callee reads its parameter at the wrong byte and the flow vanishes. The reason it hid for so long: if the member happens to be at offset 0 (first field, no vtable), the displacement is 0 and the transfer is correct — so every small reproducer that put the struct first "worked", and only a class with a vtable pointer (member at byte 8) or a leading scalar exposed it. Cause is in AbstractMemoryLocationFactory::withTransferToImpl: when it rebases a caller fact onto the callee formal it wasn't subtracting the &this->member displacement. Fix subtracts it (base-match + single leading member offset; falls back to the existing leaf-clamp otherwise).
    Severity: recall (missed real flow through a member-pointer converter).
    Fix → https://gitlab.com/exida-oss/stig/phasar/-/commit/3ce82bf32eb169fe95146c97490b0a337611273c
    Triggers → activity_member_struct_external_kernel_offset.cpp (a leading long pushes the converted members off offset 0 — same effect as a vtable pointer, without needing one)

  2. Taint doesn't flow through value-combining instructions (false negative). When a tainted value is combined with another through a binary op / icmp / select and the result is stored, the result loses the taint. Ours was a confidence byte: out.hoo = (uint8_t)classification_type | (uint8_t)position_x;classification_type is tainted, so out.hoo should be too, but it came out clean. The factory that builds access-path facts in getNormalFlowFunction only follows single-operand derivations (load / cast / GEP — there's one access path to chase). A binary op / compare / select has two value operands and no single access path, so the result was treated as a fresh opaque base and never inherited operand taint. PHINode is already special-cased to union its incoming facts; this is the same idea for the value-combiners. Fix generates a fact for the instruction result when any operand matches the incoming source. One wrinkle worth calling out: we exclude bit-shifts. Struct returns of two i32s get packed into an i64 and unpacked with trunc(lshr(packed, 32)), and if shifts propagate taint that packing bleeds one returned field onto its sibling (a fresh FP). Shifts relocate bits rather than combine two tainted inputs, so dropping them is principled, not a fudge.
    Severity: recall (any tainted value merged arithmetically/bitwise before a sink).
    Fix → https://gitlab.com/exida-oss/stig/phasar/-/commit/4a021fe490a31475ee85ccacfb7633edcef19ef4
    Triggers → value_combine_or.cpp (the operands are loaded struct fields, which is the case the access-path rebasing reaches and the shape the real activity has — a pure call-result merge is a separate, value-representation issue and we left that one as a known FN)

  3. Whole-struct assignment lowered to llvm.memcpy drops per-field taint (false negative). dst = src for a struct big enough that the compiler emits a single llvm.memcpy instead of scalar stores — a field tainted in src doesn't reach the same field of dst. This one was painful in practice: it's why our activity had a hand-expanded fan_ = message.data; into 14 explicit per-field stores as a workaround. The MemTransferInst handling in getSummaryFlowFunction routed the copy through the generic store path, which appended the source field's offset path onto the destination buffer pointer and produced a non-canonical, "lumpy" offset vector (something like {this,[0,4,0]}) that a later flattened load of the same field ({this,[4,0]}) couldn't match — so the copy silently lost field taint. Fix rebases each source-buffer field onto the destination with withTransferFrom, which yields the canonical flattened offset and matches the load; base-match works at any nesting depth.
    Severity: recall (whole-struct / message copies — extremely common in our domain).
    Fix → https://gitlab.com/exida-oss/stig/phasar/-/commit/8f9d431392ab94e3f751f0b75770dbc2349189e1
    Triggers → struct_copy_memcpy.cpp (the struct carries borders[256] so the assignment stays a memcpy at -O2 rather than being scalarised — drop the array and it gets optimised into per-field stores and the bug hides)

One instrumentation commit, optional

Same as the XSTORE dump in the first issue: while chasing #10/#11 we added an env-gated dump of the call/return boundary facts (STIG_CALL_DEBUG=1 prints XCALL / XRET lines). It changes nothing when the env var is unset and we found it handy enough to keep — take it or leave it.

On the combos

I only assert these fixtures on ide-xtaint × union-find, since that's the configuration we focus on (although svf-vta is also a contender). They're field-sensitivity / access-path problems, so the field-insensitive solvers (ifds-taint) and the field-insensitive alias backend over- or under-approximate them in the usual ways regardless of these patches — we're not claiming to fix those, just flagging that if you run the fixtures across the full solver/alias matrix you'll see the expected approximation on the other cells, not a regression.

Reproducing

Same as before — the three .cpp files are linked inline above, in our public repo next to the first issue's fixtures.

Regards,
Piotr

==
Piotr Serwa
Principal Safety Engineer
exida

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions