Skip to content

Several bug reports with corresponding bug fixes #845

@pserwa-exida

Description

@pserwa-exida

Several build-compat + taint/path-sensitivity fixes (we have patches, happy to PR)

Hi Fabian, and the PhASAR team! 👋

I am using PhASAR as the taint/dataflow engine inside a larger C++ analysis toolchain (we drive the C++ API directly: IDEExtendedTaintAnalysis, the Z3BasedPathSensitivityManager, the UnionFind alias backends, etc., from our own taint-config JSON). It's been genuinely great to build on — thank you for the work you've put into it.

I am working at exida. We at exida believe in open source and collaboration, so we'd love to give back: along the way we hit a handful of issues, fixed them on a public fork, and would like to upstream them and we have actually no interest to maintain a parallel/downstream fork. One issue listing them all seemed the easiset way to start the conversation — happy to split into individual PRs in whatever shape works best for you.

  • Baseline: upstream development @ 5ca66dc55 ("WPDS Solver", #839).
  • Fork branch (all commits below): https://gitlab.com/exida-oss/stig/phasar/-/tree/stig-patches-2026.06
  • Toolchains we hit these on:
    • Fedora 44 — host clang-22 building PhASAR, LLVM 18 as the linked library, GCC-16 libstdc++ headers. (Most of the build-break + the clang-22 ICE + the GCC-16 errors are here.)
    • Ubuntu 24.04 — host clang-18, LLVM 18. (The LLVM-version find_package break and the correctness/crash issues reproduce here too.)

Each commit message is written as a standalone bug report (symptom + cause + fix + how to trigger). Quick index, with a minimal reproducer from our test suite where we have one:

Correctness / robustness (the ones we'd most love your eyes on)

  1. Path-feasibility aborts the whole run on a non-encodable branch guard. report_fatal_error / an ill-sorted Z3 term (Z3_mk_andast_manager::check_args) when a path to a sink is guarded by a float / pointer / bitwise condition. Uncatchable as a z3::exception, so one such edge kills the entire --emit-feasibility run. We added a conservative default-deny pre-pass that only encodes the Int/Bool fragment and drops other edges (sound / FN-safe).
    Severity: high (crash).
    Fix → https://gitlab.com/exida-oss/stig/phasar/-/commit/825d36d20580e880285dad924bc6287744ddfa6d
    Triggers → z3_infeasible_float_guard.cpp (also _pointer_guard / _bitwise_parity in the same dir)

  2. Path-feasibility aborts on an unresolved-pointer load. getLoadInstAsZ3 calls report_fatal_error("unsupported load!") for a load whose pointer isn't a direct alloca/GEP (common on optimized IR, e.g. a rematerialized *(ptr+off)). We model it as a fresh unconstrained term instead (the same havoc already used for unhandled types).
    Severity: high (crash).
    Fix → https://gitlab.com/exida-oss/stig/phasar/-/commit/9c40b0c4f753f8b3d13dcf4426553d0dc8d2b208
    (No minimal fixture — we hit this on a real optimized activity, the *(ptr+off) opaque-pointer shape; happy to extract a .ll.)

  3. IDEExtendedTaintAnalysis drops cross-field this writes at the return edge (false negative). The return flow-function uses a strict equivalent, which filters every exit fact whose first offset isn't 0 — so taint a callee writes into this->member at a non-zero offset never reaches the caller. The call edge already tolerates this via equivalentExceptPointerArithmetics; we mirror that outbound.
    Severity: high for absence-of-taint use (missed real flow).
    Fix → https://gitlab.com/exida-oss/stig/phasar/-/commit/500830dd4ed4a40a279bcf0cf7f325fa38e605fe
    Triggers → activity_cross_method_member.cpp

  4. IDEExtendedTaintAnalysis misses sinks across optimizer-rematerialized indirection (false negative). When the optimizer inserts an extra indirection element so the fact is one level deeper than the sink leaf, the strict sink-match misses the leak. Our leaf-aware match requires base + leading offset + leaf to all match (and avoids the sibling-field over-approximation a naive prefix relax causes).
    Severity: recall.
    Fix → https://gitlab.com/exida-oss/stig/phasar/-/commit/a921a2355b7f4c55796d509fc727b65939eb41f2
    Triggers → activity_member_struct_pointer_indirection.cpp

  5. IDEExtendedTaintAnalysis store flow-function conflates sibling scalar fields (false positive) — RESOLVED, two merge-ready commits. In getNormalFlowFunction, the store-FF tested the incoming fact against the stored value with equivalentExceptPointerArithmetics(Source, PALevel=1). For a scalar stored value the source has a single-element access path ({base,[0]}), so MinSize==1 hits the MinSize <= PALevel early-out and the predicate returns true on base match alone, ignoring the field offset — taint on s.qm bleeds onto a s.asilb = … store of the same base object (FP across siblings). A naive PALevel=0 over-corrects into FNs (exact-offset matching also rejects legitimate copies one element deeper than the source leaf). The right shape — and what we landed — is a leaf-aware byte-offset store gate: field identity = the cumulative byte offset from the base (summed over the access-path), base-equal, applied to scalar stores only (pointer/vector stores keep the existing PALevel behaviour). This kills the sibling FP without the over-correction and mirrors Rename command-line options #4's leaf-aware sink-match.

    Landing that gate then exposed a separate false negative upstream of it, in AbstractMemoryLocationFactory::createImpl: a ConstantExpr GEP (getelementptr(@g, …)) is an llvm::Constant, so it hit the self-define guard and seeded {the-gep-constant, [0]} — the field byte discarded — while a load of the same global field decomposed via the GEPOperator walk to {@g, [byte]}. Different base and offset, so the precise gate (correctly) dropped the taint, and any source routed through a global field (e.g. a static message buffer copied field-by-field into a member) was lost. Fix: exclude GEP-shaped constants from the self-define guard so they fall through to the existing GEPOperator branch and decompose like instruction GEPs; bare globals (Constant, not GEPOperator) still self-define. This makes global-routed sources field-sensitive end to end. (Two independent reviewers converged on this same cause from the IR + an XSTORE byte-offset trace.)
    Severity: precision (sibling FP) + recall (global-routed FN). Both resolved; on a real activity, data-flow recall went 2/9 → 9/9 with the synthetic sibling-FP suite staying clean.
    Fixes →

Build / toolchain compatibility

  1. cmake/add_llvm.cmake can't find a stock LLVM ≥ 18. find_package(LLVM ${PHASAR_LLVM_VERSION} ...) triggers LLVM 18+'s exact major.minor.patch version check, rejecting e.g. a distro 18.1.x. We drop the version arg and rely on -DLLVM_DIR.
    Severity: build break (Fedora 44 + Ubuntu 24.04).https://gitlab.com/exida-oss/stig/phasar/-/commit/f1e597a91053d482d204b31a3e4f403cd2320a2e

  2. Same trap in the installed phasarConfig.cmake (generated from Config.cmake.in) for downstream find_package(phasar), plus a Z3 module-vs-CONFIG target-name mismatch (z3 vs z3::libz3). Fixed at the template.
    Severity: downstream build break.https://gitlab.com/exida-oss/stig/phasar/-/commit/08e605b995f03fd9dd319ff0c29f11e249181a07

  3. clang ≥ 22 ICE (mangler segfault) on the CTAD form pag::PBMixin{...} in LLVMUnionFindAA. Spelling the template args explicitly avoids the synthesized deduction guide.
    Severity: build break (compiler ICE, Fedora 44 / clang-22).https://gitlab.com/exida-oss/stig/phasar/-/commit/d7c131e60ceb89e3c5d5175730c78cd1c8697e15

  4. GCC 16 hard errors on __has_cpp_attribute([[lifetimebound]]) (Macros.h) and stale member names r/c/v in Table<R,C,V>::Cell::operator<< (Table.h).
    Severity: build break (Fedora 44 / GCC-16 libstdc++).https://gitlab.com/exida-oss/stig/phasar/-/commit/e1ad35997eff7d09d6496047dc8c2ab10c3b53fd

One more we observed but have not patched (FYI / repro for you)

  • SIGSEGV in the CFLFieldSens IFDS solver (IterativeIDESolver<CFLFieldSensIFDSProblem>::storeResultsAndPropagateDenseMapBase::FindAndConstruct) on a nested-class-via-helper pattern, with the SVF demand-driven alias backend. We don't have a fix — flagging it in case it's useful. Crash stack + the triggering fixture:
    nested_class_via_helper_function.cpp (--solver=ifds-fieldsens-taint --alias-analysis=svf-dda). Happy to attach the full core/backtrace.

On tests

Your contributing guide asks each patch to ship a small unit test. To address this, we refer to our test cases - you are free to take them over.

On the mechanics (GitLab → GitHub)

We developed these on a GitLab fork (gitlab.com/exida-oss/stig/phasar) because that's where our toolchain pins live. GitHub PRs need the source branch on GitHub, so a GitLab→GitHub PR isn't directly possible — when you're ready for us to open PRs we'll push our stig-patches-2026.06.

Reproducing

The linked .cpp fixtures are in our (exida) repo - they are listed next to each patch summary above. Feel free to integrate them.

Other info

Here is matrix accuracy ranking, maybe it is interesting for you (this shows the results with the patches applied):

> Task :stig-gradle-plugin:showMatrixRanking

Matrix accuracy ranking: /home/pserwa/Documents/public/gitlab.com/exida-oss/stig/stig/stig-gradle-plugin/build/reports/taint-matrix-ranking.txt
------------------------------------------------------------------------
StigLlvmToolchain — single source of truth for stig's clang/LLVM flags
------------------------------------------------------------------------
  LLVM major (Job 2/3, pinned)  : 18
  clang++ (Job 3, user code → IR): clang++-18
  llvm-link / llvm-dis / llvm-as : llvm-link-18 / llvm-dis-18 / llvm-as-18
  opt / llvm-extract / llvm-nm   : opt-18 / llvm-extract-18 / llvm-nm-18
  llvm-cxxfilt                   : llvm-cxxfilt-18
  clang-tidy                     : clang-tidy-18

  C++ standard (CXX_STANDARD)    : c++14
  Opt level (OPT_LEVEL)          : -O2
  Debug-info flag                : -g
  Preserve SSA value names       : -fno-discard-value-names
  Inliner threshold override     : -mllvm -inline-threshold=100000
  Disabled passes                : -fno-slp-vectorize

  Assembled analysis-compile flags (analysisFlags()):
    -std=c++14 -O2 -g -fno-discard-value-names -mllvm -inline-threshold=100000 -fno-slp-vectorize
------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------------------------------------------------
PhASAR mode-combo ranking — TaintMatrixTest. Four truth axes, each a confusion matrix
(% = (TP+TN)/total; TP/FN/TN/FP), tallied per (scenario × combo) cell. Sorted by LEAK %.
  LEAK     — taint leak verdict. positive = sink leaked. qm should leak (TP/FN), asilb should not (TN/FP).
  Z3       — Z3 path-feasibility PRUNING of a leaked qm sink (ide-xtaint only). positive = leak kept
             (verdict ≠ "no"). normal leak: "yes"=TP / else FN.  infeasible fixture: "no"=TN / else FP.
             "unknown" counts as did-NOT-prune. So the qmSinksInfeasible cases (real-leak + Z3-infeasible)
             surface as FP — the honest "Z3 returns Unknown, prunes nothing" signal, not a free TN.
  DATAFLOW — sink's DAG traces back to a qm source, walked in Kotlin (ide-xtaint only). qm should (TP/FN),
             asilb should not (TN/FP). FN here can also mean PhASAR omitted the source call from the DAG.
  CFG      — pure ICFG reachability FROM THE ENTRY POINT for every declared sink+source. positive =
             reachable; all are expected reachable, so TP=reachable / FN=unreachable (TN/FP stay 0).
  TIME     — wall-clock in analyzer subprocesses across all cells for the combo.
--------------------------------------------------------------------------------------------------------------------------------------------
            LEAK                     Z3                   DATAFLOW                  CFG           TIME    combo
      %  TP  FN  TN  FP       %  TP  FN  TN  FP       %  TP  FN  TN  FP       %  TP  FN  TN  FP   
   93.3  62   6  63   3    90.3  56   0   0   6    91.8  60   8  63   3   100.0 207   0   1   0     1.9s  ide-xtaint × union-find        (production default — IDE solver + ctx-ind-sens)
   91.8  61   7  62   4    90.2  55   0   0   6    90.3  59   9  62   4   100.0 207   0   1   0     1.9s  ide-xtaint × cflanders         (IDE solver + cflanders)
   91.8  60   8  63   3    90.0  54   0   0   6    90.3  58  10  63   3   100.0 207   0   1   0     2.8s  ide-xtaint × svf-dda           (IDE solver + SVF demand-driven)
   91.8  60   8  63   3    90.0  54   0   0   6    90.3  58  10  63   3   100.0 207   0   1   0     2.8s  ide-xtaint × svf-vfs           (IDE solver + SVF flow/ctx-sens)
   86.6  56  12  60   6             n/a                     n/a           100.0 207   0   1   0     2.6s  backend=svf × union-find (Phase B — real SrcSnkDDA)
   86.6  56  12  60   6             n/a                     n/a           100.0 207   0   1   0     2.6s  backend=svf × cflanders  (Phase B — real SrcSnkDDA)
   83.6  51  17  61   5             n/a                     n/a           100.0 207   0   1   0     0.9s  ifds-fieldsens-taint × union-find (matrix-top, NOT production)
   82.8  49  19  62   4             n/a                     n/a           100.0 207   0   1   0     0.9s  ifds-fieldsens-taint × cflanders
   82.8  61   7  50  16             n/a                     n/a           100.0 207   0   1   0     0.9s  ifds-taint × union-find        (field-INsens + ctx-ind-sens alias)
   81.3  59   9  50  16             n/a                     n/a           100.0 207   0   1   0     1.8s  ifds-taint × svf-dda           (field-INsens + SVF demand-driven)
   81.3  59   9  50  16             n/a                     n/a           100.0 207   0   1   0     1.8s  ifds-taint × svf-vfs           (field-INsens + SVF flow/ctx-sens)
--------------------------------------------------------------------------------------------------------------------------------------------

  ide-xtaint × union-find        (production default — IDE solver + ctx-ind-sens):
    [xfail] arithmetic_merge: FN missing [sink_qm]
    [xfail] validator_returns_comparison: FN missing [sink_qm_via_bool]
    [xfail] validator_returns_passthrough: FN missing [sink_qm_via_bool]
    [xfail] validator_writes_via_pointer: FN missing [sink_qm_via_bool]
    [xfail] member_function_pointer: FN missing [sink_qm]
    [xfail] fptr_multi_targets: FN missing [sink_propagate_a]
    [xfail] virtual_operator_apply: FP extra [sink_drop_b]
    [xfail] virtual_inheritance_converter: FP extra [sink_drop_b]
    [xfail] constant_branch_path_disambiguation: FP extra [sink_drop_b]

As you see, there are still some bugs with ide-xtaint × union-find, probably I will still have some time to investigate this.

Regards,
Piotr

==
Piotr Serwa
Principal Safety Engineer
exida

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions