You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Several build-compat + taint/path-sensitivity fixes (we have patches, happy to PR)
Hi Fabian, and the PhASAR team! 👋
I am using PhASAR as the taint/dataflow engine inside a larger C++ analysis toolchain (we drive the C++ API directly: IDEExtendedTaintAnalysis, the Z3BasedPathSensitivityManager, the UnionFind alias backends, etc., from our own taint-config JSON). It's been genuinely great to build on — thank you for the work you've put into it.
I am working at exida. We at exida believe in open source and collaboration, so we'd love to give back: along the way we hit a handful of issues, fixed them on a public fork, and would like to upstream them and we have actually no interest to maintain a parallel/downstream fork. One issue listing them all seemed the easiset way to start the conversation — happy to split into individual PRs in whatever shape works best for you.
Baseline: upstream development @ 5ca66dc55 ("WPDS Solver", #839).
Fedora 44 — host clang-22 building PhASAR, LLVM 18 as the linked library, GCC-16 libstdc++ headers. (Most of the build-break + the clang-22 ICE + the GCC-16 errors are here.)
Ubuntu 24.04 — host clang-18, LLVM 18. (The LLVM-version find_package break and the correctness/crash issues reproduce here too.)
Each commit message is written as a standalone bug report (symptom + cause + fix + how to trigger). Quick index, with a minimal reproducer from our test suite where we have one:
Correctness / robustness (the ones we'd most love your eyes on)
Path-feasibility aborts the whole run on a non-encodable branch guard.report_fatal_error / an ill-sorted Z3 term (Z3_mk_and → ast_manager::check_args) when a path to a sink is guarded by a float / pointer / bitwise condition. Uncatchable as a z3::exception, so one such edge kills the entire --emit-feasibility run. We added a conservative default-deny pre-pass that only encodes the Int/Bool fragment and drops other edges (sound / FN-safe). Severity: high (crash).
Fix → https://gitlab.com/exida-oss/stig/phasar/-/commit/825d36d20580e880285dad924bc6287744ddfa6d
Triggers → z3_infeasible_float_guard.cpp (also _pointer_guard / _bitwise_parity in the same dir)
Path-feasibility aborts on an unresolved-pointer load.getLoadInstAsZ3 calls report_fatal_error("unsupported load!") for a load whose pointer isn't a direct alloca/GEP (common on optimized IR, e.g. a rematerialized *(ptr+off)). We model it as a fresh unconstrained term instead (the same havoc already used for unhandled types). Severity: high (crash).
Fix → https://gitlab.com/exida-oss/stig/phasar/-/commit/9c40b0c4f753f8b3d13dcf4426553d0dc8d2b208
(No minimal fixture — we hit this on a real optimized activity, the *(ptr+off) opaque-pointer shape; happy to extract a .ll.)
IDEExtendedTaintAnalysis drops cross-field this writes at the return edge (false negative). The return flow-function uses a strict equivalent, which filters every exit fact whose first offset isn't 0 — so taint a callee writes into this->member at a non-zero offset never reaches the caller. The call edge already tolerates this via equivalentExceptPointerArithmetics; we mirror that outbound. Severity: high for absence-of-taint use (missed real flow).
Fix → https://gitlab.com/exida-oss/stig/phasar/-/commit/500830dd4ed4a40a279bcf0cf7f325fa38e605fe
Triggers → activity_cross_method_member.cpp
IDEExtendedTaintAnalysis misses sinks across optimizer-rematerialized indirection (false negative). When the optimizer inserts an extra indirection element so the fact is one level deeper than the sink leaf, the strict sink-match misses the leak. Our leaf-aware match requires base + leading offset + leaf to all match (and avoids the sibling-field over-approximation a naive prefix relax causes). Severity: recall.
Fix → https://gitlab.com/exida-oss/stig/phasar/-/commit/a921a2355b7f4c55796d509fc727b65939eb41f2
Triggers → activity_member_struct_pointer_indirection.cpp
IDEExtendedTaintAnalysis store flow-function conflates sibling scalar fields (false positive) — RESOLVED, two merge-ready commits. In getNormalFlowFunction, the store-FF tested the incoming fact against the stored value with equivalentExceptPointerArithmetics(Source, PALevel=1). For a scalar stored value the source has a single-element access path ({base,[0]}), so MinSize==1 hits the MinSize <= PALevel early-out and the predicate returns true on base match alone, ignoring the field offset — taint on s.qm bleeds onto a s.asilb = … store of the same base object (FP across siblings). A naive PALevel=0 over-corrects into FNs (exact-offset matching also rejects legitimate copies one element deeper than the source leaf). The right shape — and what we landed — is a leaf-aware byte-offset store gate: field identity = the cumulative byte offset from the base (summed over the access-path), base-equal, applied to scalar stores only (pointer/vector stores keep the existing PALevel behaviour). This kills the sibling FP without the over-correction and mirrors Rename command-line options #4's leaf-aware sink-match.
Landing that gate then exposed a separate false negative upstream of it, in AbstractMemoryLocationFactory::createImpl: a ConstantExpr GEP (getelementptr(@g, …)) is an llvm::Constant, so it hit the self-define guard and seeded {the-gep-constant, [0]} — the field byte discarded — while a load of the same global field decomposed via the GEPOperator walk to {@g, [byte]}. Different base and offset, so the precise gate (correctly) dropped the taint, and any source routed through a global field (e.g. a static message buffer copied field-by-field into a member) was lost. Fix: exclude GEP-shaped constants from the self-define guard so they fall through to the existing GEPOperator branch and decompose like instruction GEPs; bare globals (Constant, not GEPOperator) still self-define. This makes global-routed sources field-sensitive end to end. (Two independent reviewers converged on this same cause from the IR + an XSTORE byte-offset trace.) Severity: precision (sibling FP) + recall (global-routed FN). Both resolved; on a real activity, data-flow recall went 2/9 → 9/9 with the synthetic sibling-FP suite staying clean.
Fixes →
(instrumentation, optional) base-annotated XSTORE debug dump behind STIG_STORE_DEBUG: https://gitlab.com/exida-oss/stig/phasar/-/commit/0ddc8de64bc49098404cc78f770328e3d12c8866
Triggers — sibling FP: activity_three_sibling_fields.cpp (+ activity_member_struct_array_element, activity_local_class_two_struct_members); global-routed FN: a real activity whose configured sources are seeded into a static message buffer through constant-GEP stores, then copied field-by-field into the analysed object (we can extract a standalone .ll + taint-config on request).
Build / toolchain compatibility
cmake/add_llvm.cmake can't find a stock LLVM ≥ 18.find_package(LLVM ${PHASAR_LLVM_VERSION} ...) triggers LLVM 18+'s exact major.minor.patch version check, rejecting e.g. a distro 18.1.x. We drop the version arg and rely on -DLLVM_DIR. Severity: build break (Fedora 44 + Ubuntu 24.04). → https://gitlab.com/exida-oss/stig/phasar/-/commit/f1e597a91053d482d204b31a3e4f403cd2320a2e
One more we observed but have not patched (FYI / repro for you)
SIGSEGV in the CFLFieldSens IFDS solver (IterativeIDESolver<CFLFieldSensIFDSProblem>::storeResultsAndPropagate → DenseMapBase::FindAndConstruct) on a nested-class-via-helper pattern, with the SVF demand-driven alias backend. We don't have a fix — flagging it in case it's useful. Crash stack + the triggering fixture: nested_class_via_helper_function.cpp (--solver=ifds-fieldsens-taint --alias-analysis=svf-dda). Happy to attach the full core/backtrace.
On tests
Your contributing guide asks each patch to ship a small unit test. To address this, we refer to our test cases - you are free to take them over.
On the mechanics (GitLab → GitHub)
We developed these on a GitLab fork (gitlab.com/exida-oss/stig/phasar) because that's where our toolchain pins live. GitHub PRs need the source branch on GitHub, so a GitLab→GitHub PR isn't directly possible — when you're ready for us to open PRs we'll push our stig-patches-2026.06.
Reproducing
The linked .cpp fixtures are in our (exida) repo - they are listed next to each patch summary above. Feel free to integrate them.
Other info
Here is matrix accuracy ranking, maybe it is interesting for you (this shows the results with the patches applied):
> Task :stig-gradle-plugin:showMatrixRanking
Matrix accuracy ranking: /home/pserwa/Documents/public/gitlab.com/exida-oss/stig/stig/stig-gradle-plugin/build/reports/taint-matrix-ranking.txt
------------------------------------------------------------------------
StigLlvmToolchain — single source of truth for stig's clang/LLVM flags------------------------------------------------------------------------ LLVM major (Job 2/3, pinned) : 18 clang++ (Job 3, user code → IR): clang++-18 llvm-link / llvm-dis / llvm-as : llvm-link-18 / llvm-dis-18 / llvm-as-18 opt / llvm-extract / llvm-nm : opt-18 / llvm-extract-18 / llvm-nm-18 llvm-cxxfilt : llvm-cxxfilt-18 clang-tidy : clang-tidy-18 C++ standard (CXX_STANDARD) : c++14 Opt level (OPT_LEVEL) : -O2 Debug-info flag : -g Preserve SSA value names : -fno-discard-value-names Inliner threshold override : -mllvm -inline-threshold=100000 Disabled passes : -fno-slp-vectorize Assembled analysis-compile flags (analysisFlags()): -std=c++14 -O2 -g -fno-discard-value-names -mllvm -inline-threshold=100000 -fno-slp-vectorize--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------PhASAR mode-combo ranking — TaintMatrixTest. Four truth axes, each a confusion matrix(% = (TP+TN)/total; TP/FN/TN/FP), tallied per (scenario × combo) cell. Sorted by LEAK %. LEAK — taint leak verdict. positive = sink leaked. qm should leak (TP/FN), asilb should not (TN/FP). Z3 — Z3 path-feasibility PRUNING of a leaked qm sink (ide-xtaint only). positive = leak kept (verdict ≠ "no"). normal leak: "yes"=TP / else FN. infeasible fixture: "no"=TN / else FP. "unknown" counts as did-NOT-prune. So the qmSinksInfeasible cases (real-leak + Z3-infeasible) surface as FP — the honest "Z3 returns Unknown, prunes nothing" signal, not a free TN. DATAFLOW — sink's DAG traces back to a qm source, walked in Kotlin (ide-xtaint only). qm should (TP/FN),
asilb should not (TN/FP). FN here can also mean PhASAR omitted the source call from the DAG.
CFG — pure ICFG reachability FROM THE ENTRY POINT for every declared sink+source. positive =
reachable; all are expected reachable, so TP=reachable / FN=unreachable (TN/FP stay 0).
TIME — wall-clock in analyzer subprocesses across all cells for the combo.
--------------------------------------------------------------------------------------------------------------------------------------------
LEAK Z3 DATAFLOW CFG TIME combo
% TP FN TN FP % TP FN TN FP % TP FN TN FP % TP FN TN FP
93.3 62 6 63 3 90.3 56 0 0 6 91.8 60 8 63 3 100.0 207 0 1 0 1.9s ide-xtaint × union-find (production default — IDE solver + ctx-ind-sens)
91.8 61 7 62 4 90.2 55 0 0 6 90.3 59 9 62 4 100.0 207 0 1 0 1.9s ide-xtaint × cflanders (IDE solver + cflanders)
91.8 60 8 63 3 90.0 54 0 0 6 90.3 58 10 63 3 100.0 207 0 1 0 2.8s ide-xtaint × svf-dda (IDE solver + SVF demand-driven)
91.8 60 8 63 3 90.0 54 0 0 6 90.3 58 10 63 3 100.0 207 0 1 0 2.8s ide-xtaint × svf-vfs (IDE solver + SVF flow/ctx-sens)
86.6 56 12 60 6 n/a n/a 100.0 207 0 1 0 2.6s backend=svf × union-find (Phase B — real SrcSnkDDA)
86.6 56 12 60 6 n/a n/a 100.0 207 0 1 0 2.6s backend=svf × cflanders (Phase B — real SrcSnkDDA)
83.6 51 17 61 5 n/a n/a 100.0 207 0 1 0 0.9s ifds-fieldsens-taint × union-find (matrix-top, NOT production)
82.8 49 19 62 4 n/a n/a 100.0 207 0 1 0 0.9s ifds-fieldsens-taint × cflanders
82.8 61 7 50 16 n/a n/a 100.0 207 0 1 0 0.9s ifds-taint × union-find (field-INsens + ctx-ind-sens alias)
81.3 59 9 50 16 n/a n/a 100.0 207 0 1 0 1.8s ifds-taint × svf-dda (field-INsens + SVF demand-driven)
81.3 59 9 50 16 n/a n/a 100.0 207 0 1 0 1.8s ifds-taint × svf-vfs (field-INsens + SVF flow/ctx-sens)
--------------------------------------------------------------------------------------------------------------------------------------------
ide-xtaint × union-find (production default — IDE solver + ctx-ind-sens):
[xfail] arithmetic_merge: FN missing [sink_qm]
[xfail] validator_returns_comparison: FN missing [sink_qm_via_bool]
[xfail] validator_returns_passthrough: FN missing [sink_qm_via_bool]
[xfail] validator_writes_via_pointer: FN missing [sink_qm_via_bool]
[xfail] member_function_pointer: FN missing [sink_qm]
[xfail] fptr_multi_targets: FN missing [sink_propagate_a]
[xfail] virtual_operator_apply: FP extra [sink_drop_b]
[xfail] virtual_inheritance_converter: FP extra [sink_drop_b]
[xfail] constant_branch_path_disambiguation: FP extra [sink_drop_b]
As you see, there are still some bugs with ide-xtaint × union-find, probably I will still have some time to investigate this.
Several build-compat + taint/path-sensitivity fixes (we have patches, happy to PR)
Hi Fabian, and the PhASAR team! 👋
I am using PhASAR as the taint/dataflow engine inside a larger C++ analysis toolchain (we drive the C++ API directly:
IDEExtendedTaintAnalysis, theZ3BasedPathSensitivityManager, the UnionFind alias backends, etc., from our own taint-config JSON). It's been genuinely great to build on — thank you for the work you've put into it.I am working at exida. We at exida believe in open source and collaboration, so we'd love to give back: along the way we hit a handful of issues, fixed them on a public fork, and would like to upstream them and we have actually no interest to maintain a parallel/downstream fork. One issue listing them all seemed the easiset way to start the conversation — happy to split into individual PRs in whatever shape works best for you.
development@5ca66dc55("WPDS Solver", #839).clang-22building PhASAR, LLVM 18 as the linked library, GCC-16libstdc++headers. (Most of the build-break + the clang-22 ICE + the GCC-16 errors are here.)clang-18, LLVM 18. (The LLVM-versionfind_packagebreak and the correctness/crash issues reproduce here too.)Each commit message is written as a standalone bug report (symptom + cause + fix + how to trigger). Quick index, with a minimal reproducer from our test suite where we have one:
Correctness / robustness (the ones we'd most love your eyes on)
Path-feasibility aborts the whole run on a non-encodable branch guard.
report_fatal_error/ an ill-sorted Z3 term (Z3_mk_and→ast_manager::check_args) when a path to a sink is guarded by a float / pointer / bitwise condition. Uncatchable as az3::exception, so one such edge kills the entire--emit-feasibilityrun. We added a conservative default-deny pre-pass that only encodes the Int/Bool fragment and drops other edges (sound / FN-safe).Severity: high (crash).
Fix → https://gitlab.com/exida-oss/stig/phasar/-/commit/825d36d20580e880285dad924bc6287744ddfa6d
Triggers →
z3_infeasible_float_guard.cpp(also_pointer_guard/_bitwise_parityin the same dir)Path-feasibility aborts on an unresolved-pointer load.
getLoadInstAsZ3callsreport_fatal_error("unsupported load!")for a load whose pointer isn't a direct alloca/GEP (common on optimized IR, e.g. a rematerialized*(ptr+off)). We model it as a fresh unconstrained term instead (the same havoc already used for unhandled types).Severity: high (crash).
Fix → https://gitlab.com/exida-oss/stig/phasar/-/commit/9c40b0c4f753f8b3d13dcf4426553d0dc8d2b208
(No minimal fixture — we hit this on a real optimized activity, the
*(ptr+off)opaque-pointer shape; happy to extract a.ll.)IDEExtendedTaintAnalysisdrops cross-fieldthiswrites at the return edge (false negative). The return flow-function uses a strictequivalent, which filters every exit fact whose first offset isn't 0 — so taint a callee writes intothis->memberat a non-zero offset never reaches the caller. The call edge already tolerates this viaequivalentExceptPointerArithmetics; we mirror that outbound.Severity: high for absence-of-taint use (missed real flow).
Fix → https://gitlab.com/exida-oss/stig/phasar/-/commit/500830dd4ed4a40a279bcf0cf7f325fa38e605fe
Triggers →
activity_cross_method_member.cppIDEExtendedTaintAnalysismisses sinks across optimizer-rematerialized indirection (false negative). When the optimizer inserts an extra indirection element so the fact is one level deeper than the sink leaf, the strict sink-match misses the leak. Our leaf-aware match requires base + leading offset + leaf to all match (and avoids the sibling-field over-approximation a naive prefix relax causes).Severity: recall.
Fix → https://gitlab.com/exida-oss/stig/phasar/-/commit/a921a2355b7f4c55796d509fc727b65939eb41f2
Triggers →
activity_member_struct_pointer_indirection.cppIDEExtendedTaintAnalysisstore flow-function conflates sibling scalar fields (false positive) — RESOLVED, two merge-ready commits. IngetNormalFlowFunction, the store-FF tested the incoming fact against the stored value withequivalentExceptPointerArithmetics(Source, PALevel=1). For a scalar stored value the source has a single-element access path ({base,[0]}), soMinSize==1hits theMinSize <= PALevelearly-out and the predicate returns true on base match alone, ignoring the field offset — taint ons.qmbleeds onto as.asilb = …store of the same base object (FP across siblings). A naivePALevel=0over-corrects into FNs (exact-offset matching also rejects legitimate copies one element deeper than the source leaf). The right shape — and what we landed — is a leaf-aware byte-offset store gate: field identity = the cumulative byte offset from the base (summed over the access-path), base-equal, applied to scalar stores only (pointer/vector stores keep the existingPALevelbehaviour). This kills the sibling FP without the over-correction and mirrors Rename command-line options #4's leaf-aware sink-match.Landing that gate then exposed a separate false negative upstream of it, in
AbstractMemoryLocationFactory::createImpl: aConstantExprGEP (getelementptr(@g, …)) is anllvm::Constant, so it hit the self-define guard and seeded{the-gep-constant, [0]}— the field byte discarded — while a load of the same global field decomposed via theGEPOperatorwalk to{@g, [byte]}. Different base and offset, so the precise gate (correctly) dropped the taint, and any source routed through a global field (e.g. a static message buffer copied field-by-field into a member) was lost. Fix: exclude GEP-shaped constants from the self-define guard so they fall through to the existingGEPOperatorbranch and decompose like instruction GEPs; bare globals (Constant, not GEPOperator) still self-define. This makes global-routed sources field-sensitive end to end. (Two independent reviewers converged on this same cause from the IR + anXSTOREbyte-offset trace.)Severity: precision (sibling FP) + recall (global-routed FN). Both resolved; on a real activity, data-flow recall went 2/9 → 9/9 with the synthetic sibling-FP suite staying clean.
Fixes →
XSTOREdebug dump behindSTIG_STORE_DEBUG: https://gitlab.com/exida-oss/stig/phasar/-/commit/0ddc8de64bc49098404cc78f770328e3d12c8866Triggers — sibling FP:
activity_three_sibling_fields.cpp(+activity_member_struct_array_element,activity_local_class_two_struct_members); global-routed FN: a real activity whose configured sources are seeded into a static message buffer through constant-GEP stores, then copied field-by-field into the analysed object (we can extract a standalone.ll+ taint-config on request).Build / toolchain compatibility
cmake/add_llvm.cmakecan't find a stock LLVM ≥ 18.find_package(LLVM ${PHASAR_LLVM_VERSION} ...)triggers LLVM 18+'s exact major.minor.patch version check, rejecting e.g. a distro 18.1.x. We drop the version arg and rely on-DLLVM_DIR.Severity: build break (Fedora 44 + Ubuntu 24.04). → https://gitlab.com/exida-oss/stig/phasar/-/commit/f1e597a91053d482d204b31a3e4f403cd2320a2e
Same trap in the installed
phasarConfig.cmake(generated fromConfig.cmake.in) for downstreamfind_package(phasar), plus a Z3 module-vs-CONFIG target-name mismatch (z3vsz3::libz3). Fixed at the template.Severity: downstream build break. → https://gitlab.com/exida-oss/stig/phasar/-/commit/08e605b995f03fd9dd319ff0c29f11e249181a07
clang ≥ 22 ICE (mangler segfault) on the CTAD form
pag::PBMixin{...}inLLVMUnionFindAA. Spelling the template args explicitly avoids the synthesized deduction guide.Severity: build break (compiler ICE, Fedora 44 / clang-22). → https://gitlab.com/exida-oss/stig/phasar/-/commit/d7c131e60ceb89e3c5d5175730c78cd1c8697e15
GCC 16 hard errors on
__has_cpp_attribute([[lifetimebound]])(Macros.h) and stale member namesr/c/vinTable<R,C,V>::Cell::operator<<(Table.h).Severity: build break (Fedora 44 / GCC-16 libstdc++). → https://gitlab.com/exida-oss/stig/phasar/-/commit/e1ad35997eff7d09d6496047dc8c2ab10c3b53fd
One more we observed but have not patched (FYI / repro for you)
CFLFieldSensIFDS solver (IterativeIDESolver<CFLFieldSensIFDSProblem>::storeResultsAndPropagate→DenseMapBase::FindAndConstruct) on a nested-class-via-helper pattern, with the SVF demand-driven alias backend. We don't have a fix — flagging it in case it's useful. Crash stack + the triggering fixture:nested_class_via_helper_function.cpp(--solver=ifds-fieldsens-taint --alias-analysis=svf-dda). Happy to attach the full core/backtrace.On tests
Your contributing guide asks each patch to ship a small unit test. To address this, we refer to our test cases - you are free to take them over.
On the mechanics (GitLab → GitHub)
We developed these on a GitLab fork (
gitlab.com/exida-oss/stig/phasar) because that's where our toolchain pins live. GitHub PRs need the source branch on GitHub, so a GitLab→GitHub PR isn't directly possible — when you're ready for us to open PRs we'll push ourstig-patches-2026.06.Reproducing
The linked
.cppfixtures are in our (exida) repo - they are listed next to each patch summary above. Feel free to integrate them.Other info
Here is matrix accuracy ranking, maybe it is interesting for you (this shows the results with the patches applied):
As you see, there are still some bugs with ide-xtaint × union-find, probably I will still have some time to investigate this.
Regards,
Piotr
==
Piotr Serwa
Principal Safety Engineer
exida