This document provides step-by-step instructions for reproducing the experiments and analysis from our ICSE 2026 paper "WhyFlow: Interrogative Debugger for Sensemaking Taint Analysis."
We provide two reproduction tracks:
| Track | Description | Requirements |
|---|---|---|
| Track A | User study analysis only | Python 3.8+ |
| Track B | Running WhyFlow | Docker (recommended) or Meteor |
This track regenerates all statistical results, tables, and figures from the user study without running WhyFlow.
- Python 3.8 or higher
- pip (Python package manager)
- Install Python dependencies:
pip install pandas numpy scipy matplotlib statsmodels jupyter
- Run statistical analysis:
cd statistical_tests
python3 statistical_tests.py
This script performs:
- Mann-Whitney U tests comparing WhyFlow vs. CodeQL accuracy
- NASA-TLX cognitive load analysis
- Confidence and ease-of-use comparisons
- Cohen's d effect size calculations
- Generate figures:
# From the repository root (use cd ../data if you are still in statistical_tests/)
cd data
jupyter notebook plots.ipynb
Run all cells to regenerate:
- Figure 9: NASA-TLX ratings comparison
- Table 4: Accuracy results by question
- Table 5: Confidence and ease-of-use ratings
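For readers who want to sanity-check the two statistics named in the steps above, here is a minimal standard-library sketch of the Mann-Whitney U statistic and Cohen's d. It is not the code in statistical_tests.py, which may rely on scipy or statsmodels. The function names and the sample ratings are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def mann_whitney_u(a, b):
    """U statistic for sample `a`: pairs (x, y) with x > y, ties counted as 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def cohens_d(a, b):
    """Cohen's d using the pooled sample variance of both groups."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled)


# Hypothetical ratings, NOT data from the user study.
group_a = [2, 3, 3, 4]
group_b = [5, 6, 6, 7]
print("U =", mann_whitney_u(group_a, group_b))
print("d =", round(cohens_d(group_a, group_b), 2))
```

The sketch stops at the U statistic; turning it into a p-value requires the exact or normal-approximation distribution, which a statistics library would normally provide.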
After running statistical_tests.py, you should see:
=== Accuracy Analysis ===
WhyFlow correct answers: 92%
CodeQL correct answers: 71%
Mann-Whitney U p-value: < 0.05
Cohen's d: 1.33 (large effect)
=== NASA-TLX Analysis ===
Mental Demand - WhyFlow: 3.25, CodeQL: 5.92
p-value: < 0.05, Cohen's d: 2.1 (large effect)
...
=== Confidence & Ease of Use ===
Confidence - WhyFlow: 4.25/5, CodeQL: 2.08/5
Ease of Use - WhyFlow: 4.25/5, CodeQL: 1.92/5
Output files:
- `statistical_test_results.csv` - Accuracy statistics
- `nasa_tlx_statistical_test_results.csv` - Cognitive load results
- `confidence_ease_statistical_test_results.csv` - Usability results
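A quick way to confirm that the run produced all three files is a small check like the one below. This is only a sketch: the file names come from the list above, but the directory the script writes to is an assumption, and `missing_outputs` is a hypothetical helper name.

```python
from pathlib import Path

# File names as listed above; the output directory is an assumption.
EXPECTED_OUTPUTS = (
    "statistical_test_results.csv",
    "nasa_tlx_statistical_test_results.csv",
    "confidence_ease_statistical_test_results.csv",
)


def missing_outputs(directory="."):
    """Return the expected result files that are absent from `directory`."""
    base = Path(directory)
    return [name for name in EXPECTED_OUTPUTS if not (base / name).is_file()]
```

For example, `missing_outputs("statistical_tests")` run from the repository root would return an empty list if the files were written next to the script.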
This track demonstrates WhyFlow's interrogative debugging capabilities on pre-analyzed Apache Dubbo taint analysis results.
Requirements: Docker 20.10+ (or Docker Desktop 4.x+)
- Build the Docker image (from repository root):
# Make sure you are in the WhyFlow repository root directory
docker build -t whyflow .
Note: The first build compiles Soufflé from source for cross-platform compatibility and downloads Meteor dependencies.
- Run the container:
docker run -p 3000:3000 whyflow
- Access WhyFlow:
Open your browser to http://localhost:3000
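The application may take a moment to start answering requests after the container launches. The following sketch polls the URL until it responds, using only the Python standard library. The helper name `wait_for_http` and the default timeout values are assumptions for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout=60.0, interval=1.0):
    """Poll `url` until the server answers or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            return True  # the server answered, even if with an error status
        except OSError:
            pass  # not accepting connections yet (URLError is an OSError)
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example (assumes the container from the previous step is running):
# wait_for_http("http://localhost:3000")
```

A `False` result after the timeout suggests checking the container logs before retrying.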
Requirements:
- macOS or Linux
- Node.js 18+
- Meteor 3.x (recommended) or 2.13+
- Soufflé 2.4+ (for running Datalog queries)
- Install Meteor:
# macOS/Linux
curl https://install.meteor.com/ | sh
- Install Soufflé (required for Datalog queries):
# macOS (Homebrew)
brew install souffle-lang/souffle/souffle
# Ubuntu/Debian - build from source
# See: https://souffle-lang.github.io/build
- Install dependencies:
# Root-level dependencies
npm install
# Meteor app dependencies
cd taint_debug_app/taint_debug
meteor npm install
- Run WhyFlow:
cd taint_debug_app/taint_debug
meteor run
- Access WhyFlow:
Open your browser to http://localhost:3000
Once WhyFlow is running, you can explore the six template queries:
- Question: "Why is there a taint flow from source X to sink Y?"
- Steps:
  - Select "WhyFlow" from the Query Options panel
  - Choose a source from the D-SRC dropdown
  - Choose a sink from the D-SNK dropdown
  - Click "Run Query"
- Expected: Graph showing the taint path with intermediate third-party APIs highlighted in orange
- Question: "Why is there no taint flow from source X to sink Y?"
- Steps:
  - Select "WhyNotFlow" from the Query Options panel
  - Choose a source and sink pair that you would expect to be connected by a flow but is not
  - Click "Run Query"
- Expected: Graph showing plausible paths with dashed edges indicating where sanitizers block the flow
- Question: "If we alter a third-party library's model, which sinks are affected?"
- Steps:
  - Select "AffectedSinks" from the Query Options panel
  - Choose a source from the D-SRC dropdown
  - Choose a third-party API from the D-API dropdown
  - Click "Run Query"
- Expected: List of sinks that would become unreachable if the API were modeled as a sanitizer
- Question: "Which third-party library model could influence multiple flows from the same source?"
- Steps: Select "DivergentSinks" from the Query Options panel, choose a source, and click "Run Query"
- Expected: Common API node that affects multiple sink paths
- Question: "Which third-party library model could influence multiple flows to the same sink?"
- Steps: Select "DivergentSources" from the Query Options panel, choose a sink, and click "Run Query"
- Expected: Common API node where multiple source paths converge
- Question: "Which third-party library model has the largest global influence?"
- Steps: Select "GlobalImpact" from the Query Options panel, choose a source and a sink, and click "Run Query"
- Expected: APIs ranked by frequency of appearance across all paths (node size reflects impact score)
Here are concrete example queries you can run to explore WhyFlow's capabilities on the Apache Dubbo dataset:
Question: Why is there a taint flow from source (21) to sink (9636)?
| Parameter | Value |
|---|---|
| Query Type | WhyFlow |
| Source (D-SRC) | (21) javax.servlet.http.HttpServletRequest.getRequestURI(...) |
| Sink (D-SNK) | (9636) java.lang.CharSequence.charAt(...) |
Goal: Understand why there are data flows from X to Y by identifying intermediate API pass-through points that contribute to the existence of the flow.
Expected Answer: java.lang.String.substring(int beginIndex, int endIndex)
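To make the idea behind this query concrete, the sketch below finds one path through a toy flow graph and reports the third-party API nodes on it. It is an illustration only, not WhyFlow's Datalog implementation. The graph, the node labels, and the function names are hypothetical.

```python
from collections import deque


def find_path(graph, source, sink):
    """Breadth-first search; returns one shortest source-to-sink path or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for succ in graph.get(node, ()):
            if succ not in parents:
                parents[succ] = node
                queue.append(succ)
    return None


def apis_on_path(path, api_nodes):
    """Intermediate nodes of `path` that are third-party API calls."""
    return [n for n in path[1:-1] if n in api_nodes]


# Hypothetical toy graph loosely modeled on the example above.
graph = {
    "getRequestURI": ["uri"],
    "uri": ["String.substring"],
    "String.substring": ["charAt"],
}
path = find_path(graph, "getRequestURI", "charAt")
print(apis_on_path(path, api_nodes={"String.substring"}))
```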
Question: Why are there NO data flows from source (15) to sink (9817)?
| Parameter | Value |
|---|---|
| Query Type | WhyNotFlow |
| Source (D-SRC) | (15) msg : Object at TripleHttp2FrameServerHandler.java:69 |
| Sink (D-SNK) | (9817) true at TriDecoder.java:55 |
Goal: Understand why there are NO data flows from X to Y by identifying which third-party library APIs serve as sanitizers that kill the data flow. If that API were modified to permit the data flow, would sink Y become reachable?
Expected Answer: io.netty.handler.codec.http2.Http2DataFrame.content()
Question: What sinks become unreachable if we make API (9860) a sanitizer?
| Parameter | Value |
|---|---|
| Query Type | AffectedSinks |
| Source (D-SRC) | (2) msg : HttpRequest at HttpProcessHandler.java:88 |
| API (D-API) | (9860) io.netty.handler.codec.http.HttpRequest.uri() |
Goal: Identify the set of sinks that would no longer be reachable if the selected API were modeled as a sanitizer. Select all applicable sinks from the results.
Expected Answer:
- valueList at HttpCommandDecoder.java:55
- msg at Log4jLogger.java:81
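The reasoning behind this query can be sketched as a reachability comparison: compute the sinks reachable from the source, then recompute with the chosen API node blocked, and report the difference. This is a toy illustration under that assumption, not WhyFlow's Datalog query. The graph and node names are hypothetical.

```python
from collections import deque


def reachable(graph, start, blocked=frozenset()):
    """Nodes reachable from `start` without passing through `blocked` nodes."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        if node in seen or node in blocked:
            continue
        seen.add(node)
        queue.extend(graph.get(node, ()))
    return seen


def affected_sinks(graph, source, api, sinks):
    """Sinks that become unreachable once `api` acts as a sanitizer."""
    before = reachable(graph, source) & sinks
    after = reachable(graph, source, blocked={api}) & sinks
    return before - after


# Hypothetical toy graph: sink2 has an alternative path that avoids the API.
toy_graph = {
    "src": ["api", "other"],
    "api": ["sink1", "sink2"],
    "other": ["sink2"],
}
print(affected_sinks(toy_graph, "src", "api", {"sink1", "sink2", "sink3"}))
```

In the toy graph only the sink whose every path goes through the API is reported, which mirrors why a sink with an alternative route stays reachable.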
Question: What is the LAST common pass-through node for flows from source (14) to two different sinks?
| Parameter | Value |
|---|---|
| Query Type | DivergentSinks |
| Source (D-SRC) | (14) msg : Http2StreamFrame at TripleHttp2ClientResponseHandler |
| Sink 1 (D-SNK) | (9803) path at TriplePathResolver.java:41 |
| Sink 2 (D-SNK2) | (9804) path at TriplePathResolver.java:46 |
Goal: Find the last common pass-through node where flows from a single source diverge to reach different sinks.
Expected Answer: toString(...) : String at TripleServerStream.java:359
Question: What is the FIRST common pass-through node for flows from two different sources to the same sink?
| Parameter | Value |
|---|---|
| Query Type | DivergentSources |
| Source 1 (D-SRC) | (6) input : ByteBuf at NettyCodecAdapter.java:94 |
| Source 2 (D-SRC2) | (7) in : ByteBuf at NettyPortUnificationServerHandler.java:106 |
| Sink (D-SNK) | (9761) buffer at NettyBackedChannelBuffer.java:201 |
Goal: Find the first common pass-through node where flows from multiple sources converge before reaching the same sink.
Expected Answer: parameter this : NettyBackedChannelBuffer [buffer] : ByteBuf at NettyBackedChannelBuffer.java:200
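The two divergence questions above can be illustrated with a deliberately simplified model in which each flow is a single path given as a list of nodes. Under that assumption, the last common node of two flows from one source is the end of their shared prefix, and the first common node of two flows into one sink is the first node of one path that also lies on the other. WhyFlow itself answers these with Datalog queries over the full flow graph; the function names and paths below are hypothetical.

```python
def last_common_node(path_a, path_b):
    """Last node of the shared prefix of two paths that start at one source."""
    last = None
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        last = x
    return last


def first_common_node(path_a, path_b):
    """First node of `path_a` that also occurs on `path_b`, or None."""
    on_b = set(path_b)
    return next((n for n in path_a if n in on_b), None)


# Hypothetical paths, not taken from the Apache Dubbo dataset.
to_sink_1 = ["src", "decode", "toString", "sink1"]
to_sink_2 = ["src", "decode", "toString", "trim", "sink2"]
print(last_common_node(to_sink_1, to_sink_2))

from_src_1 = ["src1", "wrap", "buffer", "sink"]
from_src_2 = ["src2", "read", "buffer", "sink"]
print(first_common_node(from_src_1, from_src_2))
```

Note that `last_common_node` returns the source itself when the two paths share nothing else.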
Question: Which third-party APIs appear most frequently in taint flows between source (15) and sink (9803)?
| Parameter | Value |
|---|---|
| Query Type | GlobalImpact |
| Source (D-SRC) | (15) msg : Object at TripleHttp2FrameServerHandler.java:69 |
| Sink (D-SNK) | (9803) path at TriplePathResolver.java:41 |
Expected Answer (ranked by frequency):
- io.netty.handler.codec.http2.Http2HeadersFrame.headers()
- java.lang.CharSequence.toString()
- io.netty.handler.codec.http2.Http2Headers.path()
Goal: Rank third-party library APIs by how frequently they appear across all taint flows, helping identify high-impact APIs whose models have the largest influence on analysis results.
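One way to picture a frequency ranking like this is to enumerate the paths between the source and the sink and count how many of them each API node lies on. The sketch below does that for a toy graph. It is an assumption about how such a ranking could be computed, not WhyFlow's actual scoring, and path enumeration is exponential in the worst case, so it only suits small graphs. All names are hypothetical.

```python
from collections import Counter


def all_simple_paths(graph, source, sink, _prefix=None):
    """Yield every cycle-free path from `source` to `sink` (depth-first)."""
    prefix = (_prefix or []) + [source]
    if source == sink:
        yield prefix
        return
    for succ in graph.get(source, ()):
        if succ not in prefix:
            yield from all_simple_paths(graph, succ, sink, prefix)


def rank_apis(graph, source, sink, api_nodes):
    """Count how many source-to-sink paths each API node appears on."""
    counts = Counter()
    for path in all_simple_paths(graph, source, sink):
        counts.update(n for n in path if n in api_nodes)
    return counts.most_common()


# Hypothetical toy graph: two paths share the "headers" node.
impact_graph = {
    "src": ["headers"],
    "headers": ["toString", "path"],
    "toString": ["sink"],
    "path": ["sink"],
}
ranking = rank_apis(impact_graph, "src", "sink", {"headers", "toString", "path"})
print(ranking)
```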
- Color coding:
  - Green nodes = Sources
  - Red nodes = Sinks
  - Orange nodes = Third-party API calls
  - Blue nodes = Other intermediate nodes
- Edge types:
  - Solid edges = Active taint flows
  - Dashed edges = Plausible flows (currently blocked)
- Interactions:
  - Click a node to navigate to its source code
  - Hover for fully-qualified names
  - Zoom/pan to explore large graphs
- `statistical_tests.py` runs without errors
- Output shows ~92% accuracy for WhyFlow vs ~71% for CodeQL
- NASA-TLX mental demand shows significant reduction
- All p-values are < 0.05
- Effect sizes (Cohen's d) are > 0.8
- WhyFlow loads at http://localhost:3000
- Query Options panel displays six query types
- Source/sink dropdowns populate with Apache Dubbo data
- Running WhyFlow query produces graph visualization
- Soufflé queries execute successfully (check container logs for query output)
- Nodes are color-coded (green sources, red sinks, orange APIs)
- Clicking nodes shows code location information
Problem: Container fails to start
# Check Docker is running
docker info
# View container logs
docker logs <container_id>

Problem: Port 3000 already in use
docker run -p 3001:3000 whyflow
# Then access http://localhost:3001

Problem: Meteor command not found
# Ensure Meteor is in PATH
export PATH=$PATH:$HOME/.meteor

Problem: MongoDB connection errors
# Reset Meteor's local database
cd taint_debug_app/taint_debug
meteor reset
meteor run

Problem: Import errors for pandas/scipy
pip install --upgrade pandas numpy scipy matplotlib

| File | Description |
|---|---|
| `statistical_tests/*.csv` | User study questionnaire responses |
| `data/whyflow.csv` | WhyFlow accuracy results by question |
| `data/codeql.csv` | CodeQL accuracy results by question |
| `data/plots.ipynb` | Jupyter notebook for figure generation |
| `Subject_Prog_CodeQL_Taint/*.json` | CodeQL taint analysis results |
| `taint_debug_app/analysis_files/` | Processed Soufflé facts |
| `taint_debug_app/app_souffle_queries/` | Datalog query implementations |
For questions or issues:
- Open an issue on GitHub: https://github.com/UCLA-SEAL/WhyFlow
- Email: burak@cs.ucla.edu