WhyFlow Artifact: Experiment Reproduction Guide

This document provides step-by-step instructions for reproducing the experiments and analysis from our ICSE 2026 paper "WhyFlow: Interrogative Debugger for Sensemaking Taint Analysis."

Overview

We provide two reproduction tracks:

| Track | Description | Requirements |
|---|---|---|
| Track A | User study analysis only | Python 3.8+ |
| Track B | Running WhyFlow | Docker (recommended) or Meteor |

Track A: User Study Analysis Only

This track regenerates all statistical results, tables, and figures from the user study without running WhyFlow.

Requirements

  • Python 3.8 or higher
  • pip (Python package manager)

Setup

  1. Install Python dependencies:
pip install pandas numpy scipy matplotlib statsmodels jupyter

Usage

  1. Run statistical analysis:
cd statistical_tests
python3 statistical_tests.py

This script performs:

  • Mann-Whitney U tests comparing WhyFlow vs. CodeQL accuracy
  • NASA-TLX cognitive load analysis
  • Confidence and ease-of-use comparisons
  • Cohen's d effect size calculations
  2. Generate figures (from the repository root):
cd data
jupyter notebook plots.ipynb

Run all cells to regenerate:

  • Figure 9: NASA-TLX ratings comparison
  • Table 4: Accuracy results by question
  • Table 5: Confidence and ease-of-use ratings
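For readers who want to see what the two headline statistics compute, the following is a minimal, dependency-free sketch. It is not the artifact's statistical_tests.py; the function names and the toy scores are hypothetical, and the real script may well use scipy instead, given the dependency list above.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def mann_whitney_u(a, b):
    """U statistic for sample a: each pair (x, y) scores 1 if x > y, 0.5 on a tie."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


# Hypothetical scores, NOT taken from the user study data.
group_a = [2, 4, 6]
group_b = [1, 3, 5]
print(cohens_d(group_a, group_b))        # 0.5
print(mann_whitney_u(group_a, group_b))  # 6.0
```

By the usual convention, |d| above 0.8 is read as a large effect, which is the threshold the verification checklist uses.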

Expected Outputs

After running statistical_tests.py, you should see:

=== Accuracy Analysis ===
WhyFlow correct answers: 92%
CodeQL correct answers: 71%
Mann-Whitney U p-value: < 0.05
Cohen's d: 1.33 (large effect)

=== NASA-TLX Analysis ===
Mental Demand - WhyFlow: 3.25, CodeQL: 5.92
p-value: < 0.05, Cohen's d: 2.1 (large effect)
...

=== Confidence & Ease of Use ===
Confidence - WhyFlow: 4.25/5, CodeQL: 2.08/5
Ease of Use - WhyFlow: 4.25/5, CodeQL: 1.92/5

Output files:

  • statistical_test_results.csv - Accuracy statistics
  • nasa_tlx_statistical_test_results.csv - Cognitive load results
  • confidence_ease_statistical_test_results.csv - Usability results

Track B: Running WhyFlow

This track demonstrates WhyFlow's interrogative debugging capabilities on pre-analyzed Apache Dubbo taint analysis results.

Option 1: Docker (Recommended)

Requirements: Docker 20.10+ (or Docker Desktop 4.x+)

  1. Build the Docker image (from repository root):
# Make sure you are in the WhyFlow repository root directory
docker build -t whyflow .

Note: The first build compiles Soufflé from source for cross-platform compatibility and downloads Meteor dependencies.

  2. Run the container:
docker run -p 3000:3000 whyflow
  3. Access WhyFlow:

Open your browser to http://localhost:3000

Option 2: Native Installation

Requirements:

  • macOS or Linux
  • Node.js 18+
  • Meteor 3.x (recommended) or 2.13+
  • Soufflé 2.4+ (for running Datalog queries)
  1. Install Meteor:
# macOS/Linux
curl https://install.meteor.com/ | sh
  2. Install Soufflé (required for Datalog queries):
# macOS (Homebrew)
brew install souffle-lang/souffle/souffle

# Ubuntu/Debian - build from source
# See: https://souffle-lang.github.io/build
  3. Install dependencies:
# Root-level dependencies
npm install

# Meteor app dependencies
cd taint_debug_app/taint_debug
meteor npm install
  4. Run WhyFlow:
# Skip the cd if you are still in the app directory from the previous step
cd taint_debug_app/taint_debug
meteor run
  5. Access WhyFlow:

Open your browser to http://localhost:3000

Using WhyFlow

Once WhyFlow is running, you can explore the six template queries:

1. WhyFlow Query

  • Question: "Why is there a taint flow from source X to sink Y?"
  • Steps:
    1. Select "WhyFlow" from the Query Options panel
    2. Choose a source from the D-SRC dropdown
    3. Choose a sink from the D-SNK dropdown
    4. Click "Run Query"
  • Expected: Graph showing the taint path with intermediate third-party APIs highlighted in orange

2. WhyNotFlow Query

  • Question: "Why is there no taint flow from source X to sink Y?"
  • Steps:
    1. Select "WhyNotFlow" from the Query Options panel
    2. Choose a source and a sink between which you expect a flow but none is reported
    3. Click "Run Query"
  • Expected: Graph showing plausible paths with dashed edges indicating where sanitizers block the flow

3. AffectedSinks Query

  • Question: "If we alter a third-party library's model, which sinks are affected?"
  • Steps:
    1. Select "AffectedSinks" from the Query Options panel
    2. Choose a source from D-SRC
    3. Choose a third-party API from D-API dropdown
    4. Click "Run Query"
  • Expected: List of sinks that would become unreachable if the API were modeled as a sanitizer
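The idea behind this query can be sketched as a reachability comparison: collect the sinks reachable from the source, then collect them again with the chosen API removed from the graph, and report the difference. WhyFlow itself answers the query with Soufflé Datalog; the Python below is only an illustration on a toy graph, and every node name in it is hypothetical.

```python
from collections import deque


def reachable(graph, start, blocked=frozenset()):
    """Breadth-first search from start that never enters a blocked node."""
    if start in blocked:
        return set()
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def affected_sinks(graph, source, api, sinks):
    """Sinks reachable from source only via paths that pass through api."""
    before = reachable(graph, source) & sinks
    after = reachable(graph, source, blocked={api}) & sinks
    return before - after


# Toy flow graph (hypothetical names, not the Apache Dubbo data).
graph = {
    "src": ["api_a", "helper"],
    "api_a": ["sink_1", "sink_2"],
    "helper": ["sink_2", "sink_3"],
}
sinks = {"sink_1", "sink_2", "sink_3"}
print(sorted(affected_sinks(graph, "src", "api_a", sinks)))  # ['sink_1']
```

Here sink_2 is not affected because it stays reachable through helper even when api_a is treated as a sanitizer.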

4. DivergentSinks Query

  • Question: "Which third-party library model could influence multiple flows from the same source?"
  • Steps: Select query, choose source, and run
  • Expected: Common API node that affects multiple sink paths
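As a rough illustration of what "common node" means here, the sketch below assumes exactly one known path from the source to each sink and returns the last node the two paths share before they diverge. This is a simplification for intuition only, not the Datalog query WhyFlow runs, and the node names are hypothetical.

```python
def last_common_node(path_a, path_b):
    """Last node of the shared prefix of two paths that start at the same source."""
    last = None
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        last = a
    return last


# Two hypothetical paths from one source to two different sinks.
to_sink_1 = ["src", "decode", "to_string", "sink_1"]
to_sink_2 = ["src", "decode", "to_string", "sink_2"]
print(last_common_node(to_sink_1, to_sink_2))  # to_string
```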

5. DivergentSources Query

  • Question: "Which third-party library model could influence multiple flows to the same sink?"
  • Steps: Select query, choose sink, and run
  • Expected: Common API node where multiple source paths converge

6. GlobalImpact Query

  • Question: "Which third-party library model has the largest global influence?"
  • Steps: Select query, choose source and sink, and run
  • Expected: APIs ranked by frequency of appearance across all paths (node size reflects impact score)
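The frequency ranking can be sketched as counting how many paths each third-party API appears on. The snippet below is a minimal illustration that assumes the paths are already enumerated as lists of node names; it is not WhyFlow's Datalog implementation, and the names are hypothetical.

```python
from collections import Counter


def rank_apis(paths, third_party):
    """Rank third-party APIs by the number of paths on which they appear."""
    counts = Counter()
    for path in paths:
        # Count each API at most once per path.
        counts.update(set(path) & third_party)
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical paths between one source and one sink.
paths = [
    ["src", "headers", "to_string", "path", "sink"],
    ["src", "headers", "to_string", "sink"],
    ["src", "headers", "sink"],
]
third_party = {"headers", "to_string", "path"}
print(rank_apis(paths, third_party))
# [('headers', 3), ('to_string', 2), ('path', 1)]
```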

Sample Queries to Try

Here are concrete example queries you can run to explore WhyFlow's capabilities on the Apache Dubbo dataset:


Example 1: WhyFlow Query

Question: Why is there a taint flow from source (21) to sink (9636)?

| Parameter | Value |
|---|---|
| Query Type | WhyFlow |
| Source (D-SRC) | (21) javax.servlet.http.HttpServletRequest.getRequestURI(...) |
| Sink (D-SNK) | (9636) java.lang.CharSequence.charAt(...) |

Goal: Understand why there are data flows from X to Y by identifying intermediate API pass-through points that contribute to the existence of the flow.

Expected Answer: java.lang.String.substring(int beginIndex, int endIndex)


Example 2: WhyNotFlow Query

Question: Why are there NO data flows from source (15) to sink (9817)?

| Parameter | Value |
|---|---|
| Query Type | WhyNotFlow |
| Source (D-SRC) | (15) msg : Object at TripleHttp2FrameServerHandler.java:69 |
| Sink (D-SNK) | (9817) true at TriDecoder.java:55 |

Goal: Understand why there are NO data flows from X to Y by identifying which third-party library APIs serve as sanitizers that kill the data flow. If that API were modified to permit the data flow, would sink Y become reachable?

Expected Answer: io.netty.handler.codec.http2.Http2DataFrame.content()


Example 3: AffectedSinks Query

Question: What sinks become unreachable if we make API (9860) a sanitizer?

| Parameter | Value |
|---|---|
| Query Type | AffectedSinks |
| Source (D-SRC) | (2) msg : HttpRequest at HttpProcessHandler.java:88 |
| API (D-API) | (9860) io.netty.handler.codec.http.HttpRequest.uri() |

Goal: Identify the set of sinks that would no longer be reachable if the selected API were modeled as a sanitizer. Select all applicable sinks from the results.

Expected Answer:

  • valueList at HttpCommandDecoder.java:55
  • msg at Log4jLogger.java:81

Example 4: DivergentSinks Query

Question: What is the LAST common pass-through node for flows from source (14) to two different sinks?

| Parameter | Value |
|---|---|
| Query Type | DivergentSinks |
| Source (D-SRC) | (14) msg : Http2StreamFrame at TripleHttp2ClientResponseHandler |
| Sink 1 (D-SNK) | (9803) path at TriplePathResolver.java:41 |
| Sink 2 (D-SNK2) | (9804) path at TriplePathResolver.java:46 |

Goal: Find the last common pass-through node where flows from a single source diverge to reach different sinks.

Expected Answer: toString(...) : String at TripleServerStream.java:359


Example 5: DivergentSources Query

Question: What is the FIRST common pass-through node for flows from two different sources to the same sink?

| Parameter | Value |
|---|---|
| Query Type | DivergentSources |
| Source 1 (D-SRC) | (6) input : ByteBuf at NettyCodecAdapter.java:94 |
| Source 2 (D-SRC2) | (7) in : ByteBuf at NettyPortUnificationServerHandler.java:106 |
| Sink (D-SNK) | (9761) buffer at NettyBackedChannelBuffer.java:201 |

Goal: Find the first common pass-through node where flows from multiple sources converge before reaching the same sink.

Expected Answer: parameter this : NettyBackedChannelBuffer [buffer] : ByteBuf at NettyBackedChannelBuffer.java:200


Example 6: GlobalImpact Query

Question: Which third-party APIs appear most frequently in taint flows between source (15) and sink (9803)?

| Parameter | Value |
|---|---|
| Query Type | GlobalImpact |
| Source (D-SRC) | (15) msg : Object at TripleHttp2FrameServerHandler.java:69 |
| Sink (D-SNK) | (9803) path at TriplePathResolver.java:41 |

Goal: Rank third-party library APIs by how frequently they appear across all taint flows, helping identify high-impact APIs whose models have the largest influence on analysis results.

Expected Answer (ranked by frequency):

  1. io.netty.handler.codec.http2.Http2HeadersFrame.headers()
  2. java.lang.CharSequence.toString()
  3. io.netty.handler.codec.http2.Http2Headers.path()


Graph Visualization Features

  • Color coding:
    • Green nodes = Sources
    • Red nodes = Sinks
    • Orange nodes = Third-party API calls
    • Blue nodes = Other intermediate nodes
  • Edge types:
    • Solid edges = Active taint flows
    • Dashed edges = Plausible flows (currently blocked)
  • Interactions:
    • Click node to navigate to source code
    • Hover for fully-qualified names
    • Zoom/pan to explore large graphs

Verification Checklist

Track A Verification

  • statistical_tests.py runs without errors
  • Output shows ~92% accuracy for WhyFlow vs ~71% for CodeQL
  • NASA-TLX mental demand shows significant reduction
  • All p-values are < 0.05
  • Effect sizes (Cohen's d) are > 0.8
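The last two checklist items can be automated with a short script along these lines. This is a sketch under a loud assumption: the column names measure, p_value, and cohens_d are hypothetical and may not match the headers of the generated CSV files, so adjust them after inspecting the real output. The sample rows are made up for illustration.

```python
import csv
import io


def failing_rows(csv_text, alpha=0.05, min_effect=0.8):
    """Return measures whose p-value or absolute effect size misses the thresholds.

    Column names are ASSUMED (hypothetical); rename to match the real files.
    """
    failures = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        p_value = float(row["p_value"])
        # Use the magnitude, since the sign depends on which group is listed first.
        effect = abs(float(row["cohens_d"]))
        if p_value >= alpha or effect <= min_effect:
            failures.append(row["measure"])
    return failures


# Made-up rows, NOT results from the paper.
sample = "measure,p_value,cohens_d\nexample_a,0.01,-1.2\nexample_b,0.20,0.3\n"
print(failing_rows(sample))  # ['example_b']
```

To check a real file, read it with open(...).read() and pass the text to failing_rows; an empty list means every row met both thresholds.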

Track B Verification

  • WhyFlow loads at http://localhost:3000
  • Query Options panel displays six query types
  • Source/sink dropdowns populate with Apache Dubbo data
  • Running WhyFlow query produces graph visualization
  • Soufflé queries execute successfully (check container logs for query output)
  • Nodes are color-coded (green sources, red sinks, orange APIs)
  • Clicking nodes shows code location information
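The first item on this checklist can also be checked from a script. The sketch below uses only the Python standard library; the function name is hypothetical, and it only confirms that something answers with HTTP 200 at the URL, not that the WhyFlow UI rendered correctly.

```python
import urllib.error
import urllib.request


def server_is_up(url="http://localhost:3000", timeout=5.0):
    """Return True if an HTTP GET to url answers with status 200."""
    # Bypass any configured proxy, since the target is a local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx HTTP error.
        return False


if __name__ == "__main__":
    print("WhyFlow reachable:", server_is_up())
```

Meteor can take some time to finish starting, so a False result shortly after launch may simply mean the server is not ready yet.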

Troubleshooting

Docker Issues

Problem: Container fails to start

# Check Docker is running
docker info

# View container logs
docker logs <container_id>

Problem: Port 3000 already in use

docker run -p 3001:3000 whyflow
# Then access http://localhost:3001
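To find out whether a port is already taken before choosing a mapping, a quick check like the following may help. It is a small standard-library sketch with a hypothetical function name; it only tests whether something accepts TCP connections on the port of the local machine.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for candidate in (3000, 3001):
        state = "in use" if port_in_use(candidate) else "free"
        print(f"port {candidate}: {state}")
```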

Meteor Issues

Problem: Meteor command not found

# Ensure Meteor is in PATH
export PATH=$PATH:$HOME/.meteor

Problem: MongoDB connection errors

# Reset Meteor's local database
cd taint_debug_app/taint_debug
meteor reset
meteor run

Python Issues

Problem: Import errors for pandas/scipy

pip install --upgrade pandas numpy scipy matplotlib

Data Files Reference

| File | Description |
|---|---|
| statistical_tests/*.csv | User study questionnaire responses |
| data/whyflow.csv | WhyFlow accuracy results by question |
| data/codeql.csv | CodeQL accuracy results by question |
| data/plots.ipynb | Jupyter notebook for figure generation |
| Subject_Prog_CodeQL_Taint/*.json | CodeQL taint analysis results |
| taint_debug_app/analysis_files/ | Processed Soufflé facts |
| taint_debug_app/app_souffle_queries/ | Datalog query implementations |

Contact

For questions or issues: