Commit 2c63d85

chore(bench): add javac compilation step to Java fixture for javacg-static comparison (#1332)
* chore(bench): add javac fixture compilation and javacg-static comparison script (#1307)

  Closes #1307

* fix(bench): purge benchmark/ before recompile to avoid stale class files

  Incremental make runs without a clean could silently bundle old .class files from deleted .java sources into fixture.jar. Deleting benchmark/ before javac guarantees a clean artefact on every rebuild.

* fix(bench): error immediately when --jar is given without a path

  When --jar is the last argument, args[jarArgIdx + 1] is undefined and findJavacgJar silently falls through to the env-var/lib lookups, giving a confusing "JAR not found" message instead of flagging the bad invocation. Fail fast with a clear diagnostic.

* fix(bench): anchor buildClassFileMap regex to line start to avoid false Javadoc matches

* fix(bench): reject flag-like strings as --jar path argument (#1332)
1 parent f0db64c commit 2c63d85
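
The two `--jar` fixes in the commit message can be looked at in isolation. The sketch below uses a hypothetical `parseJarArg` helper (it is not part of the commit) and assumes the rule the message describes: the token after `--jar` must exist and must not look like another `--flag`.

```javascript
// Hypothetical helper (not in the commit) that isolates the --jar checks.
// Returns { jarPath, error }; both are null when --jar is absent.
function parseJarArg(args) {
  const idx = args.indexOf('--jar');
  // Flag absent: the caller can fall back to JAVACG_JAR or scripts/lib/.
  if (idx === -1) return { jarPath: null, error: null };
  const next = args[idx + 1];
  // A missing value (--jar is last) or a flag-like token is a usage error.
  if (!next || next.startsWith('--')) {
    return { jarPath: null, error: 'Error: --jar requires a path argument' };
  }
  return { jarPath: next, error: null };
}

console.log(parseJarArg(['--jar', '/tmp/javacg.jar'])); // accepted
console.log(parseJarArg(['--json', '--jar'])); // rejected: no value
console.log(parseJarArg(['--jar', '--json'])); // rejected: flag-like value
```

Returning an error value instead of calling `process.exit` keeps the check testable; the script itself exits with status 1 at this point.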

4 files changed

Lines changed: 362 additions & 5 deletions
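
The regex fix mentioned in the commit message is easier to see on a concrete input. The sketch below runs the type-declaration pattern with and without the line-start anchor on a hypothetical source file whose Javadoc contains the word "class". The sample source and the name `Formatter` are made up for illustration; only the pattern itself is taken from the diff.

```javascript
// Hypothetical fixture source: the Javadoc mentions "class" before the declaration.
const src = [
  'package benchmark;',
  '',
  '/**',
  ' * Helper class for formatting output.',
  ' */',
  'public class Formatter {',
  '}',
].join('\n');

// Same building blocks as the pattern in buildClassFileMap.
const modifiers =
  '(?:(?:public|protected|private|abstract|final|sealed|non-sealed|strictfp)\\s+)*';
const typeDecl = '(?:class|interface|enum|record)\\s+(\\w+)';

const unanchored = new RegExp(modifiers + typeDecl);
const anchored = new RegExp('^' + modifiers + typeDecl, 'm');

console.log(src.match(unanchored)[1]); // for
console.log(src.match(anchored)[1]); // Formatter
```

Without the anchor the first hit is "class for" inside the comment, so the file would be indexed under the bogus name `for`. With `^` and the `m` flag, only a declaration that starts a line can match.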

docs/benchmarks/RESOLUTION-COMPARISON.md

Lines changed: 48 additions & 5 deletions
@@ -177,10 +177,35 @@ Established Java static call graph tools all require compiled bytecode:
 | [Soot](https://github.com/soot-oss/soot) | CHA / RTA / VTA / Spark | Needs compiled `.class` files |
 | [javacg-static](https://github.com/gousiosg/java-callgraph) | CHA | Lightweight, reads JARs |
 
-The fixture contains raw `.java` source with no build system. Running these
-tools requires a `javac` compilation step (tracked in #1307).
+The fixture now includes a `Makefile` that compiles all `.java` sources into
+`fixture.jar`:
 
-**Current codegraph Java metrics** (`scripts/resolution-benchmark.ts`):
+```bash
+cd tests/benchmarks/resolution/fixtures/java && make
+```
+
+`scripts/compare-javacg.mjs` then runs
+[javacg-static](https://github.com/gousiosg/java-callgraph) on the compiled JAR.
+javacg-static uses CHA to enumerate all possible call targets at virtual,
+interface, and static call sites.
+
+```bash
+node scripts/compare-javacg.mjs --jar /path/to/javacg-0.1-SNAPSHOT.jar
+# or: JAVACG_JAR=... node scripts/compare-javacg.mjs
+```
+
+Download javacg-static from
+[github.com/gousiosg/java-callgraph/releases](https://github.com/gousiosg/java-callgraph/releases)
+or build with `mvn package -DskipTests`.
+
+**Name mapping:** javacg-static uses `pkg.ClassName:method(JVM-descriptors)` form.
+`compare-javacg.mjs` maps this to `ClassName.method` and matches
+`ClassName.ClassName` (source constructor) / `ClassName` (target constructor)
+against the expected-edges.json convention. Only edges where both class names
+appear in the fixture source files are counted; JDK calls (`String`, `HashMap`,
+`System.out`, …) are filtered out.
+
+**Codegraph Java metrics** (`scripts/resolution-benchmark.ts`):
 
 | Mode | Codegraph |
 |------|:---------:|
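
The name mapping described in this hunk can be sketched on two sample lines. This is a compact restatement for illustration, not the script's code: the class and method names (`Main`, `UserService`, `Validator`) are hypothetical, and the ` -> ` separator is only used here for readability (the script builds `name@file` keys instead).

```javascript
// Split "pkg.ClassName:method(args)" into a simple class name and a method name.
function parseMethodSpec(spec) {
  const head = spec.split('(')[0]; // drop the argument list
  const colon = head.indexOf(':');
  if (colon === -1) return null;
  const className = head.slice(0, colon).split('.').at(-1); // simple class name
  return className ? { className, methodName: head.slice(colon + 1) } : null;
}

// Constructors: ClassName.ClassName on the source side, ClassName on the target side.
const toSourceName = ({ className, methodName }) =>
  methodName === '<init>' ? `${className}.${className}` : `${className}.${methodName}`;
const toTargetName = ({ className, methodName }) =>
  methodName === '<init>' ? className : `${className}.${methodName}`;

// Convert one "M:caller (T)callee" line into "source -> target", or null if malformed.
function mapLine(line) {
  const parts = line.trim().split(/\s+/);
  if (parts.length !== 2 || !parts[0].startsWith('M:')) return null;
  const caller = parseMethodSpec(parts[0].slice(2));
  const callee = parseMethodSpec(parts[1].replace(/^\([A-Z]\)/, '')); // strip the call-type tag
  if (!caller || !callee) return null;
  return `${toSourceName(caller)} -> ${toTargetName(callee)}`;
}

// Hypothetical output lines; the names are made up.
console.log(mapLine('M:benchmark.Main:main(java.lang.String[]) (O)benchmark.UserService:<init>()'));
// Main.main -> UserService
console.log(mapLine('M:benchmark.UserService:<init>() (S)benchmark.Validator:check(java.lang.String)'));
// UserService.UserService -> Validator.check
```

Because only the simple class name survives, two classes with the same name in different packages would collapse into one entry. That is harmless for a single-package fixture but worth keeping in mind if the fixture grows.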
@@ -192,6 +217,19 @@ tools requires a `javac` compilation step (tracked in #1307).
 | `class-inheritance` (3 edges) | 0/3 (0%) |
 | **Total** | **9/17 (53%)** · precision=100% |
 
+**javacg-static comparison** — run `node scripts/compare-javacg.mjs` to populate:
+
+| Tool | Precision | Recall | TP | FP | FN |
+|------|:---------:|:------:|---:|---:|---:|
+| Codegraph | 100% | 53% | 9 | 0 | 8 |
+| javacg-static (CHA) | | | | | |
+
+javacg-static uses CHA and reads compiled bytecode, so it should resolve
+`class-inheritance` (inherited `log()` calls) and `interface-dispatched`
+(virtual dispatch via `UserRepository` interface) edges that codegraph
+currently misses at the source level. `static` and `same-file` calls
+(`invokestatic` in bytecode) should also be fully captured.
+
 ---
 
 ## Conclusions
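
The Codegraph row in the comparison table above follows directly from its counts. A minimal sketch, assuming the standard definitions precision = TP / (TP + FP) and recall = TP / (TP + FN); `ratios` is a hypothetical helper name.

```javascript
// Hypothetical helper: derive precision and recall from raw counts.
function ratios({ tp, fp, fn }) {
  const predicted = tp + fp; // edges the tool reported
  const expected = tp + fn; // edges in the ground truth
  return {
    precision: predicted === 0 ? 0 : tp / predicted,
    recall: expected === 0 ? 0 : tp / expected,
  };
}

// Counts taken from the Codegraph row: TP=9, FP=0, FN=8.
const codegraph = ratios({ tp: 9, fp: 0, fn: 8 });
console.log(`${(codegraph.precision * 100).toFixed(0)}%`); // 100%
console.log(`${(codegraph.recall * 100).toFixed(0)}%`); // 53%
```

The recall denominator is 9 + 8 = 17, which matches the 17 expected edges in the per-mode table.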
@@ -231,8 +269,9 @@ tools requires a `javac` compilation step (tracked in #1307).
    2 `class-inheritance` edges (+7 recall on TS fixture).
 2. **Property-assignment type tracking** (#1306) — Track `this.prop = new Foo()`
    writes. Recovers 3 JS `receiver-typed` FN.
-3. **Java comparison with javacg-static** (#1307) — Add `javac` compilation to
-   the Java fixture so a bytecode-level tool can validate Java recall claims.
+3. **Java recall gaps**: `same-file` (0/2) and `static` (0/2) are `invokestatic`
+   patterns that javacg-static will expose; `class-inheritance` (0/3) requires
+   tracking inherited method calls from superclass to subclass invocation site.
 
 ---
 
@@ -300,6 +339,10 @@ npx tsx scripts/resolution-benchmark.ts | jq '{javascript, typescript, java}'
 npm install @cs-au-dk/jelly @persper/js-callgraph
 node scripts/compare-tools.mjs --all
 
+# javacg-static comparison (Java)
+cd tests/benchmarks/resolution/fixtures/java && make && cd -
+node scripts/compare-javacg.mjs --jar /path/to/javacg-0.1-SNAPSHOT.jar
+
 # Full resolution test suite
 npx vitest run tests/benchmarks/resolution/resolution-benchmark.test.ts
 ```

scripts/compare-javacg.mjs

Lines changed: 297 additions & 0 deletions
@@ -0,0 +1,297 @@
+#!/usr/bin/env node
+/**
+ * javacg-static vs Codegraph: Java fixture call graph comparison
+ *
+ * Runs javacg-static (gousiosg/java-callgraph) on the compiled fixture JAR,
+ * parses its output, maps class:method names to ClassName.method form, and
+ * computes precision/recall against expected-edges.json.
+ *
+ * javacg-static output format:
+ *   M:pkg.ClassName:method(argDescriptors) (T)pkg.ClassName:method(argDescriptors)
+ *   where T is: M=virtual, S=static, O=special (constructors, super), I=interface, D=dynamic
+ *
+ * Name mapping to expected-edges.json convention:
+ *   source <init> → ClassName.ClassName (constructor-as-method)
+ *   target <init> → ClassName (constructor target = class name only)
+ *   other method → ClassName.method
+ *
+ * Prerequisites:
+ *   1. Java runtime (java -jar must work)
+ *   2. javacg-static JAR — download from:
+ *      https://github.com/gousiosg/java-callgraph/releases
+ *      or build: `cd java-callgraph && mvn package -DskipTests`
+ *      Pass via --jar or set JAVACG_JAR, or place at scripts/lib/javacg-static.jar
+ *   3. Compiled fixture JAR:
+ *      cd tests/benchmarks/resolution/fixtures/java && make
+ *
+ * Usage:
+ *   node scripts/compare-javacg.mjs
+ *   node scripts/compare-javacg.mjs --jar /path/to/javacg-0.1-SNAPSHOT.jar
+ *   node scripts/compare-javacg.mjs --json
+ */
+
+import { execFileSync } from 'node:child_process';
+import fs from 'node:fs';
+import path from 'node:path';
+import { fileURLToPath } from 'node:url';
+
+const __dirname = path.dirname(fileURLToPath(import.meta.url));
+const ROOT = path.resolve(__dirname, '..');
+const FIXTURE_DIR = path.join(ROOT, 'tests/benchmarks/resolution/fixtures/java');
+
+// ── CLI ──────────────────────────────────────────────────────────────────────
+
+const args = process.argv.slice(2);
+const jsonFlag = args.includes('--json');
+const jarArgIdx = args.indexOf('--jar');
+const jarArgNext = jarArgIdx !== -1 ? (args[jarArgIdx + 1] ?? null) : null;
+const jarArgPath = jarArgNext && !jarArgNext.startsWith('--') ? jarArgNext : null;
+if (jarArgIdx !== -1 && !jarArgPath) {
+  console.error('Error: --jar requires a path argument');
+  process.exit(1);
+}
+
+// ── Tool discovery ───────────────────────────────────────────────────────────
+
+function findJavacgJar() {
+  if (jarArgPath) return jarArgPath;
+  if (process.env.JAVACG_JAR) return process.env.JAVACG_JAR;
+  // Glob for any jar with "javacg" in the name under scripts/lib/
+  const libDir = path.join(__dirname, 'lib');
+  if (fs.existsSync(libDir)) {
+    const jar = fs.readdirSync(libDir).find((f) => f.includes('javacg') && f.endsWith('.jar'));
+    if (jar) return path.join(libDir, jar);
+  }
+  return null;
+}
+
+// ── Name mapping ─────────────────────────────────────────────────────────────
+
+/**
+ * Scan .java source files to build SimpleClassName → filename.java map.
+ * Used to resolve file fields in the edge key format "name@file".
+ *
+ * Maps the first top-level type per file — inner classes are not indexed.
+ * Handles common modifiers (public, abstract, final, sealed, non-sealed, strictfp)
+ * and type keywords (class, interface, enum, record).
+ */
+function buildClassFileMap(fixtureDir) {
+  const map = new Map();
+  const javaFiles = fs.readdirSync(fixtureDir).filter((f) => f.endsWith('.java'));
+  for (const filename of javaFiles) {
+    const src = fs.readFileSync(path.join(fixtureDir, filename), 'utf8');
+    // Match any combination of access/modifier keywords before the type keyword.
+    // Anchored to line start (^…/m) so Javadoc comments containing the word
+    // "class" before the actual declaration don't produce a false match.
+    const m = src.match(
+      /^(?:(?:public|protected|private|abstract|final|sealed|non-sealed|strictfp)\s+)*(?:class|interface|enum|record)\s+(\w+)/m,
+    );
+    if (m) {
+      map.set(m[1], filename);
+    } else {
+      console.warn(`[warn] buildClassFileMap: no type declaration found in ${filename} — edges involving this file will be filtered out`);
+    }
+  }
+  // Validate: every .java file should map to exactly one class name
+  if (map.size !== javaFiles.length) {
+    console.warn(
+      `[warn] buildClassFileMap: ${javaFiles.length} .java files but only ${map.size} class names resolved — precision/recall may be skewed`,
+    );
+  }
+  return map;
+}
+
+/**
+ * Parse "pkg.ClassName:methodName(descriptors)" into { className, methodName }.
+ * Works with both "." and "/" as package separators (javacg uses ".").
+ */
+function parseMethodSpec(spec) {
+  // Strip argument descriptor — everything from "(" onwards
+  const parenIdx = spec.indexOf('(');
+  const withoutArgs = parenIdx !== -1 ? spec.slice(0, parenIdx) : spec;
+  const colonIdx = withoutArgs.indexOf(':');
+  if (colonIdx === -1) return null;
+  const classPart = withoutArgs.slice(0, colonIdx);
+  const methodName = withoutArgs.slice(colonIdx + 1);
+  // Simple class name: last segment after "." or "/"
+  const className = classPart.split(/[./]/).at(-1);
+  if (!className) return null;
+  return { className, methodName };
+}
+
+/** Source side: "<init>" method maps to ClassName.ClassName. */
+function toSourceName({ className, methodName }) {
+  return methodName === '<init>' ? `${className}.${className}` : `${className}.${methodName}`;
+}
+
+/** Target side: "<init>" method maps to just ClassName (constructor target). */
+function toTargetName({ className, methodName }) {
+  return methodName === '<init>' ? className : `${className}.${methodName}`;
+}
+
+// ── Ground truth ─────────────────────────────────────────────────────────────
+
+function loadGroundTruth(fixtureDir) {
+  const manifest = JSON.parse(
+    fs.readFileSync(path.join(fixtureDir, 'expected-edges.json'), 'utf8'),
+  );
+  const set = new Set(
+    manifest.edges.map(
+      (e) =>
+        `${e.source.name}@${path.basename(e.source.file)}${e.target.name}@${path.basename(e.target.file)}`,
+    ),
+  );
+  return set;
+}
+
+// ── Run javacg-static ────────────────────────────────────────────────────────
+
+function runJavacg(javacgJar, fixtureDir) {
+  const fixtureJar = path.join(fixtureDir, 'fixture.jar');
+  if (!fs.existsSync(fixtureJar)) {
+    console.error(`fixture.jar not found at ${fixtureJar}`);
+    console.error(`Build it with: cd ${fixtureDir} && make`);
+    process.exit(1);
+  }
+  try {
+    return execFileSync('java', ['-jar', javacgJar, fixtureJar], {
+      encoding: 'utf8',
+      stdio: ['ignore', 'pipe', 'pipe'],
+    });
+  } catch (err) {
+    // javacg-static may exit non-zero but still produce useful stdout
+    if (err.stdout?.trim()) return err.stdout;
+    console.error(`javacg-static failed: ${err.message}`);
+    process.exit(1);
+  }
+}
+
+/**
+ * Parse javacg-static text output into a Set of edge keys.
+ *
+ * Line format:
+ *   M:pkg.Class:method(args) (T)pkg.Class:method(args)
+ *
+ * Only edges where both class names appear in classFileMap are included —
+ * this filters out JDK / stdlib calls (HashMap, String, System.out, etc.).
+ */
+function parseJavacgOutput(output, classFileMap) {
+  // M: caller (T) callee — the space between caller and (T) may vary
+  // T values: M=virtual, S=static, O=special (constructors/super), I=interface, D=dynamic (invokedynamic)
+  const lineRe = /^M:(\S+)\s+\(([MISOD])\)(\S+)$/;
+  const edges = new Set();
+
+  for (const rawLine of output.split('\n')) {
+    const line = rawLine.trim();
+    if (!line.startsWith('M:')) continue;
+
+    const m = line.match(lineRe);
+    if (!m) continue;
+
+    const [, sourceSpec, , targetSpec] = m;
+
+    const sourceParsed = parseMethodSpec(sourceSpec);
+    const targetParsed = parseMethodSpec(targetSpec);
+    if (!sourceParsed || !targetParsed) continue;
+
+    const sourceFile = classFileMap.get(sourceParsed.className);
+    const targetFile = classFileMap.get(targetParsed.className);
+    // Skip edges to/from classes outside the fixture (JDK, etc.)
+    if (!sourceFile || !targetFile) continue;
+
+    const sourceName = toSourceName(sourceParsed);
+    const targetName = toTargetName(targetParsed);
+
+    const key = `${sourceName}@${sourceFile}${targetName}@${targetFile}`;
+    // Skip self-edges (e.g. recursive calls not in expected-edges)
+    if (sourceName === targetName && sourceFile === targetFile) continue;
+    edges.add(key);
+  }
+  return edges;
+}
+
+// ── Metrics ──────────────────────────────────────────────────────────────────
+
+function computeMetrics(predicted, groundTruth) {
+  let tp = 0;
+  const fp = [];
+  const fn = [];
+  for (const edge of predicted) (groundTruth.has(edge) ? tp++ : fp.push(edge));
+  for (const edge of groundTruth) if (!predicted.has(edge)) fn.push(edge);
+  return {
+    precision: predicted.size === 0 ? 0 : tp / predicted.size,
+    recall: groundTruth.size === 0 ? 0 : tp / groundTruth.size,
+    tp,
+    fp: fp.length,
+    fn: fn.length,
+    totalPredicted: predicted.size,
+    totalExpected: groundTruth.size,
+    fpEdges: fp,
+    fnEdges: fn,
+  };
+}
+
+// ── Main ─────────────────────────────────────────────────────────────────────
+
+const javacgJar = findJavacgJar();
+if (!javacgJar) {
+  console.error('javacg-static JAR not found.');
+  console.error('Download from: https://github.com/gousiosg/java-callgraph/releases');
+  console.error('Then use one of:');
+  console.error(' node scripts/compare-javacg.mjs --jar /path/to/javacg-0.1-SNAPSHOT.jar');
+  console.error(' JAVACG_JAR=/path/to/javacg-0.1-SNAPSHOT.jar node scripts/compare-javacg.mjs');
+  console.error(' cp /path/to/javacg-0.1-SNAPSHOT.jar scripts/lib/javacg-static.jar');
+  process.exit(1);
+}
+
+const classFileMap = buildClassFileMap(FIXTURE_DIR);
+const groundTruth = loadGroundTruth(FIXTURE_DIR);
+
+console.error(`\n── JAVA ──────────────────────────────────────────────────`);
+console.error(` Ground truth: ${groundTruth.size} edges`);
+console.error(` Running javacg-static on fixture.jar...`);
+
+const rawOutput = runJavacg(javacgJar, FIXTURE_DIR);
+const predictedEdges = parseJavacgOutput(rawOutput, classFileMap);
+
+console.error(` javacg-static: ${predictedEdges.size} named benchmark edges`);
+
+const metrics = computeMetrics(predictedEdges, groundTruth);
+
+console.error(
+  ` precision=${metrics.precision.toFixed(2)} recall=${metrics.recall.toFixed(2)} ` +
+    `TP=${metrics.tp} FP=${metrics.fp} FN=${metrics.fn}`,
+);
+
+if (metrics.fpEdges.length) {
+  console.error(` FP (edges not in expected-edges.json):`);
+  for (const e of metrics.fpEdges) console.error(` - ${e}`);
+}
+if (metrics.fnEdges.length) {
+  console.error(` FN (expected edges missed):`);
+  for (const e of metrics.fnEdges) console.error(` - ${e}`);
+}
+
+if (jsonFlag) {
+  console.log(
+    JSON.stringify(
+      {
+        java: {
+          groundTruth: groundTruth.size,
+          javacgEdges: predictedEdges.size,
+          metrics,
+        },
+      },
+      null,
+      2,
+    ),
+  );
+} else {
+  console.log('\n## javacg-static vs expected-edges.json Ground Truth\n');
+  console.log('| Language | Tool | Precision | Recall | TP | FP | FN |');
+  console.log('|----------|------|:---------:|:------:|---:|---:|---:|');
+  console.log(
+    `| Java | javacg-static (CHA) | ${(metrics.precision * 100).toFixed(0)}% | ` +
+      `${(metrics.recall * 100).toFixed(0)}% | ${metrics.tp} | ${metrics.fp} | ${metrics.fn} |`,
+  );
+}
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+fixture.jar
+benchmark/
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+SRCS = $(wildcard *.java)
+JAR = fixture.jar
+
+.PHONY: all clean
+
+all: $(JAR)
+
+$(JAR): $(SRCS)
+	rm -rf benchmark/
+	javac -d . $(SRCS)
+	jar cf $(JAR) benchmark/
+
+clean:
+	rm -f $(JAR)
+	rm -rf benchmark/
