Labels: bug, high priority, cli, memory
Summary
Running toolbench analyze on large input files (≥ ~200MB) causes the process to crash with either an out-of-memory error or an unhandled exception. The crash appears to occur during parsing/aggregation in lib/analyze.js (or toolbench/analyze.py, depending on implementation). Small files work fine.
Expected
toolbench analyze <file> should stream or chunk the input and complete successfully (or fail gracefully with a helpful error) for large files.
Actual
Process crashes with either:
- Node:
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory, or
- Python:
MemoryError or process killed by OS (OOM killer), or
- Unhandled exception and non-zero exit code with no helpful message.
Environment
- ToolBench commit / tag:
HEAD (please replace with exact commit hash)
- OS: Ubuntu 22.04 LTS (also reproduced on macOS 12)
- Node.js: v18.16.0 (if applicable)
- Python: 3.11.4 (if applicable)
- RAM: 8GB
- Reproduction on both machine-local and CI (GitHub Actions) observed
Reproduction steps (minimal)
- Create a large test file (200MB+). Note: piping base64 /dev/urandom produces random text, not parseable JSON lines, so generate valid NDJSON instead:
# Linux / macOS: create a ~250MB NDJSON test file (one JSON object per line)
awk 'BEGIN { for (i = 0; i < 8000000; i++) printf("{\"id\":%d,\"value\":%d}\n", i, i) }' > ./test-large.ndjson
- Run the analysis:
# CLI invocation
toolbench analyze ./test-large.ndjson --mode summary
- Observe crash:
# Node.js example crash
$ toolbench analyze ./test-large.ndjson --mode summary
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
Aborted (core dumped)
Or for Python:
$ toolbench analyze ./test-large.ndjson --mode summary
Traceback (most recent call last):
File "toolbench/cli.py", line 42, in <module>
main()
File "toolbench/analyze.py", line 210, in analyze
results = aggregator.aggregate(all_items)
MemoryError
Quick root-cause hypothesis
The current implementation accumulates the entire parsed input into memory (e.g. all_items = [], or a full json.load()/file.read()), then runs in-memory aggregation. For very large files this triggers huge memory usage and crashes. The CLI should either:
- Stream-process input (line-by-line or chunked), keeping only aggregated state, or
- Use a bounded buffer / external temporary storage (SQLite or temporary file) for intermediate results, or
- Provide an option to use memory-limited mode.
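As a rough sketch of the second option (bounded external storage), intermediate results could be spilled to SQLite instead of held in Python memory. This is purely illustrative; the table shape, key choice, and function name are assumptions, not ToolBench's actual internals:

```python
import json
import sqlite3

def aggregate_via_sqlite(path, db_path=":memory:"):
    """Stream NDJSON lines into SQLite, keeping per-key running counts/sums
    in the database instead of the full parsed input in Python memory.
    In practice db_path should be a temporary file so state lives on disk."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS agg (key TEXT PRIMARY KEY, n INTEGER, total REAL)"
    )
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            item = json.loads(line)
            # UPSERT keeps one row per key; SQLite pages state to disk as needed
            con.execute(
                "INSERT INTO agg (key, n, total) VALUES (?, 1, ?) "
                "ON CONFLICT(key) DO UPDATE SET n = n + 1, total = total + excluded.total",
                (str(item.get("id", "unknown")), float(item.get("value", 0))),
            )
    con.commit()
    return dict(con.execute("SELECT key, total FROM agg"))
```

The UPSERT syntax requires SQLite 3.24+, which ships with all currently supported CPython versions.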
Suggested fix (Node.js example)
Change the implementation to stream the file instead of reading the whole file into memory.
Before (problematic pattern):
// lib/analyze.js (hypothetical)
const fs = require('fs');

function analyze(path) {
  const raw = fs.readFileSync(path, 'utf8'); // reads entire file
  const items = raw.split('\n').map(JSON.parse);
  // heavy in-memory aggregation
  return aggregate(items);
}
After (streaming approach using readline):
// lib/analyze.js
const fs = require('fs');
const readline = require('readline');

async function analyze(path) {
  const stats = createAggregator(); // small stateful object
  const rl = readline.createInterface({
    input: fs.createReadStream(path, { encoding: 'utf8' }),
    crlfDelay: Infinity
  });
  for await (const line of rl) {
    if (!line.trim()) continue;
    let item;
    try {
      item = JSON.parse(line);
    } catch (err) {
      // handle / log parse error per-line
      continue;
    }
    stats.add(item); // aggregator keeps only necessary summary/state
  }
  return stats.finalize();
}

module.exports = { analyze };
This avoids loading the entire file into memory.
Suggested fix (Python example)
Use an iterator and avoid reading the entire file with json.load().
Before (problematic):
# toolbench/analyze.py
with open(path, 'r', encoding='utf-8') as fh:
    data = json.load(fh)  # loads whole file -> OOM
aggregator = Aggregator()
aggregator.aggregate(data)
After (streaming, NDJSON example):
# toolbench/analyze.py
import json

def analyze(path):
    aggregator = Aggregator()
    with open(path, 'r', encoding='utf-8') as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                item = json.loads(line)
            except json.JSONDecodeError:
                # optionally log and continue
                continue
            aggregator.add(item)
    return aggregator.result()
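The Aggregator used above is assumed, not defined in the codebase excerpt. A minimal shape that keeps only O(1) running state (count/sum/min/max) rather than the parsed items might look like the following sketch; adapt field names and statistics to ToolBench's real aggregation:

```python
class Aggregator:
    """Keeps constant-size running state instead of storing every parsed item."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.min_v = None
        self.max_v = None

    def add(self, item):
        # 'value' is a placeholder field name for whatever metric is aggregated
        v = float(item.get("value", 0))
        self.count += 1
        self.total += v
        self.min_v = v if self.min_v is None else min(self.min_v, v)
        self.max_v = v if self.max_v is None else max(self.max_v, v)

    def result(self):
        mean = self.total / self.count if self.count else None
        return {"count": self.count, "sum": self.total,
                "min": self.min_v, "max": self.max_v, "mean": mean}
```

Because memory use is independent of input size, this pairs safely with the streaming loop above.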
If input is not NDJSON, consider ijson for streaming JSON arrays:
# example with ijson for JSON arrays
import ijson

with open(path, 'rb') as fh:
    parser = ijson.items(fh, 'item')
    for item in parser:
        aggregator.add(item)
Add ijson to optional dependencies if needed.
Tests to add (unit / integration)
Node: jest integration test
Create tests/large-file.integration.test.js:
const fs = require('fs');
const { spawnSync } = require('child_process');
const tmp = require('tmp');

// generate small-ish file but conceptually large for CI
test('analyze handles streaming input without OOM', () => {
  const tmpFile = tmp.fileSync({ postfix: '.ndjson' });
  const lines = [];
  for (let i = 0; i < 10000; i++) {
    lines.push(JSON.stringify({ id: i, value: Math.random() }));
  }
  fs.writeFileSync(tmpFile.name, lines.join('\n'), 'utf8');

  const result = spawnSync('node', ['bin/toolbench', 'analyze', tmpFile.name], {
    encoding: 'utf8',
    maxBuffer: 1024 * 1024 * 10
  });
  expect(result.status).toBe(0);
  expect(result.stdout).toMatch(/summary/); // adapt to actual CLI output
});
Python pytest integration
tests/test_analyze_large.py:
import json
import subprocess
import sys

def test_analyze_handles_large_file(tmp_path):
    p = tmp_path / "test.ndjson"
    with p.open("w", encoding="utf-8") as fh:
        for i in range(20000):
            fh.write(json.dumps({"id": i, "v": i}) + "\n")
    proc = subprocess.run([sys.executable, "-m", "toolbench", "analyze", str(p)],
                          capture_output=True, text=True)
    assert proc.returncode == 0
    assert "summary" in proc.stdout.lower()  # adapt to actual output
Suggested PR checklist / reviewer notes
- Replace any readFileSync/read() + JSON.parse() of the entire file with the streaming approach.
- Add the unit/integration tests above to the CI matrix.
- Add a CLI flag --stream, or automatically detect stdin and stream.
- Update README to document memory-safe mode and supported input formats (NDJSON / JSON array).
- If using ijson or another third-party streaming parser, add it to dependencies and provide a fallback.
Logs / attachments
(Attach any verbose logs or profiler output, e.g., node --trace_gc or python -X tracemalloc, to help root-cause.)
Example: node --max-old-space-size=4096 bin/toolbench analyze test-large.ndjson
- If increasing the Node heap avoids the crash, that is further evidence of an unbounded in-memory accumulation pattern.
Temporary workarounds
- Split the input file into smaller chunks, run toolbench analyze on each, then merge results externally.
- Run on a larger-memory machine, or increase the Node heap with --max-old-space-size=8192 (a mitigation, not a fix).
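The chunk-and-merge workaround could be scripted roughly as below. This is a sketch under assumptions: chunk boundaries fall on NDJSON line breaks, and per-chunk summaries are mergeable dicts with count/sum fields; function names are illustrative:

```python
import itertools

def split_ndjson(path, lines_per_chunk=500_000):
    """Yield chunk file paths, each holding at most lines_per_chunk NDJSON lines.
    Splitting on line boundaries keeps every chunk independently parseable."""
    with open(path, "r", encoding="utf-8") as fh:
        for idx in itertools.count():
            chunk = list(itertools.islice(fh, lines_per_chunk))
            if not chunk:
                return
            chunk_path = f"{path}.chunk{idx}"
            with open(chunk_path, "w", encoding="utf-8") as out:
                out.writelines(chunk)
            yield chunk_path

def merge_summaries(summaries):
    """Merge per-chunk {'count': int, 'sum': float} summaries into one.
    Works because count and sum are associative across chunks."""
    merged = {"count": 0, "sum": 0.0}
    for s in summaries:
        merged["count"] += s["count"]
        merged["sum"] += s["sum"]
    return merged
```

Each chunk would be fed to toolbench analyze separately, with the resulting summaries merged at the end. Note that non-associative statistics (e.g. exact medians) cannot be merged this way.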
Example minimal patch idea (pseudo)
- Create lib/streaming-aggregator.js with a small stateful aggregator API (add(item), finalize()).
- Modify the CLI entrypoint to choose the streaming path for files > 10MB or when --stream is passed.
- Add tests and docs.
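The size-based dispatch in the patch idea could be as simple as the following sketch (shown in Python for brevity; the threshold constant and function names are placeholders, not existing ToolBench APIs):

```python
import os

STREAM_THRESHOLD_BYTES = 10 * 1024 * 1024  # 10MB, per the patch idea above

def choose_analyze_path(path, force_stream=False):
    """Return 'streaming' for large files or when --stream was passed,
    'in-memory' otherwise. Names here are illustrative placeholders."""
    if force_stream or os.path.getsize(path) > STREAM_THRESHOLD_BYTES:
        return "streaming"
    return "in-memory"
```

Input read from stdin has no knowable size up front, so stdin should always take the streaming path.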