feat: Add LLMSimulationTraceCompressor for token-efficient AI summaries by dyrpsf · Pull Request #112 · draeger-lab/SBSCL

dyrpsf · 2026-04-24T08:54:27Z

Overview

This PR introduces the LLMSimulationTraceCompressor to the org.simulator.math package. It is a foundational utility for the upcoming gsoc-sysbio-llm-tools architecture, designed to address the LLM token-limit bottleneck when handling large simulation outputs.

Changes Made

Data Compression: Built a utility that takes MultiTable time-series data and extracts key biological waypoints (initial states, peak concentrations, peak times, and final steady states).
Token Optimization: Formats the extracted data into a dense, LLM-readable summary string, intended to prevent context-window overflow and reduce the risk of AI hallucinations.
Testing: Added LLMSimulationTraceCompressorTest.java, which uses Mockito to mock MultiTable behavior and verify the statistical extraction logic.
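
As a rough illustration of the extraction step described above, the following minimal sketch works on plain arrays rather than the MultiTable API. The class name TraceSummarySketch, the method summarize, and the output format are hypothetical and not taken from this PR. The "final" value here is simply the last sample, which is only a steady state if the simulation ran long enough to settle.

```java
import java.util.Locale;

public class TraceSummarySketch {

    /**
     * Summarizes one time series as: initial value, peak value with its
     * time point, and last value. Hypothetical format, for illustration only.
     */
    public static String summarize(String name, double[] times, double[] values) {
        // Edge case: nothing to summarize.
        if (times.length == 0 || values.length == 0) {
            return name + ": empty trace";
        }
        // Guard against columns of unequal length.
        int n = Math.min(times.length, values.length);

        // Linear scan for the index of the maximum value.
        int peakIndex = 0;
        for (int i = 1; i < n; i++) {
            if (values[i] > values[peakIndex]) {
                peakIndex = i;
            }
        }
        // Locale.ROOT keeps the decimal separator stable across machines.
        return String.format(Locale.ROOT,
            "%s: init=%.2f peak=%.2f@t=%.2f final=%.2f",
            name, values[0], values[peakIndex], times[peakIndex], values[n - 1]);
    }

    public static void main(String[] args) {
        double[] times = {0d, 1d, 2d, 3d};
        double[] glucose = {1d, 4d, 2.5d, 2d};
        // prints "glucose: init=1.00 peak=4.00@t=1.00 final=2.00"
        System.out.println(summarize("glucose", times, glucose));
    }
}
```

A summary of this shape stays one line per species regardless of how many time points the simulation produced, which is the property the token-optimization step relies on.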

Validation

Compiles cleanly within the sbscl module.
Validated locally alongside the full SBSCL test suite.

Schmoho · 2026-05-14T13:13:51Z

I would likely not frame this specifically around LLMs but more generally as a summary string mechanism.

As for the test, I am uncertain whether there is enough covered with that. At least there is a test though, thank you!

dyrpsf · 2026-05-14T16:36:25Z

> I would likely not frame this specifically around LLMs but more generally as a summary string mechanism.
>
> As for the test, I am uncertain whether there is enough covered with that. At least there is a test though, thank you!

You make a great point about keeping the core library naming generic. I have renamed it to SimulationTraceCompressor so it isn't strictly tied to LLM use cases. I also added some extra assertions to the test suite to cover edge cases like empty traces!

draeger

Minor note: I advise using 1d, 1f, 1L, etc. to indicate the type of a numeric value when working with literals.

draeger · 2026-05-29T18:49:11Z

+        Mockito.when(mockTable.getColumnCount()).thenReturn(2);
+
+        // Mock time points
+        Mockito.when(mockTable.getTimePoint(0)).thenReturn(0.0);


I think it is more precise to write 1d or 1f, etc., rather than 1.0. The latter will certainly be interpreted as a double, but the suffixed forms give us developers the freedom to specify how much memory we want to spend.
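
To make the suggestion concrete, here is a small sketch of how Java types suffixed and unsuffixed literals. The class LiteralSuffixSketch and its helper typeOf are hypothetical names used only for this illustration. For a mocked method that returns double, 0d and 0.0 are the same value, so the suffix changes readability and intent rather than behavior.

```java
public class LiteralSuffixSketch {

    /** Returns the simple name of the boxed type a literal is converted to. */
    public static String typeOf(Object boxed) {
        return boxed.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(typeOf(0.0)); // Double: unsuffixed decimal literals are double
        System.out.println(typeOf(0d));  // Double: same type, stated explicitly
        System.out.println(typeOf(0f));  // Float
        System.out.println(typeOf(1L));  // Long
        System.out.println(typeOf(1));   // Integer: unsuffixed integer literals are int

        // A double occupies 8 bytes and a float 4, which is the memory trade-off.
        System.out.println(Double.BYTES + " vs " + Float.BYTES); // 8 vs 4
    }
}
```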

feat: Add LLMSimulationTraceCompressor for token-efficient AI summaries

b9ad0e1

draeger requested review from NeumannArthur, Schmoho and draeger April 24, 2026 13:38

draeger added the enhancement label Apr 24, 2026

draeger assigned dyrpsf Apr 24, 2026

draeger approved these changes Apr 24, 2026

refactor: rename to SimulationTraceCompressor and expand test coverage

daa2729

draeger self-requested a review May 29, 2026 15:36

draeger requested changes May 29, 2026

dyrpsf wants to merge 2 commits into draeger-lab:master from dyrpsf:feature-llm-trace-compressor
