
feat: Add LLMSimulationTraceCompressor for token-efficient AI summaries #112

Open
dyrpsf wants to merge 2 commits into draeger-lab:master from dyrpsf:feature-llm-trace-compressor

Conversation

@dyrpsf
Contributor

@dyrpsf dyrpsf commented Apr 24, 2026

Overview

This PR adds the LLMSimulationTraceCompressor to the org.simulator.math package. It is a foundational utility for the upcoming gsoc-sysbio-llm-tools architecture and targets the LLM token-limit bottleneck that arises when large simulation outputs are passed to a model.

Changes Made

  • Data Compression: Built a utility that takes MultiTable time-series data and extracts the critical biological waypoints for each column (initial state, peak concentration, peak time, and final steady state).
  • Token Optimization: Formats the extracted data into a dense, LLM-readable summary string, with the aim of avoiding context-window overflow and reducing the risk of AI hallucinations.
  • Testing: Added LLMSimulationTraceCompressorTest.java, which uses Mockito to mock MultiTable behavior and verify the statistical extraction logic.
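
For readers without the diff at hand, the extraction described in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the class `TraceSummarySketch`, its methods, and the output format are hypothetical, and plain `double[]` arrays stand in for a MultiTable column and its time points. The "final" value here is simply the last sample, which is a steady state only if the simulation has actually settled.

```java
import java.util.Locale;

/** Hypothetical stand-in for illustration; not the SBSCL API or the class added in this PR. */
final class TraceSummarySketch {

    /** Waypoints of one time-series column. */
    static final class Waypoints {
        final double initial;   // value at the first time point
        final double peak;      // largest value in the series
        final double peakTime;  // time at which the peak first occurs
        final double last;      // value at the last time point

        Waypoints(double initial, double peak, double peakTime, double last) {
            this.initial = initial;
            this.peak = peak;
            this.peakTime = peakTime;
            this.last = last;
        }
    }

    /** Scans the series once and records initial, peak, peak time, and final value. */
    static Waypoints extract(double[] times, double[] values) {
        if (times.length != values.length) {
            throw new IllegalArgumentException("times and values differ in length");
        }
        if (values.length == 0) {
            throw new IllegalArgumentException("empty trace");
        }
        int peakIndex = 0;
        for (int i = 1; i < values.length; i++) {
            // Strict comparison keeps the earliest index when the peak value repeats.
            if (values[i] > values[peakIndex]) {
                peakIndex = i;
            }
        }
        return new Waypoints(values[0], values[peakIndex], times[peakIndex],
                values[values.length - 1]);
    }

    /** Renders one column as a single compact line (the format is an assumption). */
    static String format(String name, Waypoints w) {
        return String.format(Locale.ROOT, "%s: init=%.2f peak=%.2f@t=%.2f final=%.2f",
                name, w.initial, w.peak, w.peakTime, w.last);
    }
}
```

A series of thousands of samples collapses to one short line per column this way, which is where the token saving would come from. Rejecting an empty trace with an exception is one possible choice; returning an empty summary would be equally defensible.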

Validation

  • Compiles cleanly within the sbscl module.
  • Validated locally alongside the full SBSCL test suite.

@Schmoho

Schmoho commented May 14, 2026

I would likely not frame this specifically around LLMs but more generally as a summary string mechanism.

As for the test, I am uncertain whether there is enough covered with that. At least there is a test though, thank you!

@dyrpsf
Contributor Author

dyrpsf commented May 14, 2026

> I would likely not frame this specifically around LLMs but more generally as a summary string mechanism.
>
> As for the test, I am uncertain whether there is enough covered with that. At least there is a test though, thank you!

You make a great point about keeping the core library naming generic. I have renamed it to SimulationTraceCompressor so it isn't strictly tied to LLM use cases. I also added some extra assertions to the test suite to cover edge cases like empty traces!

@draeger draeger self-requested a review May 29, 2026 15:36
Member

@draeger draeger left a comment


Minor note: I advise using suffixes such as 1d, 1f, 1L, etc. to indicate the type of a numeric value when working with literals.

Mockito.when(mockTable.getColumnCount()).thenReturn(2);

// Mock time points
Mockito.when(mockTable.getTimePoint(0)).thenReturn(0.0);
Member


I think it is more precise to write 1d or 1f, etc. rather than 1.0. The latter will certainly be interpreted as a double but the first two versions give us developers freedom to specify how much memory we want to spend.
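
The difference between the literal forms mentioned in this comment can be illustrated with a small standalone sketch. The class `LiteralSuffixSketch` and its helper are hypothetical names used only here; the literal semantics are standard Java.

```java
/** Illustrative only; the class and method names are hypothetical. */
final class LiteralSuffixSketch {

    /** Returns the simple name of the wrapper type a literal is boxed into. */
    static String boxedType(Object boxed) {
        return boxed.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(boxedType(1.0)); // Double: an unsuffixed decimal literal is a double
        System.out.println(boxedType(1d));  // Double: same type, stated explicitly
        System.out.println(boxedType(1f));  // Float
        System.out.println(boxedType(1L));  // Long
        System.out.println(boxedType(1));   // Integer: an unsuffixed integer literal is an int

        // Storage size in bytes: 8 for double, 4 for float.
        System.out.println(Double.BYTES + " vs " + Float.BYTES);

        // The smaller type costs precision: 0.1f widened to double is not 0.1d.
        System.out.println(0.1f == 0.1d);   // false
    }
}
```

In the test under review, following this advice would presumably mean writing `thenReturn(0d)` in place of `thenReturn(0.0)`. Both compile to the same double value, so the change is about making the intended type explicit to readers rather than about behavior.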
