
fix: sanitize LLM-derived text before logging, JSON serialization, and database writes #10

Open
GrigoryEvko wants to merge 7 commits into FusionBrainLab:main from GrigoryEvko:fix/llm-output-sanitization

Conversation


@GrigoryEvko GrigoryEvko commented May 15, 2026

LLM responses and compiler stderr from heterogeneous toolchains (Python tracebacks, Triton MLIR diagnostics, nvcc / ptxas / CUTLASS template explosions, Mojo error formatter output, Pallas / JAX jaxpr traces, CuTe layout errors) flow through gigaevo into loguru sinks, orjson serialization for Redis Streams and the program archive, asyncpg TEXT columns, langfuse trace metadata, file paths, and back into subsequent LLM prompts as part of multi-agent loops. This text was previously passed through verbatim.

A single lone UTF-16 surrogate anywhere in a stage error raises UnicodeEncodeError from inside orjson, aborts the Program write to Redis, and stalls the evolution loop. A literal NUL byte in a traceback is rejected by asyncpg with "A string literal cannot contain NUL (0x00) characters" and aborts the tracker write. ANSI escape sequences from nvcc and clang colorization survive into loguru file sinks and corrupt log readers; CR survives and lets a multi-line LLM justification forge log entries that downstream parsers cannot distinguish from authentic records; and BIDI overrides survive and hide content from operator log review.

A minimal reproducer for the orjson crash path, runnable from the repo root before this change:

```python
import openai, httpx, orjson

req = httpx.Request("POST", "https://api.openai.com/v1/chat/completions")
resp = httpx.Response(429, request=req)
err = openai.RateLimitError("rate limited \ud83d", response=resp, body=None)
data = {"error": str(err)}
orjson.dumps(data)
# orjson.JSONEncodeError: surrogates not allowed
```

The change introduces gigaevo/utils/text_sanitize.py with five pure str -> str functions: sanitize_for_log strips ANSI / BIDI overrides / lone surrogates and escapes C0 and C1 control characters except TAB and LF; sanitize_for_json replaces lone surrogates; sanitize_for_dbtext replaces NUL plus lone surrogates; clean_identifier keeps the conservative charset [A-Za-z0-9._:/+@-]; deep_sanitize_for_json walks JSON-shaped containers and applies sanitize_for_json to every string leaf. Multi-byte Unicode that legitimate LLM-generated cross-language output carries (Greek identifiers in Mojo, U+2192 / U+21D2 arrows in Mojo and Pallas error formatters, CJK comments, math symbols, emoji, Unicode box-drawing in clang carets, CUTLASS template syntax like Layout<Shape<_32,_128>,Stride<_128,_1>>) passes through unchanged.
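The five functions can be pictured with a minimal sketch. The names match the PR, but the regexes and bodies below are illustrative simplifications, not the actual gigaevo/utils/text_sanitize.py implementation:

```python
import re

# Illustrative simplifications of the five functions described above.
_ANSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")      # CSI escape sequences
_BIDI = re.compile(r"[\u202a-\u202e\u2066-\u2069]")    # BIDI overrides / isolates
_SURROGATE = re.compile(r"[\ud800-\udfff]")            # lone UTF-16 surrogates
_CTRL = re.compile(r"[\x00-\x08\x0b-\x1f\x7f-\x9f]")   # C0 / C1 except TAB and LF

def sanitize_for_json(text: str) -> str:
    # orjson only needs lone surrogates gone to serialize a str
    return _SURROGATE.sub("\ufffd", text)

def sanitize_for_dbtext(text: str) -> str:
    # Postgres TEXT additionally rejects NUL
    return sanitize_for_json(text).replace("\x00", "\ufffd")

def sanitize_for_log(text: str) -> str:
    text = _BIDI.sub("", _ANSI.sub("", text))
    text = _SURROGATE.sub("\ufffd", text)
    # escape remaining control characters (e.g. CR -> "\r") so one logical
    # log record cannot forge additional lines
    return _CTRL.sub(lambda m: repr(m.group())[1:-1], text)

def clean_identifier(name: str, max_len: int = 128) -> str:
    if max_len < 0:
        raise ValueError("max_len must be non-negative")
    return re.sub(r"[^A-Za-z0-9._:/+@-]", "", name)[:max_len]

def deep_sanitize_for_json(obj):
    # walk JSON-shaped containers, sanitizing every string leaf (and key)
    if isinstance(obj, str):
        return sanitize_for_json(obj)
    if isinstance(obj, dict):
        return {deep_sanitize_for_json(k): deep_sanitize_for_json(v)
                for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [deep_sanitize_for_json(v) for v in obj]
    return obj
```

Legitimate multi-byte Unicode falls outside every one of these character classes, which is why arrows, Greek identifiers, CJK text, and CUTLASS template syntax pass through unchanged.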

Integration is applied at the boundaries where the unsanitized text would crash or corrupt. gigaevo/utils/json.py dumps wraps obj in deep_sanitize_for_json before orjson and the stdlib fallback (covers every _dumps caller in Redis storage transparently). StageError in gigaevo/programs/core_types.py grows pydantic field_validators on type, message, and traceback so every construction path (from_exception, direct construction, JSON revalidation) yields sanitized values, which propagates through error.pretty(), program.format_errors() re-injection into LLM prompts, and downstream log lines. MigrantEnvelope.to_stream_fields in gigaevo/evolution/bus/transport.py runs program_data through deep_sanitize_for_json before json.dumps. MultiModelRouter in gigaevo/llm/models.py validates model_name once at construction through clean_identifier, redacts userinfo from base_url via a local _redact_url helper before any log call, and sanitizes server-returned model ids from /models probes. MutationStructuredOutput and MutationChange schemas in gigaevo/llm/agents/mutation.py grow field_validators on archetype, justification, code, description, explanation, and insights_used. TokenTracker.track wraps TokenUsage.from_response in try / except so a hostile token_usage payload no longer escapes ainvoke and kills the LangGraph node. clean_identifier is applied to ParamSpec.name in gigaevo/programs/stages/optimization/optuna/stage.py before the value flows into trial.suggest_* and embeds in Optuna's storage key. 
sanitize_for_log wraps the exception interpolations in gigaevo/runner/dag_runner.py, gigaevo/programs/stages/python_executors/execution.py, gigaevo/programs/stages/optimization/utils.py, gigaevo/programs/stages/validation.py, gigaevo/programs/dag/dag.py, gigaevo/database/redis_program_storage.py, gigaevo/database/state_manager.py, gigaevo/evolution/mutation/mutation_operator.py, gigaevo/prompts/coevolution/stages.py, gigaevo/prompts/coevolution/stats.py, gigaevo/prompts/fetcher.py, gigaevo/llm/agents/memory_selector.py, gigaevo/llm/bandit.py. gigaevo/utils/trackers/backends/redis.py replaces an ad-hoc tag normalization with clean_identifier and applies sanitize_for_dbtext on the history value plus deep_sanitize_for_json on the history record before json.dumps.
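The StageError field_validator wiring described above can be sketched as follows. This is a minimal illustration assuming pydantic v2; the inline sanitize_for_log is a stand-in for the real gigaevo.utils.text_sanitize function:

```python
import re
from pydantic import BaseModel, field_validator

def sanitize_for_log(text: str) -> str:
    # illustrative stand-in for gigaevo.utils.text_sanitize.sanitize_for_log
    text = re.sub(r"\x1b\[[0-9;?]*[ -/]*[@-~]", "", text)  # strip ANSI CSI
    return re.sub(r"[\ud800-\udfff]", "\ufffd", text)      # replace lone surrogates

class StageError(BaseModel):
    type: str
    message: str
    traceback: str = ""

    # mode="before" runs on every construction path: direct construction,
    # from_exception-style factories, and JSON revalidation alike
    @field_validator("type", "message", "traceback", mode="before")
    @classmethod
    def _sanitize(cls, v):
        return sanitize_for_log(v) if isinstance(v, str) else v

err = StageError(type="CompileError", message="\x1b[31mnvcc fatal\x1b[0m \ud800")
```

Because the validator sits on the schema, every downstream consumer of error.pretty() and format_errors() sees only sanitized values without any call-site changes.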

Tests add the sanitizer unit suite (sanitize_for_log ANSI / C0 / C1 / BIDI / surrogate coverage; sanitize_for_json minimum-viable surrogate handling; sanitize_for_dbtext NUL plus surrogate; clean_identifier charset and max_len; deep_sanitize_for_json recursive walk) plus three adversarial suites generated against the same module: a Unicode suite covering confusables, normalization invariance, zero-width characters, weak versus strong BIDI marks, variation selectors, tag characters, line and paragraph separators, BOM, soft hyphen, Zalgo combining stacks, CJK script families, RTL scripts, emoji ZWJ sequences, Fitzpatrick skin-tone modifiers, regional-indicator flags, mathematical alphanumerics, halfwidth and fullwidth forms, and non-Latin digit forms; a regex-bypass suite covering malformed CSI, intermediate bytes, private parameters, the OSC family including security-relevant OSC 52, DCS / SOS / PM / APC, direct C1 introducers, bare ESC plus Fp / Fs gaps, adjacent surrogates, ANSI inside emoji ZWJ sequences, and perf-bound DoS tests; and a downstream-consumer suite that pipes sanitized output through json.dumps in both encoding modes, pydantic.BaseModel.model_dump_json, pydantic.TypeAdapter.dump_python and dump_json, str.encode("utf-8"), loguru file sinks, subprocess argv, fakeredis SET / GET, sqlite3 TEXT round-trips, csv.writer round-trips, and a realistic openai 2.x BadRequestError carrying an ANSI-colorized nvcc error plus an embedded NUL plus a Greek-letter Mojo identifier.

Integration tests under tests/llm/test_sanitize_wiring.py, tests/utils/test_text_sanitize_wiring.py, tests/stages/test_sanitize_integration.py, and tests/dag/test_sanitize_integration.py exercise each modified production module with a hostile fixture combining ANSI, NUL, CR, BEL, lone surrogate, and BIDI RLO, and assert the captured loguru output contains no raw hostile bytes, encodes cleanly as UTF-8, and round-trips through json.dumps.
Two defects were discovered during the sanitizer audit and fixed: the lone-surrogate regex previously used a lookahead / lookbehind that mistakenly treated adjacent independent surrogates as a valid pair (chr(0xD800) + chr(0xDC00) survived and broke UTF-8 encoding downstream); clean_identifier with a negative max_len silently dropped a trailing character via the Python slice quirk and now raises ValueError. Total new test count is approximately 760, and the full target regression suite (tests/llm, tests/utils, tests/dag, tests/test_program.py, tests/stages, tests/database, tests/evolution, tests/prompts, tests/trackers, tests/infra) passes after the changes.
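The adjacent-surrogate defect can be reproduced in isolation. The flawed pattern below is a reconstruction of the described lookaround bug, not the actual shipped regex:

```python
import re

# Reconstructed flawed pattern: a high surrogate counts as "lone" only when
# not followed by a low surrogate, and vice versa. That wrongly blesses
# chr(0xD800) + chr(0xDC00): Python never joins adjacent surrogates into one
# code point, so both are invalid in a str no matter what sits next to them.
flawed = re.compile(
    r"[\ud800-\udbff](?![\udc00-\udfff])|(?<![\ud800-\udbff])[\udc00-\udfff]"
)
# Fix: every surrogate code point in a str is lone by definition.
fixed = re.compile(r"[\ud800-\udfff]")

s = chr(0xD800) + chr(0xDC00)
survived = flawed.sub("\ufffd", s)   # both surrogates survive the flawed regex
repaired = fixed.sub("\ufffd", s)    # both are replaced by U+FFFD
```

The surviving pair still breaks UTF-8 encoding downstream, which is exactly the crash mode the sanitizer exists to prevent.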

…d database writes

Introduce gigaevo/utils/text_sanitize.py with five pure str-to-str
functions and wire them at the boundaries where unsanitized
LLM-derived text would crash or corrupt downstream consumers. StageError
gets pydantic field_validators; gigaevo/utils/json.py dumps passes obj
through deep_sanitize_for_json before serializing; MigrantEnvelope and
the tracker Redis backend gain a deep-sanitization pass before
serialization; MultiModelRouter
validates model_name via clean_identifier and redacts userinfo from
base_url before logging; mutation, memory_selector, token_tracking,
bandit, optuna stage, dag_runner, and prompt subsystems route their
LLM-derived interpolations through sanitize_for_log.

Two defects discovered during the sanitizer audit are fixed: the
lone-surrogate regex previously treated adjacent independent surrogates
as a valid pair (chr(0xD800) + chr(0xDC00) survived and broke UTF-8
encoding); clean_identifier with negative max_len silently dropped a
trailing character via the slice quirk and now raises ValueError.

Adds approximately 760 tests across the sanitizer unit suite plus
adversarial Unicode, regex-bypass, and downstream-consumer suites, and
integration tests under tests/llm, tests/utils, tests/dag, tests/stages
that exercise each modified module with a hostile fixture combining
ANSI, NUL, CR, BEL, lone surrogate, and BIDI override.

test_init_log_with_hostile_model_name previously ran the real
_verify_models HTTP probe against a fake base_url and spent around
seven seconds waiting for the urllib timeout. The test exists to
verify the init INFO banner sanitizes model_name and redacts
base_url userinfo, not to exercise the server probe (the other two
tests in the same class already do that with monkeypatched
urlopen). Patching _verify_models to a no-op brings the test from
7.4 seconds down to under ten milliseconds.
…anitization

_apply_modifications cleaned ParamSpec.name through clean_identifier
but inserted the parameterized_snippet unchanged, so a snippet that
originally referenced _optuna_params['old hostile name'] failed with
KeyError once the trial dict only carried the cleaned name. Build a
name_map alongside the cleaning loop, dedupe collisions so two
distinct hostile names that clean to the same identifier do not
collapse into one parameter, and rewrite every string-literal
_optuna_params[...] reference inside every snippet before splicing.
Tests now positively exec the rewritten code against a synthetic
_optuna_params dict instead of swallowing exceptions.
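The name_map plus snippet-rewrite step might look like this sketch (clean_identifier is an illustrative stand-in, and both helper names are hypothetical; the real logic lives in _apply_modifications):

```python
import re

def clean_identifier(name: str) -> str:
    # illustrative stand-in for gigaevo's clean_identifier
    return re.sub(r"[^A-Za-z0-9._:/+@-]", "", name)

def build_name_map(raw_names):
    """Clean each raw name; dedupe collisions so two distinct hostile names
    that clean to the same identifier do not collapse into one parameter."""
    name_map, taken = {}, set()
    for raw in raw_names:
        clean = clean_identifier(raw) or "param"
        candidate, i = clean, 1
        while candidate in taken:
            candidate = f"{clean}_{i}"
            i += 1
        name_map[raw] = candidate
        taken.add(candidate)
    return name_map

def rewrite_snippet(snippet: str, name_map):
    """Rewrite every string-literal _optuna_params[...] reference so the
    snippet indexes the trial dict by the cleaned name."""
    def repl(m):
        quote, raw = m.group(1), m.group(2)
        return f"_optuna_params[{quote}{name_map.get(raw, raw)}{quote}]"
    return re.sub(r"_optuna_params\[(['\"])(.*?)\1\]", repl, snippet)
```

A snippet referencing _optuna_params['loss train'] is then rewritten to index the cleaned key, so the KeyError described above cannot occur even when two hostile names clean to the same identifier.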
…aths

flush() stored latest values under clean_identifier(tag), but
get_latest, get_history, and clear_series read with the raw caller
tag. A write_scalar('loss train', ...) was stored as 'losstrain' and
get_latest('loss train') returned an empty dict. Extract a single
_field_tag() helper applied symmetrically on every read and write
side; fall back to metric_<sha256[:12]> when sanitization yields an
empty tag so distinct hostile inputs do not collide. New
round-trip tests cover both clean and hostile tags through
write_scalar -> flush -> get_latest / list_metrics / get_history.

prompt_text_to_id called blob.encode() on the raw return value of
the prompt entrypoint, raising UnicodeEncodeError when the
LLM-generated prompt source synthesized a lone UTF-16 surrogate via
an escape literal. The error aborted the prompt fitness pipeline.
Apply sanitize_for_log inside prompt_text_to_id and at the two
entrypoint call sites so the stored prompt text and the hashed
prompt text always match. Clean prompts keep their historical
sha256[:16] id unchanged (test_clean_system_only_hash_matches_previous_sha256
proves no migration breakage). Empty-after-sanitization input
becomes an explicit error in the prompt execution stage and a
skip with warning in the archive fetcher rather than hashing an
empty blob.
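The sanitize-before-hash invariant can be sketched as follows (the inline sanitize_for_log is an illustrative stand-in; the sha256[:16] id format is from the description above):

```python
import hashlib
import re

def sanitize_for_log(text: str) -> str:
    # illustrative stand-in for gigaevo.utils.text_sanitize.sanitize_for_log
    text = re.sub(r"\x1b\[[0-9;?]*[ -/]*[@-~]", "", text)
    return re.sub(r"[\ud800-\udfff]", "\ufffd", text)

def prompt_text_to_id(blob: str) -> str:
    """Sanitize before hashing so the stored prompt text and the hashed
    prompt text always match; clean prompts hash identically to before."""
    blob = sanitize_for_log(blob)
    if not blob:
        raise ValueError("prompt text empty after sanitization")
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:16]
```

Since sanitize_for_log is the identity on clean text, historical ids of clean prompts are unchanged, which is the no-migration-breakage property the tests assert.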

parse_response computed safe_msg through sanitize_for_log for the
loguru line and for state['error'] but the value placed in
parsed_output['error'] still came from str(e). Downstream consumers
that read parsed_output directly (run-state dumps, langfuse traces,
re-injection into LLM prompts) now also see the sanitized form.

The structured-output schemas receive LLM responses verbatim. ANSI
escapes, NUL bytes, lone UTF-16 surrogates, and BIDI overrides flow
through bare str fields into reports, JSON dumps, Postgres TEXT
columns, and re-injection back into LLM prompts. Match the pattern
already applied to MutationStructuredOutput: pydantic
field_validators on ProgramInsight.{type, insight, tag, severity}
and TransitionInsight.{strategy, description} pipe each value
through sanitize_for_log at the schema layer. ProgramScore has no
str fields, no change. Tests cover the clean-input identity path,
ANSI stripping, lone-surrogate replacement, CR removal, and
preservation of legitimate Unicode (arrows, math symbols, template
syntax).
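The identity-on-clean-input property these tests assert can be demonstrated with a minimal stand-in sanitizer. The combined regex is illustrative, not the shipped implementation, and it removes rather than escapes hostile bytes for brevity:

```python
import re

# One combined class of hostile input: ANSI CSI sequences, BIDI
# overrides / isolates, lone surrogates, and CR.
_HOSTILE = re.compile(
    r"\x1b\[[0-9;?]*[ -/]*[@-~]|[\u202a-\u202e\u2066-\u2069]|[\ud800-\udfff]|\r"
)

def sanitize(text: str) -> str:
    return _HOSTILE.sub("", text)

# Legitimate cross-language output passes through untouched ...
legit = "Layout<Shape<_32,_128>,Stride<_128,_1>> \u2192 \u21d2 \u03b1\u03b2"
# ... while hostile wrapping around it is stripped.
hostile = "\x1b[31m" + legit + "\r" + chr(0xD800)
```

Arrows, template syntax, and Greek identifiers all fall outside the hostile classes, so clean input is preserved byte-for-byte.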
