
fix: sanitize LLM-derived text before logging, JSON serialization, and database writes #10

Open
GrigoryEvko wants to merge 7 commits into FusionBrainLab:main from GrigoryEvko:fix/llm-output-sanitization

Conversation


@GrigoryEvko GrigoryEvko commented May 15, 2026

LLM responses and compiler stderr from heterogeneous toolchains (Python tracebacks, Triton MLIR diagnostics, nvcc / ptxas / CUTLASS template explosions, Mojo error formatter output, Pallas / JAX jaxpr traces, CuTe layout errors) flow through gigaevo into loguru sinks, orjson serialization for Redis Streams and the program archive, asyncpg TEXT columns, langfuse trace metadata, file paths, and back into subsequent LLM prompts as part of multi-agent loops. This text was previously passed through verbatim.

A single lone UTF-16 surrogate anywhere in a stage error raises UnicodeEncodeError from inside orjson, aborts the Program write to Redis, and stalls the evolution loop. A literal NUL byte in a traceback is rejected by asyncpg with "A string literal cannot contain NUL (0x00) characters" and aborts the tracker write. ANSI escape sequences from nvcc and clang colorization survive into loguru file sinks and corrupt log readers; CR survives and lets a multi-line LLM justification forge log entries that downstream parsers cannot distinguish from authentic records; and BIDI overrides survive and hide content from operator log review.

A minimal reproducer for the orjson crash path, runnable from the repo root before this change:

```python
import openai, httpx, orjson

req = httpx.Request("POST", "https://api.openai.com/v1/chat/completions")
resp = httpx.Response(429, request=req)
err = openai.RateLimitError("rate limited \ud83d", response=resp, body=None)
data = {"error": str(err)}
orjson.dumps(data)
# orjson.JSONEncodeError: surrogates not allowed
```

The change introduces gigaevo/utils/text_sanitize.py with five pure str -> str functions: sanitize_for_log strips ANSI / BIDI overrides / lone surrogates and escapes C0 and C1 control characters except TAB and LF; sanitize_for_json replaces lone surrogates; sanitize_for_dbtext replaces NUL plus lone surrogates; clean_identifier keeps the conservative charset [A-Za-z0-9._:/+@-]; deep_sanitize_for_json walks JSON-shaped containers and applies sanitize_for_json to every string leaf. Multi-byte Unicode that legitimate LLM-generated cross-language output carries (Greek identifiers in Mojo, U+2192 / U+21D2 arrows in Mojo and Pallas error formatters, CJK comments, math symbols, emoji, Unicode box-drawing in clang carets, CUTLASS template syntax like Layout<Shape<_32,_128>,Stride<_128,_1>>) passes through unchanged.
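The five functions can be pictured with a minimal sketch. The names match the PR, but the regexes and bodies below are illustrative simplifications, not the actual gigaevo/utils/text_sanitize.py implementation:

```python
import re

# Illustrative simplifications of the five functions described above.
_ANSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")      # CSI escape sequences
_BIDI = re.compile(r"[\u202a-\u202e\u2066-\u2069]")    # BIDI overrides / isolates
_SURROGATE = re.compile(r"[\ud800-\udfff]")            # lone UTF-16 surrogates
_CTRL = re.compile(r"[\x00-\x08\x0b-\x1f\x7f-\x9f]")   # C0 / C1 except TAB and LF

def sanitize_for_json(text: str) -> str:
    # orjson only needs lone surrogates gone to serialize a str
    return _SURROGATE.sub("\ufffd", text)

def sanitize_for_dbtext(text: str) -> str:
    # Postgres TEXT additionally rejects NUL
    return sanitize_for_json(text).replace("\x00", "\ufffd")

def sanitize_for_log(text: str) -> str:
    text = _BIDI.sub("", _ANSI.sub("", text))
    text = _SURROGATE.sub("\ufffd", text)
    # escape remaining control characters (e.g. CR -> "\r") so one logical
    # log record cannot forge additional lines
    return _CTRL.sub(lambda m: repr(m.group())[1:-1], text)

def clean_identifier(name: str, max_len: int = 128) -> str:
    if max_len < 0:
        raise ValueError("max_len must be non-negative")
    return re.sub(r"[^A-Za-z0-9._:/+@-]", "", name)[:max_len]

def deep_sanitize_for_json(obj):
    # walk JSON-shaped containers, sanitizing every string leaf (and key)
    if isinstance(obj, str):
        return sanitize_for_json(obj)
    if isinstance(obj, dict):
        return {deep_sanitize_for_json(k): deep_sanitize_for_json(v)
                for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [deep_sanitize_for_json(v) for v in obj]
    return obj
```

Legitimate multi-byte Unicode falls outside every one of these character classes, which is why arrows, Greek identifiers, CJK text, and CUTLASS template syntax pass through unchanged.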

Integration is applied at the boundaries where the unsanitized text would crash or corrupt. gigaevo/utils/json.py dumps wraps obj in deep_sanitize_for_json before orjson and the stdlib fallback (covers every _dumps caller in Redis storage transparently). StageError in gigaevo/programs/core_types.py grows pydantic field_validators on type, message, and traceback so every construction path (from_exception, direct construction, JSON revalidation) yields sanitized values, which propagates through error.pretty(), program.format_errors() re-injection into LLM prompts, and downstream log lines. MigrantEnvelope.to_stream_fields in gigaevo/evolution/bus/transport.py runs program_data through deep_sanitize_for_json before json.dumps. MultiModelRouter in gigaevo/llm/models.py validates model_name once at construction through clean_identifier, redacts userinfo from base_url via a local _redact_url helper before any log call, and sanitizes server-returned model ids from /models probes. MutationStructuredOutput and MutationChange schemas in gigaevo/llm/agents/mutation.py grow field_validators on archetype, justification, code, description, explanation, and insights_used. TokenTracker.track wraps TokenUsage.from_response in try / except so a hostile token_usage payload no longer escapes ainvoke and kills the LangGraph node. clean_identifier is applied to ParamSpec.name in gigaevo/programs/stages/optimization/optuna/stage.py before the value flows into trial.suggest_* and embeds in Optuna's storage key. 
sanitize_for_log wraps the exception interpolations in gigaevo/runner/dag_runner.py, gigaevo/programs/stages/python_executors/execution.py, gigaevo/programs/stages/optimization/utils.py, gigaevo/programs/stages/validation.py, gigaevo/programs/dag/dag.py, gigaevo/database/redis_program_storage.py, gigaevo/database/state_manager.py, gigaevo/evolution/mutation/mutation_operator.py, gigaevo/prompts/coevolution/stages.py, gigaevo/prompts/coevolution/stats.py, gigaevo/prompts/fetcher.py, gigaevo/llm/agents/memory_selector.py, gigaevo/llm/bandit.py. gigaevo/utils/trackers/backends/redis.py replaces an ad-hoc tag normalization with clean_identifier and applies sanitize_for_dbtext on the history value plus deep_sanitize_for_json on the history record before json.dumps.
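The StageError field_validator wiring described above can be sketched as follows. This is a minimal illustration assuming pydantic v2; the inline sanitize_for_log is a stand-in for the real gigaevo.utils.text_sanitize function:

```python
import re
from pydantic import BaseModel, field_validator

def sanitize_for_log(text: str) -> str:
    # illustrative stand-in for gigaevo.utils.text_sanitize.sanitize_for_log
    text = re.sub(r"\x1b\[[0-9;?]*[ -/]*[@-~]", "", text)  # strip ANSI CSI
    return re.sub(r"[\ud800-\udfff]", "\ufffd", text)      # replace lone surrogates

class StageError(BaseModel):
    type: str
    message: str
    traceback: str = ""

    # mode="before" runs on every construction path: direct construction,
    # from_exception-style factories, and JSON revalidation alike
    @field_validator("type", "message", "traceback", mode="before")
    @classmethod
    def _sanitize(cls, v):
        return sanitize_for_log(v) if isinstance(v, str) else v

err = StageError(type="CompileError", message="\x1b[31mnvcc fatal\x1b[0m \ud800")
```

Because the validator sits on the schema, every downstream consumer of error.pretty() and format_errors() sees only sanitized values without any call-site changes.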

Tests add the sanitizer unit suite (sanitize_for_log ANSI / C0 / C1 / BIDI / surrogate coverage; sanitize_for_json minimum-viable surrogate handling; sanitize_for_dbtext NUL plus surrogate; clean_identifier charset and max_len; deep_sanitize_for_json recursive walk) plus three adversarial suites generated against the same module: a Unicode suite covering confusables, normalization invariance, zero-width characters, weak versus strong BIDI marks, variation selectors, tag characters, line and paragraph separators, BOM, soft hyphen, Zalgo combining stacks, CJK script families, RTL scripts, emoji ZWJ sequences, Fitzpatrick skin-tone modifiers, regional-indicator flags, mathematical alphanumerics, halfwidth and fullwidth forms, and non-Latin digit forms; a regex-bypass suite covering malformed CSI, intermediate bytes, private parameters, the OSC family including security-relevant OSC 52, DCS / SOS / PM / APC, direct C1 introducers, bare ESC plus Fp / Fs gaps, adjacent surrogates, ANSI inside emoji ZWJ sequences, and perf-bound DoS tests; and a downstream-consumer suite that pipes sanitized output through json.dumps in both encoding modes, pydantic.BaseModel.model_dump_json, pydantic.TypeAdapter.dump_python and dump_json, str.encode("utf-8"), loguru file sinks, subprocess argv, fakeredis SET / GET, sqlite3 TEXT round-trips, csv.writer round-trips, and a realistic openai 2.x BadRequestError carrying an ANSI-colorized nvcc error plus an embedded NUL plus a Greek-letter Mojo identifier.

Integration tests under tests/llm/test_sanitize_wiring.py, tests/utils/test_text_sanitize_wiring.py, tests/stages/test_sanitize_integration.py, and tests/dag/test_sanitize_integration.py exercise each modified production module with a hostile fixture combining ANSI, NUL, CR, BEL, lone surrogate, and BIDI RLO, and assert the captured loguru output contains no raw hostile bytes, encodes cleanly as UTF-8, and round-trips through json.dumps.
Two defects were discovered during the sanitizer audit and fixed: the lone-surrogate regex previously used a lookahead / lookbehind that mistakenly treated adjacent independent surrogates as a valid pair (chr(0xD800) + chr(0xDC00) survived and broke UTF-8 encoding downstream); clean_identifier with a negative max_len silently dropped a trailing character via the Python slice quirk and now raises ValueError. Total new test count is approximately 760, and the full target regression suite (tests/llm, tests/utils, tests/dag, tests/test_program.py, tests/stages, tests/database, tests/evolution, tests/prompts, tests/trackers, tests/infra) passes after the changes.
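The adjacent-surrogate defect can be reproduced in isolation. The flawed pattern below is a reconstruction of the described lookaround bug, not the actual shipped regex:

```python
import re

# Reconstructed flawed pattern: a high surrogate counts as "lone" only when
# not followed by a low surrogate, and vice versa. That wrongly blesses
# chr(0xD800) + chr(0xDC00): Python never joins adjacent surrogates into one
# code point, so both are invalid in a str no matter what sits next to them.
flawed = re.compile(
    r"[\ud800-\udbff](?![\udc00-\udfff])|(?<![\ud800-\udbff])[\udc00-\udfff]"
)
# Fix: every surrogate code point in a str is lone by definition.
fixed = re.compile(r"[\ud800-\udfff]")

s = chr(0xD800) + chr(0xDC00)
survived = flawed.sub("\ufffd", s)   # both surrogates survive the flawed regex
repaired = fixed.sub("\ufffd", s)    # both are replaced by U+FFFD
```

The surviving pair still breaks UTF-8 encoding downstream, which is exactly the crash mode the sanitizer exists to prevent.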

…d database writes

Introduce gigaevo/utils/text_sanitize.py with five pure str-to-str
functions and wire them at the boundaries where unsanitized
LLM-derived text would crash or corrupt downstream consumers. StageError
gets pydantic field_validators; gigaevo/utils/json.py dumps passes obj
through deep_sanitize_for_json before serializing; MigrantEnvelope and
the tracker Redis backend gain a deep-sanitization pass before
serialization; MultiModelRouter
validates model_name via clean_identifier and redacts userinfo from
base_url before logging; mutation, memory_selector, token_tracking,
bandit, optuna stage, dag_runner, and prompt subsystems route their
LLM-derived interpolations through sanitize_for_log.

Two defects discovered during the sanitizer audit are fixed: the
lone-surrogate regex previously treated adjacent independent surrogates
as a valid pair (chr(0xD800) + chr(0xDC00) survived and broke UTF-8
encoding); clean_identifier with negative max_len silently dropped a
trailing character via the slice quirk and now raises ValueError.

Adds approximately 760 tests across the sanitizer unit suite plus
adversarial Unicode, regex-bypass, and downstream-consumer suites, and
integration tests under tests/llm, tests/utils, tests/dag, tests/stages
that exercise each modified module with a hostile fixture combining
ANSI, NUL, CR, BEL, lone surrogate, and BIDI override.

test_init_log_with_hostile_model_name previously ran the real
_verify_models HTTP probe against a fake base_url and spent around
seven seconds waiting for the urllib timeout. The test exists to
verify the init INFO banner sanitizes model_name and redacts
base_url userinfo, not to exercise the server probe (the other two
tests in the same class already do that with monkeypatched
urlopen). Patching _verify_models to a no-op brings the test from
7.4 seconds down to under ten milliseconds.
…anitization

_apply_modifications cleaned ParamSpec.name through clean_identifier
but inserted the parameterized_snippet unchanged, so a snippet that
originally referenced _optuna_params['old hostile name'] failed with
KeyError once the trial dict only carried the cleaned name. Build a
name_map alongside the cleaning loop, dedupe collisions so two
distinct hostile names that clean to the same identifier do not
collapse into one parameter, and rewrite every string-literal
_optuna_params[...] reference inside every snippet before splicing.
Tests now positively exec the rewritten code against a synthetic
_optuna_params dict instead of swallowing exceptions.
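The name_map plus snippet-rewrite step might look like this sketch (clean_identifier is an illustrative stand-in, and both helper names are hypothetical; the real logic lives in _apply_modifications):

```python
import re

def clean_identifier(name: str) -> str:
    # illustrative stand-in for gigaevo's clean_identifier
    return re.sub(r"[^A-Za-z0-9._:/+@-]", "", name)

def build_name_map(raw_names):
    """Clean each raw name; dedupe collisions so two distinct hostile names
    that clean to the same identifier do not collapse into one parameter."""
    name_map, taken = {}, set()
    for raw in raw_names:
        clean = clean_identifier(raw) or "param"
        candidate, i = clean, 1
        while candidate in taken:
            candidate = f"{clean}_{i}"
            i += 1
        name_map[raw] = candidate
        taken.add(candidate)
    return name_map

def rewrite_snippet(snippet: str, name_map):
    """Rewrite every string-literal _optuna_params[...] reference so the
    snippet indexes the trial dict by the cleaned name."""
    def repl(m):
        quote, raw = m.group(1), m.group(2)
        return f"_optuna_params[{quote}{name_map.get(raw, raw)}{quote}]"
    return re.sub(r"_optuna_params\[(['\"])(.*?)\1\]", repl, snippet)
```

A snippet referencing _optuna_params['loss train'] is then rewritten to index the cleaned key, so the KeyError described above cannot occur even when two hostile names clean to the same identifier.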
…aths

flush() stored latest values under clean_identifier(tag), but
get_latest, get_history, and clear_series read with the raw caller
tag. A write_scalar('loss train', ...) was stored as 'losstrain' and
get_latest('loss train') returned an empty dict. Extract a single
_field_tag() helper applied symmetrically on every read and write
side; fall back to metric_<sha256[:12]> when sanitization yields an
empty tag so distinct hostile inputs do not collide. New
round-trip tests cover both clean and hostile tags through
write_scalar -> flush -> get_latest / list_metrics / get_history.

prompt_text_to_id called blob.encode() on the raw return value of
the prompt entrypoint, raising UnicodeEncodeError when the
LLM-generated prompt source synthesized a lone UTF-16 surrogate via
an escape literal. The error aborted the prompt fitness pipeline.
Apply sanitize_for_log inside prompt_text_to_id and at the two
entrypoint call sites so the stored prompt text and the hashed
prompt text always match. Clean prompts keep their historical
sha256[:16] id unchanged (test_clean_system_only_hash_matches_previous_sha256
proves no migration breakage). Empty-after-sanitization input
becomes an explicit error in the prompt execution stage and a
skip with warning in the archive fetcher rather than hashing an
empty blob.
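The sanitize-before-hash invariant can be sketched as follows (the inline sanitize_for_log is an illustrative stand-in; the sha256[:16] id format is from the description above):

```python
import hashlib
import re

def sanitize_for_log(text: str) -> str:
    # illustrative stand-in for gigaevo.utils.text_sanitize.sanitize_for_log
    text = re.sub(r"\x1b\[[0-9;?]*[ -/]*[@-~]", "", text)
    return re.sub(r"[\ud800-\udfff]", "\ufffd", text)

def prompt_text_to_id(blob: str) -> str:
    """Sanitize before hashing so the stored prompt text and the hashed
    prompt text always match; clean prompts hash identically to before."""
    blob = sanitize_for_log(blob)
    if not blob:
        raise ValueError("prompt text empty after sanitization")
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:16]
```

Since sanitize_for_log is the identity on clean text, historical ids of clean prompts are unchanged, which is the no-migration-breakage property the tests assert.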

parse_response computed safe_msg through sanitize_for_log for the
loguru line and for state['error'] but the value placed in
parsed_output['error'] still came from str(e). Downstream consumers
that read parsed_output directly (run-state dumps, langfuse traces,
re-injection into LLM prompts) now also see the sanitized form.

The structured-output schemas receive LLM responses verbatim. ANSI
escapes, NUL bytes, lone UTF-16 surrogates, and BIDI overrides flow
through bare str fields into reports, JSON dumps, Postgres TEXT
columns, and re-injection back into LLM prompts. Match the pattern
already applied to MutationStructuredOutput: pydantic
field_validators on ProgramInsight.{type, insight, tag, severity}
and TransitionInsight.{strategy, description} pipe each value
through sanitize_for_log at the schema layer. ProgramScore has no
str fields, no change. Tests cover the clean-input identity path,
ANSI stripping, lone-surrogate replacement, CR removal, and
preservation of legitimate Unicode (arrows, math symbols, template
syntax).
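The identity-on-clean-input property these tests assert can be demonstrated with a minimal stand-in sanitizer. The combined regex is illustrative, not the shipped implementation, and it removes rather than escapes hostile bytes for brevity:

```python
import re

# One combined class of hostile input: ANSI CSI sequences, BIDI
# overrides / isolates, lone surrogates, and CR.
_HOSTILE = re.compile(
    r"\x1b\[[0-9;?]*[ -/]*[@-~]|[\u202a-\u202e\u2066-\u2069]|[\ud800-\udfff]|\r"
)

def sanitize(text: str) -> str:
    return _HOSTILE.sub("", text)

# Legitimate cross-language output passes through untouched ...
legit = "Layout<Shape<_32,_128>,Stride<_128,_1>> \u2192 \u21d2 \u03b1\u03b2"
# ... while hostile wrapping around it is stripped.
hostile = "\x1b[31m" + legit + "\r" + chr(0xD800)
```

Arrows, template syntax, and Greek identifiers all fall outside the hostile classes, so clean input is preserved byte-for-byte.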
