feat: implement audit findings — security, perf, and feature wins#7
Merged
Conversation
Lands a focused batch of improvements identified in the senior-staff audit: Security - New NETLOGO_MCP_RESTRICTED env switch blocks dangerous NetLogo primitives (file-*, import/export-world, set-current-directory, user-*, sh:/py:/web:) when opted in; off by default to preserve the unrestricted core flow. - README now ships an explicit Security Model section documenting the trust boundary (any connected MCP client gets host-level file/network I/O via NetLogo primitives). - Switched bspace.parse_experiments to defusedxml — model files may originate from CoMSES (untrusted), so guard against entity-expansion attacks. - Dropped shell=True from the BehaviorSpace launcher; .bat is now invoked via "cmd.exe /c <bat>" with shell=False. Hand-rolled _quote_for_cmd is gone. CoMSES-supplied experiment names are sanitized to a safe argv charset before they hit the child process. Concurrency / performance - lifespan now holds an asyncio.Lock; every JVM-touching tool routes through _jvm_call(), which acquires the lock and runs the blocking pynetlogo call in asyncio.to_thread. Fixes both the workspace race and event-loop blocking with one helper. - _require_model trusts current_model_path on the fast path, so most tool calls no longer pay a per-call "max-pxcor" JVM round-trip. - get_world_state batches its 8 nl.report calls into a single locked thread hop instead of 8 separate JNI dispatches. - get_patch_data accepts max_cells (default 10000) and downsamples large worlds with a _meta block explaining the trim. - export_view / export_world prune oldest files via NETLOGO_EXPORTS_MAX_FILES (default 200) so long-lived sessions don't fill the disk. Features / correctness - New get_agent_sample tool fills the gap between get_world_state (aggregates only) and hand-crafted reporters: batched single-roundtrip reporter, identifier-validated breed/attributes, markdown-table output with "Sampled N of M" footer. - _wrap_nlogox now reads the live netlogo-version captured at lifespan start instead of hardcoding "NetLogo 7.0.3". - set_parameter rejects nan/inf up front — NetLogo has no such literal, and the wrapped compiler error was previously cryptic. - CoMSES download persists title/license in a .comses_metadata.json sidecar so cache hits stay informative; the sidecar is filtered out of read_comses_files responses. - Fixed .env.example NETLOGO_GUI default comment (was "false", actual default is "true"). CI / tooling - ci.yml now runs "pre-commit run --all-files" instead of duplicate ruff/format steps — single source of truth with .pre-commit-config.yaml. - Added defusedxml>=0.7 to project dependencies. Tests - New tests cover nan/inf rejection, get_agent_sample validation and rendering, _sanitize_experiment_name behavior. Total: 188 tests pass. Verified: ruff check, ruff format, mypy, and pytest all clean.
2 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Lands a focused batch of improvements identified in the senior-staff audit:
Security
Concurrency / performance
Features / correctness
CI / tooling
Tests
Verified: ruff check, ruff format, mypy, and pytest all clean.