Harden multilingual decoder verification and restore follow-up test coverage #82
Merged
Stage the 35 test files that were split out of the code PR (commit 5e02cfc) on top of the latest pipeline code, including the review-fix commits. Basing this branch on the current pipeline HEAD keeps the eventual follow-up PR's diff limited to the test files once the code PR has merged to main.
Non-Python interface units are LPCodeUnit instances, which have no count_lines method, so the except branch in interface synthesis stores the whole interface block as every unit's code. As a result, file_code in interfaces.json embeds the entire file once per unit. On large modules this O(units × file_size) blow-up pushes the plan_tasks prompt past the 128 KB single-argument limit, crashing the planner with "Argument list too long" and leaving tasks.json incomplete. Collapse identical per-unit blocks before building the planner prompt. Keeping one copy reconstructs the original complete file (imports plus each unit once), so the planner sees valid source while distinct per-unit slices are preserved. Measured a 55-67% reduction on a real Rust subtree.
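A minimal Python sketch of the collapse step. The names dedup_code_blocks and dedup_file_code appear elsewhere in this PR, but the signatures, the blank-line separator, and the choice to drop empty blocks are assumptions made for illustration. Blocks are compared on their stripped text. Each kept block retains its own leading indentation and has only trailing whitespace trimmed, which is the behavior the later Copilot review fix in this PR describes.

```python
def dedup_code_blocks(blocks: list[str]) -> list[str]:
    """Return blocks in first-seen order, dropping repeats and empty blocks.

    Repeats are detected on the stripped text, but the kept copy keeps its
    leading indentation; only trailing whitespace is removed.
    """
    seen: set[str] = set()
    kept: list[str] = []
    for block in blocks:
        key = block.strip()
        if not key or key in seen:
            continue
        seen.add(key)
        kept.append(block.rstrip())
    return kept


def dedup_file_code(blocks: list[str]) -> str:
    """Rebuild file_code from per-unit blocks (hypothetical separator: one blank line)."""
    return "\n\n".join(dedup_code_blocks(blocks))


# The failure mode: every unit's code is the whole file, repeated once per unit.
whole_file = "use std::io;\n\nfn parse() {}\n\nfn render() {}\n"
print(dedup_file_code([whole_file] * 3) == whole_file.rstrip())  # True
```

Keying on the stripped text means a slice that differs only in surrounding whitespace is still treated as a duplicate.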
dominant_language casts one vote per file, and detect_language maps every .h file to C (the only config that owns .h). A C++ repo that uses .h headers (googletest has 2018 .h vs. 1062 .cc files) therefore gets more C votes than C++ votes and is misclassified as C. That misclassification fails the encoder's dominant_language expectation and poisons every downstream language decision (backend, test_command, entry_point). Fold C votes into C++ whenever both appear: a pure C repo never carries .cc/.cpp/.hpp sources, so the presence of any C++-only extension means the repo is C++ and its .h files are C++ headers. Pure C repos (only .c/.h) and C mixed with unrelated languages are unaffected.
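A minimal sketch of the vote folding. It assumes a flat extension-to-language map; EXT_TO_LANG here is hypothetical and much smaller than the real detect_language configs. The sketch counts one vote per file and moves the C votes into the C++ bucket only when at least one C++-only extension was seen.

```python
import os
from collections import Counter
from typing import Optional

# Hypothetical extension map; .h is owned by C, as described above.
EXT_TO_LANG = {
    ".c": "c", ".h": "c",
    ".cc": "cpp", ".cpp": "cpp", ".cxx": "cpp", ".hpp": "cpp", ".hh": "cpp",
    ".py": "python", ".rs": "rust", ".go": "go",
}


def dominant_language(paths: list[str]) -> Optional[str]:
    votes: Counter = Counter()
    for path in paths:
        lang = EXT_TO_LANG.get(os.path.splitext(path)[1].lower())
        if lang is not None:
            votes[lang] += 1  # one ballot per file
    if not votes:
        return None
    # Any C++-only extension means the .h files are C++ headers.
    if votes["c"] and votes["cpp"]:
        votes["cpp"] += votes.pop("c")
    return votes.most_common(1)[0][0]
```

With googletest-like counts (2018 headers, 1062 .cc files), the folded C++ bucket wins. A repo with only .c and .h files never triggers the fold.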
The planner-side dedup (1f22944) only repaired the prompt. interfaces.json itself still stored file_code as the whole file repeated once per unit, and context_collector writes that file_code straight to disk as the code-gen seed source, so generated repos were seeded with N duplicate definitions. Add a shared common/code_dedup helper and apply it at every point that rebuilds file_code from per-unit code: InterfacesStore.to_interfaces_json (serialization) and the interface_review prune (regeneration). The planner helper now reuses it as a consumer-side safety net for older artifacts. Keeping one copy reconstructs the original single file (imports plus each unit once); units_to_code is left untouched so per-unit stubs stay valid.
Exclude backslashes from the plain-character branch of the string-literal pattern so that every escaped character has exactly one matching path. This removes the catastrophic backtracking that let zod-like regex literals inside comments hang TypeScript encoding.
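Here is the shape of the fix, sketched with Python's re module. The actual pattern in fallback.py may differ, and STRING_LITERAL and strip_string_literals are illustrative names. In a pattern such as `"(?:[^"\n]|\\.)*"`, a backslash can be consumed either as a plain character or as the start of an escape. An unterminated literal full of backslashes therefore has exponentially many ways to fail. Removing the backslash from the plain-character class leaves exactly one way to match each character.

```python
import re

# Plain characters exclude the quote, the backslash, and newline; a backslash
# is only ever consumed by the escape branch, so matching stays linear.
STRING_LITERAL = re.compile(r'"(?:[^"\\\n]|\\.)*"' r"|'(?:[^'\\\n]|\\.)*'")


def strip_string_literals(source: str) -> str:
    """Replace each string literal with an empty literal of the same quote kind."""
    return STRING_LITERAL.sub(lambda m: m.group(0)[0] * 2, source)
```

An unterminated literal made of thousands of backslashes now fails in linear time instead of exploring every split between the two branches.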
Allow Python add_interface fixes to materialize a new interface file when the feature root maps to an existing subtree. Add deterministic same-file Python invocation edges for self calls, private helper fan-out, and constructor-composed class dependencies so review orphan checks can converge.
Run C++ ctest with --test-dir build so post-verification sees the CMake-generated test registry. Also make plan_tasks reject interfaces.json files that do not cover every skeleton.json feature, and update the plan command template so warning states are treated as incomplete.
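A minimal sketch of the coverage gate. It assumes the feature identifiers have already been extracted from skeleton.json and interfaces.json into plain string collections. The real file layouts and function names in plan_tasks are not shown here, so missing_features and require_interface_coverage are hypothetical.

```python
from collections.abc import Iterable


def missing_features(skeleton_features: Iterable[str], interface_features: Iterable[str]) -> list[str]:
    """Skeleton features with no entry in interfaces.json, sorted for stable messages."""
    return sorted(set(skeleton_features) - set(interface_features))


def require_interface_coverage(skeleton_features: Iterable[str], interface_features: Iterable[str]) -> None:
    """Fail fast instead of planning from stale or partial interfaces."""
    missing = missing_features(skeleton_features, interface_features)
    if missing:
        raise ValueError(
            f"interfaces.json is missing {len(missing)} skeleton feature(s): " + ", ".join(missing)
        )
```

Extra interface entries that the skeleton does not mention are tolerated; only gaps in coverage fail the check.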
Keep interface generation progress in a .partial file until complete, add design_interfaces heartbeat output, exclude skipped directories from C/C++ syntax source discovery, and make C/C++ prompt test commands discover sources at run time with standalone translation-unit guidance.
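The .partial pattern can be sketched as follows. Progress is written to a sibling file and promoted with os.replace only once generation completes, so an interrupted run never leaves a truncated canonical interfaces.json. The helper names are hypothetical. os.replace is atomic on POSIX when both paths are on the same filesystem.

```python
import json
import os
from pathlib import Path


def partial_path(final_path: Path) -> Path:
    return final_path.with_name(final_path.name + ".partial")


def save_progress(final_path: Path, data: dict) -> None:
    """Checkpoint in-progress results without touching the canonical file."""
    partial_path(final_path).write_text(json.dumps(data, indent=2), encoding="utf-8")


def finalize(final_path: Path) -> None:
    """Promote the completed .partial file over the canonical one."""
    os.replace(partial_path(final_path), final_path)
```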
Keep interface generation progress in a .partial file until complete, add design_interfaces heartbeat output, exclude skipped directories from C/C++ syntax source discovery, make C/C++ prompt test commands discover sources at run time, and document the syntax-only success summary fallback.
Run cmake --build build during C/C++ test environment preparation so ctest --test-dir build can execute generated test binaries instead of failing on missing executables.
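A sketch of the configure, build, and test sequence. It assumes CMake 3.20 or newer, the first release in which ctest accepts --test-dir. The step list and the verify_cmake_project helper are hypothetical, and the runner is injectable so the ordering can be tested without a compiler. Stopping at the first failing step mirrors the later cmake_reconfigure fix in this PR, which skips the build when configure fails, although the exact error reporting there may differ.

```python
import subprocess
from collections.abc import Callable, Sequence

Runner = Callable[[Sequence[str]], int]

CMAKE_STEPS: tuple[list[str], ...] = (
    ["cmake", "-S", ".", "-B", "build"],                      # configure
    ["cmake", "--build", "build"],                            # build test executables
    ["ctest", "--test-dir", "build", "--output-on-failure"],  # run registered tests
)


def run_command(cmd: Sequence[str]) -> int:
    return subprocess.run(list(cmd), check=False).returncode


def verify_cmake_project(runner: Runner = run_command) -> tuple[int, list[str]]:
    """Run each step in order; stop at the first failure so its error is the one reported."""
    executed: list[str] = []
    for cmd in CMAKE_STEPS:
        executed.append(" ".join(cmd))
        code = runner(cmd)
        if code != 0:
            return code, executed
    return 0, executed
```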
Add a shared generated-artifact policy for prompt guidance, local git excludes, post-verification, and merge-time checks. Align CMake prompt verification with configure, build, and ctest so agents repair source and build configuration instead of generated build files.
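A sketch of a non-destructive local exclude installer. It only appends patterns that are not already present in .git/info/exclude, so user rules are never rewritten and repeated batch startups are idempotent. The pattern list and the install_local_excludes name are assumptions. In gitignore syntax a leading slash anchors a pattern to the repository root, and a trailing slash restricts it to directories.

```python
from pathlib import Path

# Hypothetical pattern list; root-anchored so configs/build/ is not excluded.
GENERATED_PATTERNS = ("/build/", "/dist/", "/coverage/", "/.venv/", "/venv/", "CMakeFiles/", "__pycache__/")


def install_local_excludes(git_dir: Path, patterns=GENERATED_PATTERNS) -> list[str]:
    """Append missing patterns to <git_dir>/info/exclude and return what was added."""
    exclude = git_dir / "info" / "exclude"
    exclude.parent.mkdir(parents=True, exist_ok=True)
    existing = exclude.read_text(encoding="utf-8") if exclude.exists() else ""
    present = {line.strip() for line in existing.splitlines()}
    missing = [p for p in patterns if p not in present]
    if missing:
        # Start on a fresh line if the file does not already end with one.
        prefix = "\n" if existing and not existing.endswith("\n") else ""
        with exclude.open("a", encoding="utf-8") as fh:
            fh.write(prefix + "\n".join(missing) + "\n")
    return missing
```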
Run post-verification for documentation batches so README and docs edits cannot bypass native test suites. Treat smoke validation failures and C/C++ no-op make test output as validation failures. Narrow generated artifact matching to root build directories so source paths such as configs/build remain valid.
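A minimal sketch of root-anchored classification for repository-relative paths. The directory sets are assumptions for illustration; the PR's common/generated_artifacts module defines its own. The key idea is that build-like directories count only when they are the first path component, while directories such as CMakeFiles are treated as generated wherever they appear.

```python
from pathlib import PurePosixPath

# Hypothetical sets: only matched at the repository root vs. matched at any depth.
ROOT_ONLY_DIRS = frozenset({"build", "dist", "coverage", ".venv", "venv"})
ANYWHERE_DIRS = frozenset({"CMakeFiles", "__pycache__"})


def is_generated_artifact(rel_path: str) -> bool:
    parts = PurePosixPath(rel_path).parts
    if len(parts) > 1 and parts[0] in ROOT_ONLY_DIRS:
        return True
    # Check directory components only, never the file name itself.
    return any(part in ANYWHERE_DIRS for part in parts[:-1])
```

Under this rule configs/build/settings.yaml stays a valid source path, while build/CMakeCache.txt is rejected.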
Pull request overview
This PR is a follow-up hardening pass on the multilingual decoder/codegen pipeline, primarily tightening verification correctness (especially for C/C++), improving generated-artifact hygiene, strengthening interface/planner robustness, and restoring/adding extensive regression coverage across the multilingual stack.
Changes:
- Harden C/C++ verification semantics (build before `ctest`, detect “compile-only” `make test`, skip build/generated dirs when collecting sources) and improve prompt rules/summaries for compiled backends.
- Add shared helpers for interface-source deduplication and generated-artifact hygiene, and enforce generated-artifact rejection at key pipeline gates (run_batch startup, post_verify, merge).
- Restore/add broad multilingual regression tests covering language parsing, repo language resolution, entry-point reconciliation, smoke/final validation behavior, and zero-test guarding.
Reviewed changes
Copilot reviewed 55 out of 56 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| CoderMind/tests/test_zero_test_guard.py | Adds regression coverage for “exit-0 but no tests ran” detection across backends. |
| CoderMind/tests/test_smoke_multilang.py | Adds smoke-test coverage for clean env execution and skipping Python-only layers for non-Python repos. |
| CoderMind/tests/test_rpg_builder.py | Ensures RPG builder preserves target language metadata. |
| CoderMind/tests/test_repo_language_resolution.py | Adds tests for on-disk language resolution fallback and wrapper behavior. |
| CoderMind/tests/test_plan_prompt_dedup.py | Covers planner prompt dedup + interface/skeleton coverage validation behavior. |
| CoderMind/tests/test_orphan_test_build_exclusion.py | Adds regression coverage for excluding test/build units from orphan detection. |
| CoderMind/tests/test_multilingual_prompt_safety.py | Guards encoder prompts against Python-only wording regressions. |
| CoderMind/tests/test_multilingual_encoder_pipeline.py | Tests multilingual encoder discovery + semantic parsing entry behavior. |
| CoderMind/tests/test_multilingual_code_unit.py | Tests multilingual ParsedFile and snippet fencing behavior. |
| CoderMind/tests/test_lang_parser_typescript.py | Adds TypeScript parser tests including hang regression coverage. |
| CoderMind/tests/test_lang_parser_rust.py | Adds Rust parser unit/dependency/invocation behavior tests. |
| CoderMind/tests/test_lang_parser_registry.py | Expands registry tests (configs, dominant language, fences, API). |
| CoderMind/tests/test_lang_parser_python_parity.py | Ensures Python parser/ParsedFile parity with legacy AST behavior. |
| CoderMind/tests/test_lang_parser_javascript.py | Adds JavaScript parser tests. |
| CoderMind/tests/test_lang_parser_go.py | Adds Go parser tests (units, deps, invocation extraction). |
| CoderMind/tests/test_lang_parser_fallback.py | Adds fallback delimiter/syntax scanner regression tests. |
| CoderMind/tests/test_lang_parser_cpp.py | Adds C++ parser tests (units, invokes, syntax errors). |
| CoderMind/tests/test_lang_parser_c.py | Adds C parser tests (includes, invokes, syntax errors). |
| CoderMind/tests/test_interface_source_dedup.py | Tests shared interface-source dedup and serialization wiring. |
| CoderMind/tests/test_init_codebase_gitignore.py | Tests .gitignore update behavior for dev env entries. |
| CoderMind/tests/test_generated_artifact_hygiene.py | Adds end-to-end tests for generated-artifact exclude/install + rejection gates. |
| CoderMind/tests/test_final_test_repair.py | Tests bounded final-test repair loop + smoke failure propagation. |
| CoderMind/tests/test_feature_build.py | Tests feature-build tree promotion + target language preservation. |
| CoderMind/tests/test_entry_reconciliation.py | Adds cross-language entry-point reconciliation protocol coverage. |
| CoderMind/tests/test_code_gen_multilingual.py | Adds broad multilingual codegen prompt/test runner/static check coverage. |
| CoderMind/tests/test_branch_name_sanitization.py | Tests branch name sanitization hardening and shared sanitizer usage. |
| CoderMind/templates/commands/plan.md | Clarifies warning is “not done” in plan --check-only semantics. |
| CoderMind/scripts/run_batch.py | Installs local generated-artifact excludes during batch startup. |
| CoderMind/scripts/plan_tasks.py | Adds interface source dedup in planner prompts + validates interface coverage vs skeleton. |
| CoderMind/scripts/lang_parser/registry.py | Fixes dominant language classification for header-heavy mixed C/C++ repos. |
| CoderMind/scripts/lang_parser/extractors/fallback.py | Hardens string-literal stripping regex against catastrophic backtracking. |
| CoderMind/scripts/func_design/interfaces_store.py | Deduplicates file_code during interfaces serialization. |
| CoderMind/scripts/func_design/interface_review.py | Adds file-entry scaffolding on interface review + uses shared dedup when regenerating file_code. |
| CoderMind/scripts/func_design/interface_agent.py | Improves Python dependency collection (same-file calls, self.method() calls) and saves partial interfaces output. |
| CoderMind/scripts/design_interfaces.py | Adds a periodic heartbeat while long interface design runs. |
| CoderMind/scripts/decoder_lang/tests/test_unit_kind.py | Adds tests for shared unit-kind classification and backend callable/type behavior. |
| CoderMind/scripts/decoder_lang/tests/test_python_backend.py | Expands Python backend/registry protocol invariants and behavior tests. |
| CoderMind/scripts/decoder_lang/tests/test_phase5_prompt_directive.py | Tests language directive preamble behavior (no-op for Python). |
| CoderMind/scripts/decoder_lang/tests/test_phase3_code_structure.py | Tests backend code-structure helpers and cross-language behavior. |
| CoderMind/scripts/decoder_lang/tests/test_phase2_skeleton.py | Tests backend-aware skeleton behavior and identifier validation. |
| CoderMind/scripts/decoder_lang/tests/test_phase1_propagation.py | Tests language propagation/resolution through decoder entry points. |
| CoderMind/scripts/decoder_lang/tests/test_javascript_backend.py | Adds JavaScript backend contract tests. |
| CoderMind/scripts/decoder_lang/tests/test_c_cpp_backend.py | Adds C/C++ backend tests for build + ctest semantics and “compile-only make test” detection. |
| CoderMind/scripts/decoder_lang/tests/__init__.py | Package marker for decoder_lang tests. |
| CoderMind/scripts/decoder_lang/cpp_backend.py | Skips build/generated dirs in source collection + strengthens no-test/compile-only detection. |
| CoderMind/scripts/decoder_lang/c_backend.py | Skips build/generated dirs in source collection + strengthens no-test/compile-only detection. |
| CoderMind/scripts/decoder_lang/backend.py | Ensures CMake projects are built before ctest in C/C++ verification prep. |
| CoderMind/scripts/common/generated_artifacts.py | Adds shared generated-artifact classifier, local exclude installer, and persisted-change detection. |
| CoderMind/scripts/common/code_dedup.py | Adds shared helpers for deduplicating repeated interface code blocks. |
| CoderMind/scripts/code_gen/post_verify.py | Enforces generated-artifact rejection before and after post-verify test runs. |
| CoderMind/scripts/code_gen/git_ops.py | Rejects merging batch branches containing persisted generated artifacts. |
| CoderMind/scripts/code_gen/final_validation.py | Propagates smoke-test failures into final validation outcome. |
| CoderMind/scripts/code_gen/batch_prompts.py | Adds generated-artifact prompt rules and improves C/C++ verification command construction in prompts. |
Keep shell expansion for C/C++ syntax-check include paths and root-anchor virtualenv artifact directories to avoid false positives. Also cover the false-positive resume prompt summary interpolation with a regression test.
prune_orphan_interfaces is a deprecated helper with no production callers; the active design flow uses InterfacesStore.find_orphan_units and prune_units, and file_code is already deduplicated at the serialization source. Restore the plain join so the legacy helper no longer carries duplicate dedup logic, and drop the now-unused dedup_file_code import.
The re-added follow-up tests encoded pre-refactor behavior that no longer matches the merged decoder pipeline:
- add_interface for a non-Python project is recorded as an advisory manual follow-up rather than silently skipped.
- correct_intra_subtree_file_order reports reason "backend_file_dependencies", and Go ordering resolves module imports through go.mod.
- Go command-path resolution moved to go_backend.find_existing_entry.
Update the assertions to match, and remove the standalone C++ ctest backend test, whose behavior is fully covered by the comprehensive decoder_lang backend suite.
A second Copilot review pass flagged three correctness issues in the restored hardening code:
- code_dedup: dedup_code_blocks returned the stripped block, dropping leading indentation from indented unit slices and corrupting the file_code used as a codegen seed. Dedup on the stripped key but keep the block's own indentation (trim only trailing whitespace).
- interface_agent: _analyze_python_invocations walked the whole AST, attributing calls inside nested def/class bodies to the enclosing caller. Walk only the caller's own scope.
- decoder_lang.backend: cmake_reconfigure now skips the build step when configure fails, so ctest surfaces the real error instead of running against a stale build directory.
A second Copilot review pass surfaced three correctness gaps in the restored C/C++ verification and Python invocation analysis:
- c/cpp backend: the compile-only `make test` guard regex did not match version-suffixed compilers (gcc-13, clang-18, g++-13, clang++-18, c++-14), so a compile-only run could be reported as a passing test run. Allow an optional version suffix.
- batch_prompts: the C/C++ syntax-only find now also prunes dist/coverage/.venv/venv/CMakeFiles so generated sources are not pulled into the syntax check.
- interface_agent: lambdas are not separate units, so keep their bodies in the enclosing caller's scope rather than dropping their same-file calls (nested def/class scopes are still excluded).
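A sketch of a compile-only guard that tolerates version-suffixed compiler names. The heuristic used here is an assumption, not the backend's actual rule: output counts as compile-only when every non-empty line is either a compiler invocation or make's own chatter. The helper names are also assumptions.

```python
import re

# Optional directory prefix, a known driver name, then an optional version
# suffix such as -13, -18 or -12.2, followed by whitespace before the arguments.
COMPILER_LINE = re.compile(
    r"^\s*(?:\S*/)?(?:gcc|g\+\+|clang|clang\+\+|cc|c\+\+)(?:-\d+(?:\.\d+)*)?\s"
)
MAKE_CHATTER = re.compile(r"^make(?:\[\d+\])?: ")


def is_compiler_line(line: str) -> bool:
    return COMPILER_LINE.match(line) is not None


def looks_compile_only(output: str) -> bool:
    """True when make test only echoed compiler commands and never ran a test."""
    lines = [line for line in output.splitlines() if line.strip()]
    compiled = any(is_compiler_line(line) for line in lines)
    return compiled and all(
        is_compiler_line(line) or MAKE_CHATTER.match(line) is not None for line in lines
    )
```

Because the suffix must be digits followed by whitespace, tools such as gcc-ar are not mistaken for compiler invocations.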
Copilot stopped work on behalf of HuYaSen due to an error (June 26, 2026 10:26).
HuYaSen approved these changes on Jun 26, 2026.
Summary
Follow-up to #67. This PR hardens the multilingual decoder/codegen pipeline by tightening interface completeness checks, generated-artifact hygiene, C/C++ verification semantics, and planner prompt safety. It also restores the extracted multilingual regression tests that were moved out of the original #67 branch for a follow-up PR.
What changed
Decoder and verification hardening
- Build CMake projects before running `ctest`, so verification does not falsely pass or fail because test executables were never built.
- Treat `make test` targets that only compile objects, without actually running tests, as verification errors instead of successful test runs.
Generated artifact hygiene
- Install local `.git/info/exclude` hygiene rules during batch startup.
Interface and planner robustness
- Deduplicate `file_code` blocks in `interfaces.json` before serialization and before planner prompt construction.
- `plan_tasks` fails fast when `interfaces.json` does not cover all skeleton features.
- Improve Python dependency collection for same-file calls and `self.method()` invocations.
- Write in-progress interface output to `interfaces.json.partial` and only overwrite the canonical `interfaces.json` after successful completion.
Parser and language detection fixes
Final validation behavior
- `plan --check-only` warning states are not complete/done states and should not allow downstream stages to proceed.
Test coverage
This PR adds extensive regression coverage for the multilingual pipeline, including:
The diff adds 33 new test files and restores the extracted multilingual tests intended for this follow-up PR.
Notes for reviewers
- `run_batch`/`post_verify` now update `.git/info/exclude` with local generated-artifact exclusions. This is intentionally local-only and non-destructive.
- `plan_tasks` now fails on incomplete interface coverage instead of silently planning from stale or partial interfaces.
- C/C++ runs where the `make test` target only compiles objects but does not execute tests will now be rejected as invalid verification results.
Testing
Not run as part of this PR-description preparation.