
Harden multilingual decoder verification and restore follow-up test coverage #82

Merged
HuYaSen merged 17 commits into main from dev/decoder-multilang-pipeline-fix2 on Jun 26, 2026

@QingtaoLi1 (Contributor)

Summary

Follow-up to #67. This PR hardens the multilingual decoder/codegen pipeline by tightening interface completeness checks, generated-artifact hygiene, C/C++ verification semantics, and planner prompt safety. It also restores the extracted multilingual regression tests that were moved out of the original #67 branch for a follow-up PR.

What changed

Decoder and verification hardening

  • Build C/C++ CMake targets before running ctest, so verification does not falsely pass or fail because test executables were never built.
  • Treat C/C++ make test targets that only compile objects, without actually running tests, as verification errors instead of successful test runs.
  • Skip generated/build/cache directories when collecting C/C++ source files for syntax and verification commands.
  • Improve C/C++ prompt rules to:
    • avoid editing build/cache/generated artifacts,
    • use full CMake build + CTest commands,
    • avoid relying on undeclared/transitively included helper functions,
    • report explicit syntax-check summaries.
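As a rough illustration of the source-collection change, the sketch below walks a repository and prunes build/cache directories before they are descended into. The skip list, the suffix set, and the function name `collect_c_cpp_sources` are assumptions for illustration, not the names used in `c_backend.py` or `cpp_backend.py`.

```python
import os
from pathlib import PurePath
from typing import List

# Hypothetical skip list; the real backends may prune a different set.
SKIP_DIRS = frozenset({
    "build", "CMakeFiles", "dist", "coverage", ".venv", "venv",
    ".git", "__pycache__",
})
C_CPP_SUFFIXES = frozenset({".c", ".cc", ".cpp", ".cxx", ".h", ".hh", ".hpp"})


def collect_c_cpp_sources(root: str) -> List[str]:
    """Return sorted, root-relative C/C++ source paths, skipping build/cache dirs."""
    found: List[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if PurePath(name).suffix in C_CPP_SUFFIXES:
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                found.append(rel.replace(os.sep, "/"))
    return sorted(found)
```

Pruning during the walk (rather than filtering paths afterwards) also avoids scanning large CMake build trees at all.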

Generated artifact hygiene

  • Add a shared generated-artifact classifier and prompt rule helper.
  • Install local .git/info/exclude hygiene rules during batch startup.
  • Reject persisted generated artifacts before post-verify and after verification runs.
  • Prevent batch branches containing generated artifacts from being merged.
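A minimal sketch of a non-destructive local exclude installer, assuming the rules are appended once inside a marker block so repeated batch startups do not duplicate them and existing user entries are never rewritten. The marker strings, the pattern list, and `install_local_excludes` are hypothetical; the real helper in `common/generated_artifacts.py` may behave differently.

```python
from pathlib import Path
from typing import Sequence

# Hypothetical marker lines and patterns (gitignore syntax; a leading "/"
# anchors a pattern to the repository root).
BEGIN = "# >>> local generated-artifact hygiene >>>"
END = "# <<< local generated-artifact hygiene <<<"
DEFAULT_PATTERNS = ("/build/", "/dist/", "/.venv/", "/venv/", "__pycache__/", "*.o")


def install_local_excludes(repo_root: str,
                           patterns: Sequence[str] = DEFAULT_PATTERNS) -> bool:
    """Append the rule block to .git/info/exclude once; True if the file changed."""
    exclude = Path(repo_root) / ".git" / "info" / "exclude"
    exclude.parent.mkdir(parents=True, exist_ok=True)
    text = exclude.read_text() if exclude.exists() else ""
    if BEGIN in text:
        return False  # already installed; existing lines are never rewritten
    if text and not text.endswith("\n"):
        text += "\n"
    exclude.write_text(text + "\n".join([BEGIN, *patterns, END]) + "\n")
    return True
```

Because `.git/info/exclude` is never committed, rules installed this way stay local to the working copy.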

Interface and planner robustness

  • Deduplicate repeated whole-file file_code blocks in interfaces.json before serialization and before planner prompt construction.
  • Add interface coverage validation so plan_tasks fails fast when interfaces.json does not cover all skeleton features.
  • Improve Python dependency collection for same-file calls and self.method() invocations.
  • Save in-progress interface generation to interfaces.json.partial and only overwrite the canonical interfaces.json after successful completion.
  • Allow interface review additions to scaffold missing file entries under existing feature subtrees.

Parser and language detection fixes

  • Classify header-heavy mixed C/C++ repositories as C++ when C and C++ votes appear together.
  • Harden fallback string-literal stripping against catastrophic regex backtracking on unterminated escaped strings.

Final validation behavior

  • Propagate smoke-test failures into the final validation result instead of allowing a successful unit-test result to mask a failed smoke check.
  • Clarify that plan --check-only warning states are not complete/done states and should not allow downstream stages to proceed.
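Both final-validation rules reduce to a small amount of logic: a smoke failure must fail the combined result, and a `--check-only` warning must not be treated as done. The sketch assumes a skipped smoke stage is reported as `None` and that only "complete" or "done" unlock downstream stages; those assumptions, the dataclass, and the function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DONE_STATES = frozenset({"complete", "done"})


@dataclass
class FinalValidation:
    passed: bool
    failures: List[str] = field(default_factory=list)


def combine_final_validation(unit_passed: bool,
                             smoke_passed: Optional[bool]) -> FinalValidation:
    """A passing unit run must not mask a failed smoke check.

    smoke_passed is None when the smoke stage was skipped (assumption).
    """
    failures: List[str] = []
    if not unit_passed:
        failures.append("unit tests failed")
    if smoke_passed is False:
        failures.append("smoke test failed")
    return FinalValidation(passed=not failures, failures=failures)


def allows_downstream(plan_status: str) -> bool:
    # "warning" is intentionally not a done state for plan --check-only.
    return plan_status in DONE_STATES
```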

Test coverage

This PR adds extensive regression coverage for the multilingual pipeline, including:

  • generated artifact hygiene,
  • interface source deduplication,
  • skeleton/interface coverage validation,
  • multilingual dependency graph behavior,
  • multilingual encoder/codegen behavior,
  • planner language support and prompt deduplication,
  • C/C++/Go/Rust/TypeScript/JavaScript/Python parser behavior,
  • decoder language backends and planning phases,
  • zero-test guard behavior,
  • final test repair,
  • repo language resolution,
  • orphan/test/build exclusion handling,
  • smoke multilingual coverage.

The diff adds 33 new test files and restores the extracted multilingual tests intended for this follow-up PR.

Notes for reviewers

  • This branch was rebased on top of the squash-merged commit for #67 (feat: multi-language support for the encoder and decoder pipelines), so the PR diff should now represent only the follow-up hardening and restored tests.
  • run_batch / post_verify now update .git/info/exclude with local generated-artifact exclusions. This is intentionally local-only and non-destructive.
  • plan_tasks now fails on incomplete interface coverage instead of silently planning from stale or partial interfaces.
  • C/C++ projects whose make test target only compiles objects but does not execute tests will now be rejected as invalid verification results.
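One way to detect a compile-only `make test` run is to look for compiler invocations with `-c` and no sign that a test runner reported results. The sketch also accepts version-suffixed drivers such as `gcc-13` or `clang++-18`, which a later commit in this PR added to the real guard. The result-marker regex and the function name are assumptions; the backends' actual heuristics are not shown in this description.

```python
import re

# Compiler driver at line start, optionally path-qualified and version-suffixed
# (gcc-13, clang-18, g++-13, clang++-18, c++-14). clang\+\+ precedes clang so
# the longer name is tried first.
_COMPILER_LINE = re.compile(
    r"^\s*(?:\S*/)?(?:gcc|g\+\+|clang\+\+|clang|c\+\+|cc)(?:-\d+(?:\.\d+)*)?\s"
)
# Assumed evidence that a test runner executed (CTest / GoogleTest style).
_TEST_RAN = re.compile(r"tests? (?:passed|failed)|\[\s*(?:PASSED|FAILED)\s*\]",
                       re.IGNORECASE)


def is_compile_only_make_test(output: str) -> bool:
    """True if `make test` output shows `-c` compiles but no test results."""
    compiled = any(
        _COMPILER_LINE.match(line) and " -c " in f" {line} "
        for line in output.splitlines()
    )
    return compiled and not _TEST_RAN.search(output)
```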

Testing

Not run as part of preparing this PR description.

HuYaSen added 12 commits June 25, 2026 10:48
Stage the 35 test files that were split out of the code PR (commit
5e02cfc) on top of the latest pipeline code, including the review-fix
commits. Basing this branch on the current pipeline HEAD keeps the
eventual follow-up PR's diff limited to the test files once the code
PR has merged to main.

Non-Python interface units are LPCodeUnit instances, which have no
count_lines method. The except branch in interface synthesis then stores
the whole interface block as every unit's code, so interfaces.json's
file_code embeds the entire file once per unit. On large modules that
O(units × file_size) blow-up pushes the plan_tasks prompt past the 128 KB
single-argument limit, crashing the planner with "Argument list too long"
and producing an incomplete tasks.json.

Collapse identical per-unit blocks before building the planner prompt.
Keeping one copy reconstructs the original complete file (imports plus
each unit once), so the planner sees valid source while distinct per-unit
slices are preserved. Measured 55-67% reduction on a real Rust subtree.

dominant_language votes one ballot per file and detect_language maps
every .h to C (the only config owning .h). A C++ repo that uses .h
headers — googletest has 2018 .h vs 1062 .cc — therefore gets more C
votes than C++ and is misclassified as C, which fails the encoder's
dominant_language expectation and poisons every downstream language
decision (backend, test_command, entry_point).

Fold C votes into C++ whenever both appear: a pure C repo never carries
.cc/.cpp/.hpp sources, so the presence of any C++-only extension means
the repo is C++ and its .h files are C++ headers. Pure C repos (only
.c/.h) and C mixed with unrelated languages are unaffected.
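The vote-folding rule in this commit can be sketched as follows. The extension table is a simplified assumption covering only a few languages, and `dominant_language` here is an illustrative stand-in, not the registry function's real signature.

```python
from collections import Counter
from pathlib import PurePath
from typing import Iterable, Optional

# Simplified extension table: as in the registry described above, .h maps to C.
EXT_TO_LANG = {
    ".c": "c", ".h": "c",
    ".cc": "cpp", ".cpp": "cpp", ".cxx": "cpp", ".hpp": "cpp", ".hh": "cpp",
    ".py": "python", ".go": "go", ".rs": "rust",
}


def dominant_language(paths: Iterable[str]) -> Optional[str]:
    """One vote per recognized file; C votes fold into C++ when both appear."""
    votes = Counter(
        EXT_TO_LANG[suffix]
        for suffix in (PurePath(p).suffix for p in paths)
        if suffix in EXT_TO_LANG
    )
    if votes["c"] and votes["cpp"]:
        # A pure C repo never has .cc/.cpp/.hpp files, so any C++-only
        # extension means the .h headers (and .c files) belong to a C++ repo.
        votes["cpp"] += votes.pop("c")
    return votes.most_common(1)[0][0] if votes else None
```

A header-heavy repo like the googletest example (more `.h` than `.cc`) now resolves to C++, while pure C repos and C mixed with unrelated languages keep their C result.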
The planner-side dedup (1f22944) only repaired the prompt; interfaces.json
itself still stored file_code as the whole file repeated once per unit, and
context_collector writes that file_code straight to disk as the code-gen
seed source — so generated repos were seeded with N duplicate definitions.

Add a shared common/code_dedup helper and apply it at every point that
rebuilds file_code from per-unit code: InterfacesStore.to_interfaces_json
(serialization) and interface_review prune (regeneration). The planner
helper now reuses it as a consumer-side safety net for older artifacts.
Keeping one copy reconstructs the original single file (imports plus each
unit once); units_to_code is left untouched so per-unit stubs stay valid.
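A sketch of the dedup idea, assuming `file_code` is rebuilt from a list of per-unit code strings. It removes the O(units × file_size) repetition: blocks are compared on their stripped text, the first copy wins, and each kept block keeps its own leading indentation (the behavior a later review fix in this PR settled on). The names mirror `dedup_code_blocks` and `dedup_file_code` mentioned in this PR, but the signatures shown are assumptions.

```python
from typing import Iterable, List


def dedup_code_blocks(blocks: Iterable[str]) -> List[str]:
    """Keep the first copy of each block, compared on stripped text.

    Kept blocks retain their leading indentation; only trailing whitespace is
    trimmed. Blank blocks are dropped.
    """
    seen = set()
    kept: List[str] = []
    for block in blocks:
        key = block.strip()
        if not key or key in seen:
            continue
        seen.add(key)
        kept.append(block.rstrip())
    return kept


def dedup_file_code(blocks: Iterable[str]) -> str:
    """Rebuild a single file from per-unit blocks, each distinct block once."""
    return "\n\n".join(dedup_code_blocks(blocks)) + "\n"
```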
Exclude backslashes from the normal string-literal branch so escaped characters have one matching path. This prevents zod-like commented regex literals from hanging TypeScript encoding.
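The backtracking fix amounts to making the two alternatives inside a string-literal pattern mutually exclusive. In the sketch below a backslash can only be consumed by the escape branch, so an unterminated string full of escapes fails in linear time instead of exploring exponentially many ways to split it. The patterns and `strip_string_literals` are illustrative; the real fallback extractor's regex is not reproduced here.

```python
import re

# The normal branch excludes the quote, backslash, and newline, so a backslash
# can only be consumed by the escape branch (\\.). With no overlap between the
# alternatives, a failed match backtracks linearly rather than exponentially.
_DOUBLE_QUOTED = re.compile(r'"(?:[^"\\\n]|\\.)*"')
_SINGLE_QUOTED = re.compile(r"'(?:[^'\\\n]|\\.)*'")


def strip_string_literals(source: str) -> str:
    """Replace terminated single-line string literals with empty literals."""
    source = _DOUBLE_QUOTED.sub('""', source)
    return _SINGLE_QUOTED.sub("''", source)
```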
Allow Python add_interface fixes to materialize a new interface file when the feature root maps to an existing subtree. Add deterministic same-file Python invocation edges for self calls, private helper fan-out, and constructor-composed class dependencies so review orphan checks can converge.

Run C++ ctest with --test-dir build so post-verification sees the CMake-generated test registry. Also make plan_tasks reject interfaces.json files that do not cover skeleton.json features, and update the plan command template so warning states are treated as incomplete.

Keep interface generation progress in a .partial file until complete, add design_interfaces heartbeat output, exclude skipped directories from C/C++ syntax source discovery, and make C/C++ prompt test commands discover sources at run time with standalone translation-unit guidance.

Keep interface generation progress in a .partial file until complete, add design_interfaces heartbeat output, exclude skipped directories from C/C++ syntax source discovery, make C/C++ prompt test commands discover sources at run time, and document the syntax-only success summary fallback.
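Checkpointing to a `.partial` file and promoting it only on success might look like the sketch below. `partial_path`, `save_partial`, and `finalize` are hypothetical helpers. `os.replace` overwrites the destination, and on POSIX the rename is atomic when both paths are on the same filesystem, so an interrupted run leaves the previous `interfaces.json` intact.

```python
import json
import os
from pathlib import Path


def partial_path(target: Path) -> Path:
    return target.with_name(target.name + ".partial")


def save_partial(target: Path, data: dict) -> Path:
    """Checkpoint in-progress results beside the canonical file."""
    partial = partial_path(target)
    partial.write_text(json.dumps(data, indent=2))
    return partial


def finalize(target: Path) -> None:
    """Promote the checkpoint only after generation fully succeeds."""
    # On POSIX (same filesystem) this rename is atomic, so readers see either
    # the old file or the complete new one.
    os.replace(partial_path(target), target)
```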
Run cmake --build build during C/C++ test environment preparation so ctest --test-dir build can execute generated test binaries instead of failing on missing executables.

Add a shared generated-artifact policy for prompt guidance, local git excludes, post-verification, and merge-time checks.

Align CMake prompt verification with configure, build, and ctest so agents repair source and build configuration instead of generated build files.
Run post-verification for documentation batches so README and docs edits cannot bypass native test suites.

Treat smoke validation failures and C/C++ no-op make test output as validation failures. Narrow generated artifact matching to root build directories so source paths such as configs/build remain valid.
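Root-anchored matching is what keeps a source path such as `configs/build` valid while still flagging a top-level `build/` tree. The directory and suffix lists below are assumptions chosen for illustration, and neither `is_generated_artifact` nor `generated_changes` is claimed to be the classifier's real API.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Assumed lists for illustration. Build and virtualenv directories count only
# at the repository root, so configs/build/... stays a valid source path.
ROOT_ONLY_DIRS = frozenset({"build", "dist", "coverage", ".venv", "venv"})
ANY_DEPTH_DIRS = frozenset({"__pycache__", ".pytest_cache", "CMakeFiles"})
ARTIFACT_SUFFIXES = frozenset({".o", ".obj", ".a", ".so", ".pyc"})


def is_generated_artifact(repo_path: str) -> bool:
    """Classify a repo-relative POSIX path as a generated artifact."""
    path = PurePosixPath(repo_path)
    parts = path.parts
    if len(parts) > 1 and parts[0] in ROOT_ONLY_DIRS:
        return True
    if any(part in ANY_DEPTH_DIRS for part in parts[:-1]):
        return True
    return path.suffix in ARTIFACT_SUFFIXES


def generated_changes(changed_paths: Iterable[str]) -> List[str]:
    """Paths a post-verify or merge gate would reject (hypothetical helper)."""
    return sorted(p for p in changed_paths if is_generated_artifact(p))
```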

Copilot AI left a comment

Pull request overview

This PR is a follow-up hardening pass on the multilingual decoder/codegen pipeline, primarily tightening verification correctness (especially for C/C++), improving generated-artifact hygiene, strengthening interface/planner robustness, and restoring/adding extensive regression coverage across the multilingual stack.

Changes:

  • Harden C/C++ verification semantics (build before ctest, detect “compile-only” make test, skip build/generated dirs when collecting sources) and improve prompt rules/summaries for compiled backends.
  • Add shared helpers for interface-source deduplication and generated-artifact hygiene, and enforce generated-artifact rejection at key pipeline gates (run_batch startup, post_verify, merge).
  • Restore/add broad multilingual regression tests covering language parsing, repo language resolution, entry-point reconciliation, smoke/final validation behavior, and zero-test guarding.

Reviewed changes

Copilot reviewed 55 out of 56 changed files in this pull request and generated 2 comments.

File Description
CoderMind/tests/test_zero_test_guard.py Adds regression coverage for “exit-0 but no tests ran” detection across backends.
CoderMind/tests/test_smoke_multilang.py Adds smoke-test coverage for clean env execution and skipping Python-only layers for non-Python repos.
CoderMind/tests/test_rpg_builder.py Ensures RPG builder preserves target language metadata.
CoderMind/tests/test_repo_language_resolution.py Adds tests for on-disk language resolution fallback and wrapper behavior.
CoderMind/tests/test_plan_prompt_dedup.py Covers planner prompt dedup + interface/skeleton coverage validation behavior.
CoderMind/tests/test_orphan_test_build_exclusion.py Adds regression coverage for excluding test/build units from orphan detection.
CoderMind/tests/test_multilingual_prompt_safety.py Guards encoder prompts against Python-only wording regressions.
CoderMind/tests/test_multilingual_encoder_pipeline.py Tests multilingual encoder discovery + semantic parsing entry behavior.
CoderMind/tests/test_multilingual_code_unit.py Tests multilingual ParsedFile and snippet fencing behavior.
CoderMind/tests/test_lang_parser_typescript.py Adds TypeScript parser tests including hang regression coverage.
CoderMind/tests/test_lang_parser_rust.py Adds Rust parser unit/dependency/invocation behavior tests.
CoderMind/tests/test_lang_parser_registry.py Expands registry tests (configs, dominant language, fences, API).
CoderMind/tests/test_lang_parser_python_parity.py Ensures Python parser/ParsedFile parity with legacy AST behavior.
CoderMind/tests/test_lang_parser_javascript.py Adds JavaScript parser tests.
CoderMind/tests/test_lang_parser_go.py Adds Go parser tests (units, deps, invocation extraction).
CoderMind/tests/test_lang_parser_fallback.py Adds fallback delimiter/syntax scanner regression tests.
CoderMind/tests/test_lang_parser_cpp.py Adds C++ parser tests (units, invokes, syntax errors).
CoderMind/tests/test_lang_parser_c.py Adds C parser tests (includes, invokes, syntax errors).
CoderMind/tests/test_interface_source_dedup.py Tests shared interface-source dedup and serialization wiring.
CoderMind/tests/test_init_codebase_gitignore.py Tests .gitignore update behavior for dev env entries.
CoderMind/tests/test_generated_artifact_hygiene.py Adds end-to-end tests for generated-artifact exclude/install + rejection gates.
CoderMind/tests/test_final_test_repair.py Tests bounded final-test repair loop + smoke failure propagation.
CoderMind/tests/test_feature_build.py Tests feature-build tree promotion + target language preservation.
CoderMind/tests/test_entry_reconciliation.py Adds cross-language entry-point reconciliation protocol coverage.
CoderMind/tests/test_code_gen_multilingual.py Adds broad multilingual codegen prompt/test runner/static check coverage.
CoderMind/tests/test_branch_name_sanitization.py Tests branch name sanitization hardening and shared sanitizer usage.
CoderMind/templates/commands/plan.md Clarifies warning is “not done” in plan --check-only semantics.
CoderMind/scripts/run_batch.py Installs local generated-artifact excludes during batch startup.
CoderMind/scripts/plan_tasks.py Adds interface source dedup in planner prompts + validates interface coverage vs skeleton.
CoderMind/scripts/lang_parser/registry.py Fixes dominant language classification for header-heavy mixed C/C++ repos.
CoderMind/scripts/lang_parser/extractors/fallback.py Hardens string-literal stripping regex against catastrophic backtracking.
CoderMind/scripts/func_design/interfaces_store.py Deduplicates file_code during interfaces serialization.
CoderMind/scripts/func_design/interface_review.py Adds file-entry scaffolding on interface review + uses shared dedup when regenerating file_code.
CoderMind/scripts/func_design/interface_agent.py Improves Python dependency collection (same-file calls, self.method() calls) and saves partial interfaces output.
CoderMind/scripts/design_interfaces.py Adds a periodic heartbeat while long interface design runs.
CoderMind/scripts/decoder_lang/tests/test_unit_kind.py Adds tests for shared unit-kind classification and backend callable/type behavior.
CoderMind/scripts/decoder_lang/tests/test_python_backend.py Expands Python backend/registry protocol invariants and behavior tests.
CoderMind/scripts/decoder_lang/tests/test_phase5_prompt_directive.py Tests language directive preamble behavior (no-op for Python).
CoderMind/scripts/decoder_lang/tests/test_phase3_code_structure.py Tests backend code-structure helpers and cross-language behavior.
CoderMind/scripts/decoder_lang/tests/test_phase2_skeleton.py Tests backend-aware skeleton behavior and identifier validation.
CoderMind/scripts/decoder_lang/tests/test_phase1_propagation.py Tests language propagation/resolution through decoder entry points.
CoderMind/scripts/decoder_lang/tests/test_javascript_backend.py Adds JavaScript backend contract tests.
CoderMind/scripts/decoder_lang/tests/test_c_cpp_backend.py Adds C/C++ backend tests for build + ctest semantics and “compile-only make test” detection.
CoderMind/scripts/decoder_lang/tests/__init__.py Package marker for decoder_lang tests.
CoderMind/scripts/decoder_lang/cpp_backend.py Skips build/generated dirs in source collection + strengthens no-test/compile-only detection.
CoderMind/scripts/decoder_lang/c_backend.py Skips build/generated dirs in source collection + strengthens no-test/compile-only detection.
CoderMind/scripts/decoder_lang/backend.py Ensures CMake projects are built before ctest in C/C++ verification prep.
CoderMind/scripts/common/generated_artifacts.py Adds shared generated-artifact classifier, local exclude installer, and persisted-change detection.
CoderMind/scripts/common/code_dedup.py Adds shared helpers for deduplicating repeated interface code blocks.
CoderMind/scripts/code_gen/post_verify.py Enforces generated-artifact rejection before and after post-verify test runs.
CoderMind/scripts/code_gen/git_ops.py Rejects merging batch branches containing persisted generated artifacts.
CoderMind/scripts/code_gen/final_validation.py Propagates smoke-test failures into final validation outcome.
CoderMind/scripts/code_gen/batch_prompts.py Adds generated-artifact prompt rules and improves C/C++ verification command construction in prompts.

Comment thread CoderMind/scripts/code_gen/batch_prompts.py
Comment thread CoderMind/scripts/common/generated_artifacts.py
HuYaSen added 3 commits June 25, 2026 20:04
Keep shell expansion for C/C++ syntax-check include paths and root-anchor virtualenv artifact directories to avoid false positives.

Also cover the false-positive resume prompt summary interpolation with a regression test.
prune_orphan_interfaces is a deprecated helper with no production
callers; the active design flow uses InterfacesStore.find_orphan_units
and prune_units, and file_code is already deduplicated at the
serialization source. Restore the plain join so the legacy helper no
longer carries duplicate dedup logic, and drop the now-unused
dedup_file_code import.
The re-added follow-up tests encoded pre-refactor behavior that no
longer matches the merged decoder pipeline:

- add_interface for a non-Python project is recorded as an advisory
  manual follow-up rather than silently skipped.
- correct_intra_subtree_file_order reports reason
  "backend_file_dependencies", and Go ordering resolves module
  imports through go.mod.
- Go command-path resolution moved to go_backend.find_existing_entry.

Update the assertions to match, and remove the standalone C++ ctest
backend test, whose behavior is fully covered by the comprehensive
decoder_lang backend suite.

Copilot AI left a comment

Pull request overview

Copilot reviewed 56 out of 57 changed files in this pull request and generated 3 comments.

Comment thread CoderMind/scripts/common/code_dedup.py
Comment thread CoderMind/scripts/func_design/interface_agent.py
Comment thread CoderMind/scripts/decoder_lang/backend.py Outdated
Second Copilot review pass flagged three correctness issues in the
restored hardening code:

- code_dedup: dedup_code_blocks returned the stripped block, dropping
  leading indentation from indented unit slices and corrupting the
  file_code used as a codegen seed. Dedup on the stripped key but keep
  the block's own indentation (trim only trailing whitespace).
- interface_agent: _analyze_python_invocations walked the whole AST,
  attributing calls inside nested def/class bodies to the enclosing
  caller. Walk only the caller's own scope.
- decoder_lang.backend: cmake_reconfigure now skips the build step when
  configure fails so ctest surfaces the real error instead of running
  against a stale build directory.

Copilot AI left a comment

Pull request overview

Copilot reviewed 56 out of 57 changed files in this pull request and generated 3 comments.

Comment thread CoderMind/scripts/decoder_lang/c_backend.py Outdated
Comment thread CoderMind/scripts/decoder_lang/cpp_backend.py
Comment thread CoderMind/scripts/code_gen/batch_prompts.py
A third Copilot review pass surfaced three correctness gaps in the
restored C/C++ verification and Python invocation analysis:

- c/cpp backend: the compile-only `make test` guard regex did not
  match version-suffixed compilers (gcc-13, clang-18, g++-13,
  clang++-18, c++-14), so a compile-only run could be reported as a
  passing test run. Allow an optional version suffix.
- batch_prompts: the C/C++ syntax-only find now also prunes
  dist/coverage/.venv/venv/CMakeFiles so generated sources are not
  pulled into the syntax check.
- interface_agent: lambdas are not separate units, so keep their
  bodies in the enclosing caller's scope rather than dropping their
  same-file calls (nested def/class scopes are still excluded).

Copilot AI left a comment

Pull request overview

Copilot reviewed 56 out of 57 changed files in this pull request and generated 1 comment.

Comment thread CoderMind/scripts/func_design/interface_review.py
HuYaSen merged commit 1d4da22 into main on Jun 26, 2026
3 of 4 checks passed