Skip to content

feat(splitter): add Bash, CMake, and HCL tree-sitter support#1954

Open
nuthalapativarun wants to merge 3 commits into
cocoindex-io:mainfrom
nuthalapativarun:feat/treesitter-bash
Open

feat(splitter): add Bash, CMake, and HCL tree-sitter support#1954
nuthalapativarun wants to merge 3 commits into
cocoindex-io:mainfrom
nuthalapativarun:feat/treesitter-bash

Conversation

@nuthalapativarun
Copy link
Copy Markdown
Contributor

Summary

  • Add syntax-aware (tree-sitter) splitting for Bash (.sh, .bash), CMake (.cmake, .cmake.in), and HCL/Terraform (.hcl, .tf)
  • All three crates (tree-sitter-bash 0.25.1, tree-sitter-cmake 0.7.1, tree-sitter-hcl 1.1.0) target tree-sitter ^0.25, matching the workspace version
  • Add detect_code_language assertions for the new extensions
  • Add RecursiveSplitter round-trip tests for each language

Test plan

  • cargo test -p cocoindex_ops_text
  • uv run pytest python/tests/ops/test_text.py -v
  • uv run ruff format --check .
  • uv run ruff check .

@georgeh0
Copy link
Copy Markdown
Member

Hi @nuthalapativarun ,

Thanks for contributing! Please run precommit checks and fix related issues. Thanks!

@nuthalapativarun
Copy link
Copy Markdown
Contributor Author

Hi @georgeh0, ran all precommit checks locally — cargo fmt, cargo test -p cocoindex_ops_text (33 tests pass), ruff format --check, ruff check, and pytest python/tests/ops/test_text.py (22 tests pass including test_recursive_splitter_with_bash, test_recursive_splitter_with_cmake, and test_recursive_splitter_with_hcl). Everything is clean. Let me know if there's anything else needed!

@georgeh0
Copy link
Copy Markdown
Member

Hi @georgeh0, ran all precommit checks locally — cargo fmt, cargo test -p cocoindex_ops_text (33 tests pass), ruff format --check, ruff check, and pytest python/tests/ops/test_text.py (22 tests pass including test_recursive_splitter_with_bash, test_recursive_splitter_with_cmake, and test_recursive_splitter_with_hcl). Everything is clean. Let me know if there's anything else needed!

There's a precommit failure: https://github.com/cocoindex-io/cocoindex/actions/runs/25593066486/job/75469059311?pr=1954

In particular, Cargo.toml is updated but Cargo.lock isn't updated accordingly. When you run prek run --all-files or maturin develop it should be fixed automatically. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants