Commit 65ac541
refactor!: vendor the tokenizer stack into lance (#6512)
This PR vendors the tokenizer stack Lance actually uses into a new
`rust/lance-tokenizer` crate and rewires FTS and inverted-index code to
depend on it instead of `tantivy` and `lindera-tantivy`. It keeps the
existing document and query tokenization semantics in-tree, renames the
old FTS document adapter module to `document_tokenizer`, and preserves
upstream license headers on vendored code.
1 parent e5ceacb commit 65ac541
37 files changed
Lines changed: 3664 additions & 1041 deletions
File tree
- python
- rust
  - lance-index
    - benches
    - src/scalar
      - inverted
        - tokenizer
  - lance-tokenizer
    - src
      - stop_word_filter
  - lance
    - src
      - dataset/mem_wal/index
      - io/exec
    - tests/query