Motivation
Chunking is currently possible using the Layout, Page, Fixed-size, and Paragraph strategies, with optional overlap. I would suggest an additional strategy focused solely on quality: LLM chunking. An LLM is called to produce coherent and relevant chunks, each chunk expressing a single idea, concept, or thought.
In all chunking strategies, an LLM can also be used to generate additional metadata that improves reranking by Azure AI Search when semantic search is enabled. For each chunk, the LLM would populate:
- "Title" by summarizing the chunk into one short sentence
- "Keyword" by extracting the main keywords of the chunk
Please note that for the reranking to be useful, ticket 2093 needs to be implemented first.
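As a rough illustration of the metadata-generation step, the sketch below builds a prompt asking an LLM to return the "Title" and "Keyword" fields for a chunk and parses the reply. The function names, prompt wording, and JSON response format are all assumptions for illustration, not an existing API; the actual LLM call is left out and replaced by a canned response.

```python
import json

def build_metadata_prompt(chunk: str) -> str:
    """Build a prompt asking an LLM to produce the two metadata
    fields described above (prompt wording is an assumption)."""
    return (
        "Summarize the following text chunk into one short sentence "
        '(field "Title") and extract its main keywords (field "Keyword").\n'
        'Respond with JSON: {"Title": "...", "Keyword": ["..."]}\n\n'
        f"Chunk:\n{chunk}"
    )

def parse_metadata(llm_response: str) -> dict:
    """Parse the LLM's JSON reply into the two chunk metadata fields."""
    data = json.loads(llm_response)
    return {"Title": data["Title"], "Keyword": data["Keyword"]}

# Example with a canned reply in place of a real LLM call:
reply = '{"Title": "LLM chunking proposal", "Keyword": ["chunking", "metadata"]}'
metadata = parse_metadata(reply)
```

In a real pipeline these fields would be written into the search index alongside each chunk so the semantic reranker can use them.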
Tasks
To be filled in by the engineer picking up the issue