Remove semantic chunker and dependence on langchain-experimental #3514
Merged
Pull request overview
This PR removes the unused “semantic chunking” pathway from the vector search embedding pipeline, along with its deprecated langchain-experimental dependency, simplifying chunking to a single recursive splitter and cleaning up related settings and tests.
Changes:
- Removed `SemanticChunker` usage and the semantic-chunking code path from `vector_search/utils.py`.
- Deleted semantic-chunking settings/config from `main/settings.py` and updated tests/fixtures accordingly.
- Updated Python dependencies/lockfile to drop `langchain-experimental` and add explicit `langchain-*` packages needed by current imports.
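For readers unfamiliar with the term, the general idea behind a recursive splitter can be sketched in plain Python. This is only an illustration of the technique, not the code in `vector_search/utils.py` and not LangChain's implementation; the function name `recursive_split`, its parameters, and the default separators are assumptions made for this example.

```python
from typing import List, Sequence


def recursive_split(
    text: str,
    chunk_size: int,
    separators: Sequence[str] = ("\n\n", "\n", " "),
) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Hypothetical helper: tries the coarsest separator first and only
    recurses with finer separators for pieces that are still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut every chunk_size chars.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        # Greedily merge neighbouring pieces while they still fit.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks
```

Unlike semantic chunking, this approach needs no embedding model: chunk boundaries depend only on separators and a size limit.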
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `vector_search/utils.py` | Removes semantic chunking branch and updates internal chunking helper signature/callers. |
| `vector_search/utils_test.py` | Removes semantic chunking test and updates calls to `_chunk_documents` to match new signature. |
| `vector_search/conftest.py` | Removes test patching of `SemanticChunker` (no longer present). |
| `main/settings.py` | Removes semantic chunking feature flag and configuration settings. |
| `pyproject.toml` | Drops `langchain-experimental` and adds explicit `langchain-community` / `langchain-text-splitters` dependencies. |
| `uv.lock` | Lockfile update reflecting dependency changes. |
Comments suppressed due to low confidence (1)
vector_search/utils_test.py:487
- This test still instantiates a `dense_encoder()` and mutates `encoder.token_encoding_name`, but `_chunk_documents` no longer accepts or uses an encoder. These lines are now dead code and can be removed to keep the test focused on chunking/upload behavior.

      encoder = dense_encoder()
      mock_qdrant = mocker.patch("qdrant_client.QdrantClient")
      mocker.patch(
      "vector_search.utils.qdrant_client",
      return_value=mock_qdrant,
What are the relevant tickets?
Closes https://github.com/mitodl/hq/issues/11945
Description (What does it do?)
This PR removes the semantic chunking functionality, which is currently unused and depends on the deprecated langchain-experimental library. The `SemanticChunker` also does not appear to have been ported to a currently supported library.
How can this be tested?
Tests should pass. Also verify that nothing else depends on or imports from the langchain-experimental library.
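One way to run the import check is a small script that walks the repository's Python files and reports any remaining import of the package. The sketch below is an assumption about how a reviewer might do this, not part of the PR; the helper name `find_imports` is hypothetical, and it relies on the distribution `langchain-experimental` being imported as `langchain_experimental`.

```python
import ast
from pathlib import Path
from typing import Iterator, Tuple


def find_imports(
    root: str, package: str = "langchain_experimental"
) -> Iterator[Tuple[str, int]]:
    """Yield (file path, line number) for each import of `package` under root."""
    for path in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be read or parsed
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0:
                names = [node.module or ""]
            else:
                continue
            # Match the package itself or any of its submodules.
            if any(n == package or n.startswith(package + ".") for n in names):
                yield str(path), node.lineno
```

Calling `list(find_imports("."))` from the repository root should return an empty list once the removal is complete. This covers only Python imports; `pyproject.toml` and `uv.lock` still need to be checked separately for leftover dependency entries.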