
remove semantic chunker and dependance on langchain-experimental #3514

Merged
shanbady merged 4 commits into main from shanbady/remove-semanticchunker on Jun 23, 2026


@shanbady

Contributor

What are the relevant tickets?

Closes https://github.com/mitodl/hq/issues/11945

Description (What does it do?)

This PR removes the semantic chunking functionality, which is currently unused and depends on the deprecated langchain-experimental library. The SemanticChunker also does not appear to have been ported to a currently supported library.

How can this be tested?

Tests should pass. Also verify that nothing else depends on or imports from the langchain-experimental library.
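
One way to check for leftover imports is to parse the source files rather than grep them, so that comments and strings mentioning the library are ignored. The following is a minimal sketch, not part of this PR: the helper names `find_banned_imports` and `scan_tree` are hypothetical, and it assumes the library's import name is `langchain_experimental`.

```python
import ast
from pathlib import Path

# Assumption: import name of the langchain-experimental distribution.
BANNED_ROOT = "langchain_experimental"


def find_banned_imports(source: str, banned_root: str = BANNED_ROOT) -> list[int]:
    """Return the line numbers of statements that import banned_root or a submodule."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 means an absolute import; relative imports are skipped.
            names = [node.module]
        else:
            continue
        if any(n == banned_root or n.startswith(banned_root + ".") for n in names):
            hits.append(node.lineno)
    return sorted(hits)


def scan_tree(root: str) -> dict[str, list[int]]:
    """Map each .py file under root to the lines where the banned import appears."""
    results = {}
    for path in Path(root).rglob("*.py"):
        hits = find_banned_imports(path.read_text(encoding="utf-8"))
        if hits:
            results[str(path)] = hits
    return results
```

Running `scan_tree(".")` from the repository root should return an empty dict if no module still imports the library. Files that fail to parse will raise `SyntaxError`, which this sketch does not handle.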

@github-actions

github-actions Bot commented Jun 23, 2026


OpenAPI Changes

No changes detected


Unexpected changes? Ensure your branch is up-to-date with main (consider rebasing).

@shanbady shanbady marked this pull request as ready for review June 23, 2026 15:32
Copilot AI review requested due to automatic review settings June 23, 2026 15:32
@shanbady shanbady added the Needs Review An open Pull Request that is ready for review label Jun 23, 2026

Copilot AI left a comment

Contributor


Pull request overview

This PR removes the unused “semantic chunking” pathway from the vector search embedding pipeline, along with its deprecated langchain-experimental dependency, simplifying chunking to a single recursive splitter and cleaning up related settings and tests.

Changes:

  • Removed SemanticChunker usage and the semantic-chunking code path from vector_search/utils.py.
  • Deleted semantic-chunking settings/config from main/settings.py and updated tests/fixtures accordingly.
  • Updated Python dependencies/lockfile to drop langchain-experimental and add explicit langchain-* packages needed by current imports.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
| File | Description |
| --- | --- |
| vector_search/utils.py | Removes semantic chunking branch and updates internal chunking helper signature/callers. |
| vector_search/utils_test.py | Removes semantic chunking test and updates calls to _chunk_documents to match new signature. |
| vector_search/conftest.py | Removes test patching of SemanticChunker (no longer present). |
| main/settings.py | Removes semantic chunking feature flag and configuration settings. |
| pyproject.toml | Drops langchain-experimental and adds explicit langchain-community / langchain-text-splitters dependencies. |
| uv.lock | Lockfile update reflecting dependency changes. |

Comments suppressed due to low confidence (1)

vector_search/utils_test.py:487

  • This test still instantiates a dense_encoder() and mutates encoder.token_encoding_name, but _chunk_documents no longer accepts/uses an encoder. These lines are now dead code and can be removed to keep the test focused on chunking/upload behavior.
    encoder = dense_encoder()
    mock_qdrant = mocker.patch("qdrant_client.QdrantClient")
    mocker.patch(
        "vector_search.utils.qdrant_client",
        return_value=mock_qdrant,

Comment thread vector_search/conftest.py
@shanbady shanbady changed the title Shanbady/remove semanticchunker remove Semantic Chunking and dependance on langchain-experimental Jun 23, 2026
@shanbady shanbady changed the title remove Semantic Chunking and dependance on langchain-experimental remove semantic chunker and dependance on langchain-experimental Jun 23, 2026

@mbertrand mbertrand left a comment

Member


👍

@mbertrand mbertrand added Waiting on author and removed Needs Review An open Pull Request that is ready for review labels Jun 23, 2026
@shanbady shanbady merged commit b505dce into main Jun 23, 2026
14 checks passed
@shanbady shanbady deleted the shanbady/remove-semanticchunker branch June 23, 2026 20:47

3 participants