diff --git a/docs-website/docs/pipeline-components/preprocessors.mdx b/docs-website/docs/pipeline-components/preprocessors.mdx
index 4da2019dff..89664210b5 100644
--- a/docs-website/docs/pipeline-components/preprocessors.mdx
+++ b/docs-website/docs/pipeline-components/preprocessors.mdx
@@ -25,5 +25,6 @@ Use the PreProcessors to prepare your data normalize white spaces, remove header
| [MarkdownHeaderSplitter](preprocessors/markdownheadersplitter.mdx) | Splits documents at ATX-style Markdown headers (#), with optional secondary splitting. Preserves header hierarchy as metadata. |
| [PresidioDocumentCleaner](preprocessors/presidiodocumentcleaner.mdx) | Replaces PII in Document text with entity type placeholders using Microsoft Presidio. |
| [PresidioTextCleaner](preprocessors/presidiotextcleaner.mdx) | Replaces PII in plain strings — useful for sanitizing user queries before they reach an LLM. |
+| [PythonCodeSplitter](preprocessors/pythoncodesplitter.mdx) | Splits Python source documents into syntax-aware chunks using AST units such as imports, functions, class headers, methods, and statements. |
| [RecursiveSplitter](preprocessors/recursivesplitter.mdx) | Splits text into smaller chunks, it does so by recursively applying a list of separators
to the text, applied in the order they are provided. |
| [TextCleaner](preprocessors/textcleaner.mdx) | Removes regexes, punctuation, and numbers, as well as converts text to lowercase. Useful to clean up text data before evaluation. |
diff --git a/docs-website/docs/pipeline-components/preprocessors/pythoncodesplitter.mdx b/docs-website/docs/pipeline-components/preprocessors/pythoncodesplitter.mdx
new file mode 100644
index 0000000000..f575da661d
--- /dev/null
+++ b/docs-website/docs/pipeline-components/preprocessors/pythoncodesplitter.mdx
@@ -0,0 +1,136 @@
+---
+title: "PythonCodeSplitter"
+id: pythoncodesplitter
+slug: "/pythoncodesplitter"
+description: "Split Python source documents into syntax-aware chunks using Python's AST, with metadata for line ranges, classes, decorators, and docstrings."
+---
+
+# PythonCodeSplitter
+
+`PythonCodeSplitter` splits Python source code documents into syntax-aware chunks. It is designed for Python files and keeps code units such as imports, functions, classes, and methods together where possible.
+
+
+
+| | |
+| --- | --- |
+| **Most common position in a pipeline** | In indexing pipelines after [Converters](../converters.mdx), before [Embedders](../embedders.mdx) or [`DocumentWriter`](../writers/documentwriter.mdx) |
+| **Mandatory run variables** | `documents`: A list of Python source code documents |
+| **Output variables** | `documents`: A list of Python source code documents split into syntax-aware chunks |
+| **API reference** | [PreProcessors](/reference/preprocessors-api) |
+| **GitHub link** | https://github.com/deepset-ai/haystack/blob/main/haystack/components/preprocessors/python_code_splitter.py |
+| **Package name** | `haystack-ai` |
+
+
+
+## Overview
+
+`PythonCodeSplitter` expects each input document's `content` to be valid Python source code. It parses the source with Python's `ast` module and creates ordered split units for:
+
+- Module docstrings
+- Consecutive import blocks
+- Top-level functions
+- Class headers
+- Methods and nested classes
+- Remaining top-level statements
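+
+As a rough illustration of the units listed above, the sketch below walks a module with the standard-library `ast` module and reports each top-level unit with its line range. It is not the component's implementation: the helper name `top_level_units` and the labels `module_docstring`, `class`, and `statement` are assumptions made for this example, and classes are reported whole rather than as a header plus members.
+
+```python
+import ast
+
+SOURCE = '''"""Math utilities."""
+from math import pi
+import os
+
+
+class Circle:
+    """A circle."""
+
+    def area(self) -> float:
+        return pi
+
+
+def helper() -> int:
+    return 1
+
+
+VALUE = helper()
+'''
+
+
+def top_level_units(source: str) -> list[tuple[str, int, int]]:
+    """Return (kind, start_line, end_line) for each top-level unit (hypothetical helper)."""
+    tree = ast.parse(source)
+    units: list[tuple[str, int, int]] = []
+    for index, node in enumerate(tree.body):
+        if (
+            index == 0
+            and isinstance(node, ast.Expr)
+            and isinstance(node.value, ast.Constant)
+            and isinstance(node.value.value, str)
+        ):
+            kind = "module_docstring"
+        elif isinstance(node, (ast.Import, ast.ImportFrom)):
+            kind = "imports"
+        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
+            kind = "function"
+        elif isinstance(node, ast.ClassDef):
+            kind = "class"
+        else:
+            kind = "statement"
+        if kind == "imports" and units and units[-1][0] == "imports":
+            # Extend the current import block instead of starting a new unit.
+            units[-1] = ("imports", units[-1][1], node.end_lineno)
+        else:
+            units.append((kind, node.lineno, node.end_lineno))
+    return units
+
+
+for unit in top_level_units(SOURCE):
+    print(unit)
+```
+
+The two import statements collapse into a single `imports` unit spanning lines 2 to 3, which mirrors the "consecutive import blocks" item in the list.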
+
+The splitter merges these units in source order toward `max_effective_lines`. Effective lines are calculated from character length with `ceil(len(source) / expected_chars_per_line)`, so long lines count as more than one line.
+
+Functions and methods are kept whole by the primary AST split. If one syntactic unit is larger than `oversized_factor * max_effective_lines`, the splitter falls back to a line-based secondary split using [`DocumentSplitter`](documentsplitter.mdx). This oversized fallback is the only case where chunks can overlap; the primary AST split does not add overlap.
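+
+The effective-line estimate and the oversized threshold can be sketched in a few lines. This assumes the formula and defaults given on this page; the helper names `effective_lines` and `is_oversized` are hypothetical and are not part of the component's API.
+
+```python
+from math import ceil
+
+
+def effective_lines(source: str, expected_chars_per_line: int = 45) -> int:
+    """Estimate effective lines from character length."""
+    return ceil(len(source) / expected_chars_per_line)
+
+
+def is_oversized(
+    source: str,
+    max_effective_lines: int = 100,
+    oversized_factor: float = 3,
+    expected_chars_per_line: int = 45,
+) -> bool:
+    """Hypothetical helper: True when a unit exceeds the oversized threshold."""
+    threshold = oversized_factor * max_effective_lines
+    return effective_lines(source, expected_chars_per_line) > threshold
+
+
+print(effective_lines("x = 1"))  # 5 characters -> 1
+print(effective_lines("y = " + "a" * 96))  # 100 characters -> 3
+print(is_oversized("a" * 13_500))  # exactly 300 effective lines -> False
+print(is_oversized("a" * 13_501))  # 301 effective lines -> True
+```
+
+With the default settings, a single unit would need more than 300 effective lines (over 13,500 characters) before the line-based fallback applies.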
+
+With `preserve_class_definition=True` (the default), a chunk that contains class members but not the original class header is prefixed with the bare class signature, so the chunk still carries its class context.
+
+If `strip_docstrings=True`, function, method, and class docstrings are removed from chunk content and stored in `meta["docstrings"]`. Module docstrings stay in the chunk content because they are their own top-level unit.
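+
+To make the docstring handling more concrete, the sketch below collects function, method, and class docstrings with the standard-library `ast.get_docstring`. It only shows how such docstrings can be located. The helper name `collect_docstrings` and the name-to-docstring mapping are assumptions for this example and may not match the structure the component stores in `meta["docstrings"]`.
+
+```python
+import ast
+
+SOURCE = '''class Circle:
+    """A circle."""
+
+    def area(self) -> float:
+        """Return the area."""
+        return 3.14
+'''
+
+
+def collect_docstrings(source: str) -> dict[str, str]:
+    """Map each function, method, or class name to its docstring (hypothetical helper)."""
+    found: dict[str, str] = {}
+    for node in ast.walk(ast.parse(source)):
+        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
+            doc = ast.get_docstring(node)
+            if doc is not None:
+                found[node.name] = doc
+    return found
+
+
+print(collect_docstrings(SOURCE))
+```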
+
+Each output document includes the original document's metadata plus:
+
+- `source_id`: ID of the original document
+- `split_id`: Index of the chunk within the original document
+- `start_line` and `end_line`: Source line range for the AST units in the chunk. Oversized secondary chunks keep the originating unit's range.
+- `unit_kinds`: Split units included in the chunk, such as `imports`, `function`, `class_header`, or `method`
+- `include_classes`: Class names included in the chunk, when applicable
+- `decorators`: Decorators found on included functions, methods, or classes, when applicable
+- `docstrings`: Stripped docstrings, when `strip_docstrings=True`
+- `secondary_split`, `secondary_split_index`, and `secondary_split_total`: Metadata for oversized fallback chunks
+
+Documents with `None` content raise `ValueError`, documents with non-string content raise `TypeError`, and invalid Python source raises `SyntaxError`. Empty documents are skipped.
+
+## Configuration
+
+| Parameter | Default | Description |
+| --- | --- | --- |
+| `min_effective_lines` | `20` | Minimum effective lines per chunk. While a chunk is below this value, the splitter keeps merging in the next unit. |
+| `max_effective_lines` | `100` | Target effective lines per chunk. Units are merged greedily toward this value. |
+| `expected_chars_per_line` | `45` | Character count used to estimate effective lines. |
+| `oversized_factor` | `3` | Multiplier applied to `max_effective_lines`. A syntactic unit larger than the product triggers the line-based secondary split. |
+| `strip_docstrings` | `False` | Moves function, method, and class docstrings from content into metadata. |
+| `preserve_class_definition` | `True` | Prepends the class signature to chunks that contain class members without the class header. |
+| `secondary_split_overlap` | `5` | Line overlap used only by the oversized secondary split. |
+| `secondary_split_length` | `None` | Number of lines per chunk in the oversized secondary split. Defaults to `max_effective_lines`. |
+
+## Usage
+
+### On its own
+
+```python
+import textwrap
+
+from haystack import Document
+from haystack.components.preprocessors import PythonCodeSplitter
+
+source = textwrap.dedent(
+    '''
+    """Math utilities."""
+    from math import pi
+
+
+    class Circle:
+        """A circle."""
+
+        def __init__(self, radius: float) -> None:
+            self.radius = radius
+
+        def area(self) -> float:
+            return pi * self.radius * self.radius
+    '''
+).lstrip()
+
+splitter = PythonCodeSplitter(
+    min_effective_lines=4,
+    max_effective_lines=12,
+    strip_docstrings=True,
+)
+
+result = splitter.run(
+    documents=[Document(content=source, meta={"file_name": "geometry.py"})],
+)
+
+for chunk in result["documents"]:
+    print(chunk.meta["start_line"], chunk.meta["end_line"], chunk.meta.get("include_classes"))
+```
+
+### In a pipeline
+
+This pipeline converts Python files to documents, splits them with `PythonCodeSplitter`, and writes the chunks to an in-memory document store.
+
+```python
+from pathlib import Path
+
+from haystack import Pipeline
+from haystack.components.converters.txt import TextFileToDocument
+from haystack.components.preprocessors import PythonCodeSplitter
+from haystack.components.writers import DocumentWriter
+from haystack.document_stores.in_memory import InMemoryDocumentStore
+
+document_store = InMemoryDocumentStore()
+
+p = Pipeline()
+p.add_component("converter", TextFileToDocument())
+p.add_component("splitter", PythonCodeSplitter(max_effective_lines=80))
+p.add_component("writer", DocumentWriter(document_store=document_store))
+
+p.connect("converter.documents", "splitter.documents")
+p.connect("splitter.documents", "writer.documents")
+
+files = list(Path("path/to/your/project").glob("**/*.py"))
+p.run({"converter": {"sources": files}})
+```
diff --git a/docs-website/sidebars.js b/docs-website/sidebars.js
index 85841717cd..ca80d62356 100644
--- a/docs-website/sidebars.js
+++ b/docs-website/sidebars.js
@@ -481,6 +481,7 @@ export default {
'pipeline-components/preprocessors/documentsplitter',
'pipeline-components/preprocessors/embeddingbaseddocumentsplitter',
'pipeline-components/preprocessors/hierarchicaldocumentsplitter',
+ 'pipeline-components/preprocessors/pythoncodesplitter',
'pipeline-components/preprocessors/recursivesplitter',
'pipeline-components/preprocessors/textcleaner',
'pipeline-components/preprocessors/presidiodocumentcleaner',