deepset-ai · github-actions · Apr 21, 2026 · Apr 21, 2026
diff --git a/docs-website/reference/integrations-api/presidio.md b/docs-website/reference/integrations-api/presidio.md
@@ -6,31 +6,34 @@ slug: "/integrations-presidio"
 ---
 
 
-## haystack_integrations.components.preprocessors.presidio.presidio_document_cleaner
+## haystack_integrations.components.extractors.presidio.presidio_entity_extractor
 
-### PresidioDocumentCleaner
+### PresidioEntityExtractor
 
-Anonymizes PII in Haystack Documents using [Microsoft Presidio](https://microsoft.github.io/presidio/).
+Detects PII entities in Haystack Documents using Microsoft Presidio Analyzer.
 
-Accepts a list of Documents, detects personally identifiable information (PII) in their
-text content, and returns new Documents with PII replaced by entity type placeholders
-(e.g. `<PERSON>`, `<EMAIL_ADDRESS>`). Original Documents are not mutated.
+See [Presidio Analyzer](https://microsoft.github.io/presidio/analyzer/) for details.
 
-Documents without text content are passed through unchanged.
+Accepts a list of Documents and returns new Documents with detected PII entities stored
+in each Document's metadata under the key `"entities"`. Each entry in the list contains
+the entity type (`entity_type`), the start and end character offsets (`start`, `end`; `end` is exclusive), and the confidence score (`score`).
 
-The analyzer and anonymizer engines are loaded on the first call to `run()`,
+Original Documents are not mutated. Documents without text content are passed through unchanged.
+
+The analyzer engine is loaded on the first call to `run()`,
 or by calling `warm_up()` explicitly beforehand.
 
 ### Usage example
 
 ```python
 from haystack import Document
-from haystack_integrations.components.preprocessors.presidio import PresidioDocumentCleaner
+from haystack_integrations.components.extractors.presidio import PresidioEntityExtractor
 
-cleaner = PresidioDocumentCleaner()
-result = cleaner.run(documents=[Document(content="My name is John and my email is john@example.com")])
-print(result["documents"][0].content)
-# My name is <PERSON> and my email is <EMAIL_ADDRESS>
+extractor = PresidioEntityExtractor()
+result = extractor.run(documents=[Document(content="Contact Alice at alice@example.com")])
+print(result["documents"][0].meta["entities"])
+# [{'entity_type': 'PERSON', 'start': 8, 'end': 13, 'score': 0.85},
+#  {'entity_type': 'EMAIL_ADDRESS', 'start': 17, 'end': 34, 'score': 1.0}]
 ```
 
 #### __init__
@@ -44,16 +47,16 @@ __init__(
 ) -> None
 ```
 
-Initializes the PresidioDocumentCleaner.
+Initializes the PresidioEntityExtractor.
 
 **Parameters:**
 
 - **language** (<code>str</code>) – Language code for PII detection. Defaults to `"en"`.
   See [Presidio supported languages](https://microsoft.github.io/presidio/analyzer/languages/).
-- **entities** (<code>list\[str\] | None</code>) – List of PII entity types to detect and anonymize (e.g. `["PERSON", "EMAIL_ADDRESS"]`).
-  If `None`, all supported entity types are used.
+- **entities** (<code>list\[str\] | None</code>) – List of PII entity types to detect (e.g. `["PERSON", "EMAIL_ADDRESS"]`).
+  If `None`, all supported entity types are detected.
   See [Presidio supported entities](https://microsoft.github.io/presidio/supported_entities/).
-- **score_threshold** (<code>float</code>) – Minimum confidence score (0-1) for a detected entity to be anonymized. Defaults to `0.35`.
+- **score_threshold** (<code>float</code>) – Minimum confidence score (0-1) for a detected entity to be included. Defaults to `0.35`.
   See [Presidio analyzer documentation](https://microsoft.github.io/presidio/analyzer/).
 
 #### warm_up
@@ -62,7 +65,7 @@ Initializes the PresidioDocumentCleaner.
 warm_up() -> None
 ```
 
-Initializes the Presidio analyzer and anonymizer engines.
+Initializes the Presidio analyzer engine.
 
 This method loads the underlying NLP models. In a Haystack Pipeline,
 this is called automatically before the first run.
@@ -73,44 +76,42 @@ this is called automatically before the first run.
 run(documents: list[Document]) -> dict[str, list[Document]]
 ```
 
-Anonymizes PII in the provided Documents.
+Detects PII entities in the provided Documents.
 
 **Parameters:**
 
-- **documents** (<code>list\[Document\]</code>) – List of Documents whose text content will be anonymized.
+- **documents** (<code>list\[Document\]</code>) – List of Documents to analyze for PII entities.
 
 **Returns:**
 
-- <code>dict\[str, list\[Document\]\]</code> – A dictionary with key `documents` containing the cleaned Documents.
-
-## haystack_integrations.components.preprocessors.presidio.presidio_entity_extractor
+- <code>dict\[str, list\[Document\]\]</code> – A dictionary with key `documents` containing Documents with detected entities
+  stored in metadata under the key `"entities"`.
 
-### PresidioEntityExtractor
+## haystack_integrations.components.preprocessors.presidio.presidio_document_cleaner
 
-Detects PII entities in Haystack Documents using Microsoft Presidio Analyzer.
+### PresidioDocumentCleaner
 
-See [Presidio Analyzer](https://microsoft.github.io/presidio/) for details.
+Anonymizes PII in Haystack Documents using [Microsoft Presidio](https://microsoft.github.io/presidio/).
 
-Accepts a list of Documents and returns new Documents with detected PII entities stored
-in each Document's metadata under the key `"entities"`. Each entry in the list contains
-the entity type, start/end character offsets, and the confidence score.
+Accepts a list of Documents, detects personally identifiable information (PII) in their
+text content, and returns new Documents with PII replaced by entity type placeholders
+(e.g. `<PERSON>`, `<EMAIL_ADDRESS>`). Original Documents are not mutated.
 
-Original Documents are not mutated. Documents without text content are passed through unchanged.
+Documents without text content are passed through unchanged.
 
-The analyzer engine is loaded on the first call to `run()`,
+The analyzer and anonymizer engines are loaded on the first call to `run()`,
 or by calling `warm_up()` explicitly beforehand.
 
 ### Usage example
 
 ```python
 from haystack import Document
-from haystack_integrations.components.preprocessors.presidio import PresidioEntityExtractor
+from haystack_integrations.components.preprocessors.presidio import PresidioDocumentCleaner
 
-extractor = PresidioEntityExtractor()
-result = extractor.run(documents=[Document(content="Contact Alice at alice@example.com")])
-print(result["documents"][0].meta["entities"])
-# [{"entity_type": "PERSON", "start": 8, "end": 13, "score": 0.85},
-#  {"entity_type": "EMAIL_ADDRESS", "start": 17, "end": 34, "score": 1.0}]
+cleaner = PresidioDocumentCleaner()
+result = cleaner.run(documents=[Document(content="My name is John and my email is john@example.com")])
+print(result["documents"][0].content)
+# My name is <PERSON> and my email is <EMAIL_ADDRESS>
 ```
 
 #### __init__
@@ -124,16 +125,16 @@ __init__(
 ) -> None
 ```
 
-Initializes the PresidioEntityExtractor.
+Initializes the PresidioDocumentCleaner.
 
 **Parameters:**
 
 - **language** (<code>str</code>) – Language code for PII detection. Defaults to `"en"`.
   See [Presidio supported languages](https://microsoft.github.io/presidio/analyzer/languages/).
-- **entities** (<code>list\[str\] | None</code>) – List of PII entity types to detect (e.g. `["PERSON", "EMAIL_ADDRESS"]`).
-  If `None`, all supported entity types are detected.
+- **entities** (<code>list\[str\] | None</code>) – List of PII entity types to detect and anonymize (e.g. `["PERSON", "EMAIL_ADDRESS"]`).
+  If `None`, all supported entity types are used.
   See [Presidio supported entities](https://microsoft.github.io/presidio/supported_entities/).
-- **score_threshold** (<code>float</code>) – Minimum confidence score (0-1) for a detected entity to be included. Defaults to `0.35`.
+- **score_threshold** (<code>float</code>) – Minimum confidence score (0-1) for a detected entity to be anonymized. Defaults to `0.35`.
   See [Presidio analyzer documentation](https://microsoft.github.io/presidio/analyzer/).
 
 #### warm_up
@@ -142,7 +143,7 @@ Initializes the PresidioEntityExtractor.
 warm_up() -> None
 ```
 
-Initializes the Presidio analyzer engine.
+Initializes the Presidio analyzer and anonymizer engines.
 
 This method loads the underlying NLP models. In a Haystack Pipeline,
 this is called automatically before the first run.
@@ -153,16 +154,15 @@ this is called automatically before the first run.
 run(documents: list[Document]) -> dict[str, list[Document]]
 ```
 
-Detects PII entities in the provided Documents.
+Anonymizes PII in the provided Documents.
 
 **Parameters:**
 
-- **documents** (<code>list\[Document\]</code>) – List of Documents to analyze for PII entities.
+- **documents** (<code>list\[Document\]</code>) – List of Documents whose text content will be anonymized.
 
 **Returns:**
 
-- <code>dict\[str, list\[Document\]\]</code> – A dictionary with key `documents` containing Documents with detected entities
-  stored in metadata under the key `"entities"`.
+- <code>dict\[str, list\[Document\]\]</code> – A dictionary with key `documents` containing the cleaned Documents.
 
 ## haystack_integrations.components.preprocessors.presidio.presidio_text_cleaner
 

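Both pages describe the same span format: the extractor stores `entity_type`/`start`/`end`/`score` records in `meta["entities"]`, and the cleaner swaps detected spans for `<ENTITY_TYPE>` placeholders. The following is a minimal pure-Python sketch of how the two relate, assuming non-overlapping spans. The entity records are hardcoded from the extractor's usage example rather than produced by Presidio, and `filter_by_score` and `replace_entities` are hypothetical helpers, not part of the integration's API. Presidio's anonymizer engine does the real substitution in the cleaner and handles cases this sketch ignores, such as overlapping spans.

```python
# Entity records in the shape documented for PresidioEntityExtractor's meta["entities"].
# Values are copied from the usage example; they are NOT computed by Presidio here.
text = "Contact Alice at alice@example.com"
entities = [
    {"entity_type": "PERSON", "start": 8, "end": 13, "score": 0.85},
    {"entity_type": "EMAIL_ADDRESS", "start": 17, "end": 34, "score": 1.0},
]


def filter_by_score(entities, score_threshold=0.35):
    """Hypothetical helper: keep entities whose score meets the threshold.

    Assumes an inclusive comparison; 0.35 matches the documented default.
    """
    return [e for e in entities if e["score"] >= score_threshold]


def replace_entities(text, entities):
    """Hypothetical helper: replace each span with a <ENTITY_TYPE> placeholder.

    Spans are applied right to left so the offsets of earlier spans stay valid.
    Assumes spans do not overlap.
    """
    result = text
    for e in sorted(entities, key=lambda e: e["start"], reverse=True):
        result = result[: e["start"]] + f"<{e['entity_type']}>" + result[e["end"]:]
    return result


for e in entities:
    # `end` is exclusive, so a plain slice recovers the matched text.
    print(e["entity_type"], repr(text[e["start"]:e["end"]]))

print(replace_entities(text, filter_by_score(entities)))
# Contact <PERSON> at <EMAIL_ADDRESS>
```

Applying replacements right to left keeps the remaining `start`/`end` offsets valid, since each substitution only shifts text that comes after it.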
diff --git a/docs-website/reference_versioned_docs/version-2.18/integrations-api/presidio.md b/docs-website/reference_versioned_docs/version-2.18/integrations-api/presidio.md
@@ -6,31 +6,34 @@ slug: "/integrations-presidio"
 ---
 
 
-## haystack_integrations.components.preprocessors.presidio.presidio_document_cleaner
+## haystack_integrations.components.extractors.presidio.presidio_entity_extractor
 
-### PresidioDocumentCleaner
+### PresidioEntityExtractor
 
-Anonymizes PII in Haystack Documents using [Microsoft Presidio](https://microsoft.github.io/presidio/).
+Detects PII entities in Haystack Documents using Microsoft Presidio Analyzer.
 
-Accepts a list of Documents, detects personally identifiable information (PII) in their
-text content, and returns new Documents with PII replaced by entity type placeholders
-(e.g. `<PERSON>`, `<EMAIL_ADDRESS>`). Original Documents are not mutated.
+See [Presidio Analyzer](https://microsoft.github.io/presidio/analyzer/) for details.
 
-Documents without text content are passed through unchanged.
+Accepts a list of Documents and returns new Documents with detected PII entities stored
+in each Document's metadata under the key `"entities"`. Each entry in the list contains
+the entity type (`entity_type`), the start and end character offsets (`start`, `end`; `end` is exclusive), and the confidence score (`score`).
 
-The analyzer and anonymizer engines are loaded on the first call to `run()`,
+Original Documents are not mutated. Documents without text content are passed through unchanged.
+
+The analyzer engine is loaded on the first call to `run()`,
 or by calling `warm_up()` explicitly beforehand.
 
 ### Usage example
 
 ```python
 from haystack import Document
-from haystack_integrations.components.preprocessors.presidio import PresidioDocumentCleaner
+from haystack_integrations.components.extractors.presidio import PresidioEntityExtractor
 
-cleaner = PresidioDocumentCleaner()
-result = cleaner.run(documents=[Document(content="My name is John and my email is john@example.com")])
-print(result["documents"][0].content)
-# My name is <PERSON> and my email is <EMAIL_ADDRESS>
+extractor = PresidioEntityExtractor()
+result = extractor.run(documents=[Document(content="Contact Alice at alice@example.com")])
+print(result["documents"][0].meta["entities"])
+# [{'entity_type': 'PERSON', 'start': 8, 'end': 13, 'score': 0.85},
+#  {'entity_type': 'EMAIL_ADDRESS', 'start': 17, 'end': 34, 'score': 1.0}]
 ```
 
 #### __init__
@@ -44,16 +47,16 @@ __init__(
 ) -> None
 ```
 
-Initializes the PresidioDocumentCleaner.
+Initializes the PresidioEntityExtractor.
 
 **Parameters:**
 
 - **language** (<code>str</code>) – Language code for PII detection. Defaults to `"en"`.
   See [Presidio supported languages](https://microsoft.github.io/presidio/analyzer/languages/).
-- **entities** (<code>list\[str\] | None</code>) – List of PII entity types to detect and anonymize (e.g. `["PERSON", "EMAIL_ADDRESS"]`).
-  If `None`, all supported entity types are used.
+- **entities** (<code>list\[str\] | None</code>) – List of PII entity types to detect (e.g. `["PERSON", "EMAIL_ADDRESS"]`).
+  If `None`, all supported entity types are detected.
   See [Presidio supported entities](https://microsoft.github.io/presidio/supported_entities/).
-- **score_threshold** (<code>float</code>) – Minimum confidence score (0-1) for a detected entity to be anonymized. Defaults to `0.35`.
+- **score_threshold** (<code>float</code>) – Minimum confidence score (0-1) for a detected entity to be included. Defaults to `0.35`.
   See [Presidio analyzer documentation](https://microsoft.github.io/presidio/analyzer/).
 
 #### warm_up
@@ -62,7 +65,7 @@ Initializes the PresidioDocumentCleaner.
 warm_up() -> None
 ```
 
-Initializes the Presidio analyzer and anonymizer engines.
+Initializes the Presidio analyzer engine.
 
 This method loads the underlying NLP models. In a Haystack Pipeline,
 this is called automatically before the first run.
@@ -73,44 +76,42 @@ this is called automatically before the first run.
 run(documents: list[Document]) -> dict[str, list[Document]]
 ```
 
-Anonymizes PII in the provided Documents.
+Detects PII entities in the provided Documents.
 
 **Parameters:**
 
-- **documents** (<code>list\[Document\]</code>) – List of Documents whose text content will be anonymized.
+- **documents** (<code>list\[Document\]</code>) – List of Documents to analyze for PII entities.
 
 **Returns:**
 
-- <code>dict\[str, list\[Document\]\]</code> – A dictionary with key `documents` containing the cleaned Documents.
-
-## haystack_integrations.components.preprocessors.presidio.presidio_entity_extractor
+- <code>dict\[str, list\[Document\]\]</code> – A dictionary with key `documents` containing Documents with detected entities
+  stored in metadata under the key `"entities"`.
 
-### PresidioEntityExtractor
+## haystack_integrations.components.preprocessors.presidio.presidio_document_cleaner
 
-Detects PII entities in Haystack Documents using Microsoft Presidio Analyzer.
+### PresidioDocumentCleaner
 
-See [Presidio Analyzer](https://microsoft.github.io/presidio/) for details.
+Anonymizes PII in Haystack Documents using [Microsoft Presidio](https://microsoft.github.io/presidio/).
 
-Accepts a list of Documents and returns new Documents with detected PII entities stored
-in each Document's metadata under the key `"entities"`. Each entry in the list contains
-the entity type, start/end character offsets, and the confidence score.
+Accepts a list of Documents, detects personally identifiable information (PII) in their
+text content, and returns new Documents with PII replaced by entity type placeholders
+(e.g. `<PERSON>`, `<EMAIL_ADDRESS>`). Original Documents are not mutated.
 
-Original Documents are not mutated. Documents without text content are passed through unchanged.
+Documents without text content are passed through unchanged.
 
-The analyzer engine is loaded on the first call to `run()`,
+The analyzer and anonymizer engines are loaded on the first call to `run()`,
 or by calling `warm_up()` explicitly beforehand.
 
 ### Usage example
 
 ```python
 from haystack import Document
-from haystack_integrations.components.preprocessors.presidio import PresidioEntityExtractor
+from haystack_integrations.components.preprocessors.presidio import PresidioDocumentCleaner
 
-extractor = PresidioEntityExtractor()
-result = extractor.run(documents=[Document(content="Contact Alice at alice@example.com")])
-print(result["documents"][0].meta["entities"])
-# [{"entity_type": "PERSON", "start": 8, "end": 13, "score": 0.85},
-#  {"entity_type": "EMAIL_ADDRESS", "start": 17, "end": 34, "score": 1.0}]
+cleaner = PresidioDocumentCleaner()
+result = cleaner.run(documents=[Document(content="My name is John and my email is john@example.com")])
+print(result["documents"][0].content)
+# My name is <PERSON> and my email is <EMAIL_ADDRESS>
 ```
 
 #### __init__
@@ -124,16 +125,16 @@ __init__(
 ) -> None
 ```
 
-Initializes the PresidioEntityExtractor.
+Initializes the PresidioDocumentCleaner.
 
 **Parameters:**
 
 - **language** (<code>str</code>) – Language code for PII detection. Defaults to `"en"`.
   See [Presidio supported languages](https://microsoft.github.io/presidio/analyzer/languages/).
-- **entities** (<code>list\[str\] | None</code>) – List of PII entity types to detect (e.g. `["PERSON", "EMAIL_ADDRESS"]`).
-  If `None`, all supported entity types are detected.
+- **entities** (<code>list\[str\] | None</code>) – List of PII entity types to detect and anonymize (e.g. `["PERSON", "EMAIL_ADDRESS"]`).
+  If `None`, all supported entity types are used.
   See [Presidio supported entities](https://microsoft.github.io/presidio/supported_entities/).
-- **score_threshold** (<code>float</code>) – Minimum confidence score (0-1) for a detected entity to be included. Defaults to `0.35`.
+- **score_threshold** (<code>float</code>) – Minimum confidence score (0-1) for a detected entity to be anonymized. Defaults to `0.35`.
   See [Presidio analyzer documentation](https://microsoft.github.io/presidio/analyzer/).
 
 #### warm_up
@@ -142,7 +143,7 @@ Initializes the PresidioEntityExtractor.
 warm_up() -> None
 ```
 
-Initializes the Presidio analyzer engine.
+Initializes the Presidio analyzer and anonymizer engines.
 
 This method loads the underlying NLP models. In a Haystack Pipeline,
 this is called automatically before the first run.
@@ -153,16 +154,15 @@ this is called automatically before the first run.
 run(documents: list[Document]) -> dict[str, list[Document]]
 ```
 
-Detects PII entities in the provided Documents.
+Anonymizes PII in the provided Documents.
 
 **Parameters:**
 
-- **documents** (<code>list\[Document\]</code>) – List of Documents to analyze for PII entities.
+- **documents** (<code>list\[Document\]</code>) – List of Documents whose text content will be anonymized.
 
 **Returns:**
 
-- <code>dict\[str, list\[Document\]\]</code> – A dictionary with key `documents` containing Documents with detected entities
-  stored in metadata under the key `"entities"`.
+- <code>dict\[str, list\[Document\]\]</code> – A dictionary with key `documents` containing the cleaned Documents.
 
 ## haystack_integrations.components.preprocessors.presidio.presidio_text_cleaner