docs: add Presidio component docs pages #11165
Adds documentation for PresidioDocumentCleaner, PresidioTextCleaner, and PresidioEntityExtractor under the Preprocessors section. Related: deepset-ai/haystack-core-integrations#3063
@bogdankostic I can take this one since I reviewed the integration PR
…extractors section
SyedShahmeerAli12
left a comment
Hi @sjrl, all three points are addressed in the latest commit:
- Split the single combined `presidio.mdx` into three separate per-component files (`presidiodocumentcleaner.mdx`, `presidiotextcleaner.mdx`, `presidioentityextractor.mdx`)
- Moved `PresidioEntityExtractor` to `extractors/` and updated the import path to `haystack_integrations.components.extractors.presidio`
- Removed `PresidioEntityExtractor` from `preprocessors.mdx` and added it to `extractors.mdx`
Both the current docs and versioned docs (version-2.28) are updated. Ready for re-review!
Please also update the other docs pages for the two cleaner components with the same feedback I provided for the entity extractor.
@sjrl all comments from the latest review are addressed in the latest commit:
- Added an `## Overview` section explaining what Presidio is, what the extractor does (non-destructive: it stores PII as metadata rather than modifying text), and when you'd want to use it
- Updated the spaCy comment to clarify it's for English and that other languages need a different model
- Moved `## Configuration` before `## Usage`
- Added the Microsoft supported entities link to the `entities` row in the config table and removed the standalone sentence at the bottom
- Added an intro sentence under `## Usage`
- Moved the python config code block into a `### Using Custom Parameters` subsection under Usage
Both current and versioned (version-2.28) docs are updated. Ready for re-review!
… Installation heading

Per sjrl review: removes the separate `## Installation` section from all three Presidio component pages and moves the pip install + spaCy download block into the Usage section, right after the intro sentence. Also removes the "Unlike the cleaner components" phrasing from PresidioEntityExtractor's Overview since it's not clear in context on a standalone page. Applied to both current docs and versioned docs (version-2.28).
@sjrl addressed both comments from the latest review:
Both current docs and versioned docs (version-2.28) have been updated.
Review comment on the line `## Configuration`:
Let's follow the same structure as the entity extractor page and put this configuration section right after the overview section.
Review comment on the line `See [Presidio supported entities](https://microsoft.github.io/presidio/supported_entities/) for the full list of detectable PII types.`:
Forgot to remove this line and also put the link in the configuration table. Make sure to do this for the text cleaner as well.
Review comment on the line `## Configuration`:
Same comment here: put this right after the overview section.
Review comment on the line `See [Presidio supported entities](https://microsoft.github.io/presidio/supported_entities/) for the full list of detectable PII types.`:
Same comment here: make sure to remove this line and add the link into the configuration table.
…pported entities link, add Using Custom Parameters subsection

Per sjrl review: adds an `## Overview` section to the PresidioDocumentCleaner and PresidioTextCleaner pages explaining what Presidio is and when to use the component. Moves `## Configuration` to right after Overview (before Usage), adds the supported entities link into the `entities` config table row (removing the standalone sentence at the bottom), and moves the custom parameters code block into a `### Using Custom Parameters` subsection under Usage. Applied to both current docs and versioned docs (version-2.28).
@sjrl addressed all 8 comments: added Overview, moved Configuration before Usage, added the supported entities link in the config table, and moved custom params into Using Custom Parameters for both cleaner pages.
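
The page layout agreed on in this review (Overview, then Configuration, then Usage, with no separate Installation section and Using Custom Parameters nested under Usage) could be checked mechanically. The following is only a minimal sketch using the Python standard library; the helper names `headings_outside_fences` and `check_page_structure` are hypothetical and are not part of the Haystack docs tooling.

```python
import re

# Section order requested in the review (an assumption distilled from the comments above).
REQUIRED_ORDER = ["## Overview", "## Configuration", "## Usage"]
CUSTOM_PARAMS = "### Using Custom Parameters"


def headings_outside_fences(mdx_text):
    """Collect level-2 and level-3 headings, skipping anything inside fenced code blocks."""
    headings = []
    in_fence = False
    for line in mdx_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("```"):
            in_fence = not in_fence  # toggle on every opening or closing fence
            continue
        if not in_fence and re.match(r"#{2,3} ", stripped):
            headings.append(stripped)
    return headings


def check_page_structure(mdx_text):
    """Return a list of human-readable problems; an empty list means the page conforms."""
    problems = []
    headings = headings_outside_fences(mdx_text)

    if "## Installation" in headings:
        problems.append("separate '## Installation' section should be folded into Usage")

    positions = []
    for name in REQUIRED_ORDER:
        if name in headings:
            positions.append(headings.index(name))
        else:
            problems.append(f"missing '{name}'")
    if positions != sorted(positions):
        problems.append("expected order: Overview, Configuration, Usage")

    if CUSTOM_PARAMS in headings and "## Usage" in headings:
        if headings.index(CUSTOM_PARAMS) < headings.index("## Usage"):
            problems.append(f"'{CUSTOM_PARAMS}' should come after '## Usage'")
    return problems
```

Running `check_page_structure` over each of the three `.mdx` pages in both the current and versioned docs would flag any page that drifted from the requested structure.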
sjrl
left a comment
Thanks for the contribution!
Related Issues
Proposed Changes
- `preprocessors/presidiodocumentcleaner.mdx`: PresidioDocumentCleaner
- `preprocessors/presidiotextcleaner.mdx`: PresidioTextCleaner
- `extractors/presidioentityextractor.mdx`: PresidioEntityExtractor (import path: `haystack_integrations.components.extractors.presidio`)
- `preprocessors.mdx`
- `extractors.mdx`
- `sidebars.js` and `version-2.28-sidebars.json`
- `versioned_docs/version-2.28`
Checklist