Commit e10082e

docs: add CLAUDE.md with development guidelines (#1)
- Add overview of package purpose (PII/PHI detection and redaction)
- Add quick commands for installation, testing, and usage
- Add supported entity types table
- Add links to related projects
1 parent e65a504 commit e10082e

1 file changed

Lines changed: 103 additions & 0 deletions

CLAUDE.md

@@ -0,0 +1,103 @@
# Claude Code Instructions for openadapt-privacy

## Overview

**openadapt-privacy** provides PII/PHI detection and redaction for GUI automation data. It protects sensitive information (emails, phone numbers, SSNs, credit cards, dates) in text, images, and nested dictionaries.

Key responsibilities:
- Detect personally identifiable information using NER models (Presidio)
- Redact PII from text, images, and structured data
- Support custom entity detection and scrubbing rules
- Integrate with openadapt-capture recording pipelines

**Always use PRs, never push directly to main**

## Quick Commands

```bash
# Install with Presidio (recommended)
uv add "openadapt-privacy[presidio]"

# Download spaCy model for NER
python -m spacy download en_core_web_trf

# Run tests
uv run pytest tests/ -v

# Scrub text
uv run python -c "
from openadapt_privacy import PresidioScrubber
scrubber = PresidioScrubber()
text = 'Contact John Smith at john@example.com or 555-123-4567'
print(scrubber.scrub_text(text))
"

# Scrub dictionary
uv run python -c "
from openadapt_privacy import scrub_dict, PresidioScrubber
scrubber = PresidioScrubber()
data = {'name': 'John Smith', 'email': 'john@example.com'}
print(scrub_dict(data, scrubber))
"
```

## Architecture

```
openadapt_privacy/
  base.py                  # Base classes: Scrubber, ScrubbingProvider
  config.py                # PrivacyConfig dataclass
  loaders.py               # Recording/Action/Screenshot loaders
  pipelines/dicts.py       # Dictionary scrubbing utilities
  providers/presidio.py    # PresidioScrubber (primary implementation)
```

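
To make the split between base classes and providers concrete, here is a minimal sketch of how an abstract scrubber interface and one provider might relate. The class names `TextScrubber` and `RegexEmailScrubber` are hypothetical and are not the package's actual classes; only the `scrub_text` method name and the `<EMAIL_ADDRESS>` placeholder come from this document.

```python
import re
from abc import ABC, abstractmethod


class TextScrubber(ABC):
    """Hypothetical stand-in for a base scrubber interface (not the real base.py)."""

    @abstractmethod
    def scrub_text(self, text: str) -> str:
        """Return `text` with sensitive spans replaced by placeholders."""


class RegexEmailScrubber(TextScrubber):
    """Toy provider: masks email addresses with a simple regex, no NER."""

    # Deliberately loose pattern; real detectors are far more thorough.
    _EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

    def scrub_text(self, text: str) -> str:
        return self._EMAIL.sub("<EMAIL_ADDRESS>", text)


scrubber: TextScrubber = RegexEmailScrubber()
print(scrubber.scrub_text("Email john@example.com for access"))
```

Callers that depend only on the abstract interface can swap a regex provider for an NER-backed one without changing their own code, which is presumably the reason for the `base.py` / `providers/` separation.
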
## Key Components

### PresidioScrubber
Primary scrubber using Presidio NER and spaCy:
- Detects: PERSON, EMAIL_ADDRESS, PHONE_NUMBER, US_SSN, CREDIT_CARD, DATE_TIME, LOCATION
- `scrub_text(text)` - Redact PII from text
- `scrub_image(image)` - OCR + redact PII regions

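
The "OCR + redact PII regions" step can be pictured as filling in the bounding boxes of words flagged as sensitive. The sketch below assumes OCR has already produced word boxes and uses a plain nested list as the image; `redact_regions`, the box layout, and the `is_sensitive` callback are all hypothetical illustrations, not the behavior of `scrub_image` itself.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive
Pixels = List[List[int]]         # grayscale image as rows of 0-255 values


def redact_regions(
    pixels: Pixels,
    ocr_words: List[Tuple[str, Box]],
    is_sensitive: Callable[[str], bool],
    fill: int = 0,
) -> Pixels:
    """Return a copy of `pixels` with the boxes of sensitive words filled in."""
    redacted = [row[:] for row in pixels]  # copy so the input stays untouched
    height = len(redacted)
    width = len(redacted[0]) if height else 0
    for word, (left, top, right, bottom) in ocr_words:
        if not is_sensitive(word):
            continue
        # Clamp the box to the image bounds before filling.
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                redacted[y][x] = fill
    return redacted


# Tiny 6x3 white image with two OCR'd words (made-up boxes for the demo).
image = [[255] * 6 for _ in range(3)]
words = [("Contact", (0, 0, 2, 3)), ("john@example.com", (3, 0, 6, 2))]
result = redact_regions(image, words, is_sensitive=lambda w: "@" in w)
print(result)
```

Returning a copy rather than mutating the input keeps the original screenshot available for comparison, which is a design choice of this sketch rather than something the source specifies.
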
### Text Scrubbing
```python
from openadapt_privacy import PresidioScrubber
scrubber = PresidioScrubber()
text = "Contact John Smith at john@example.com"
# Output: "Contact <PERSON> at <EMAIL_ADDRESS>"
print(scrubber.scrub_text(text))
```

### Dictionary Scrubbing
```python
from openadapt_privacy import scrub_dict, PresidioScrubber
scrubber = PresidioScrubber()
action = {"text": "Email john@example.com", "title": "User John Smith"}
scrubbed = scrub_dict(action, scrubber)
```

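
Scrubbing a nested dictionary typically comes down to a recursive walk that applies a text scrubber to every string value. The following is a minimal sketch of that idea, assuming nothing about how `scrub_dict` is actually implemented; `scrub_nested` and `mask_email` are hypothetical names used only for illustration.

```python
from typing import Any, Callable


def scrub_nested(value: Any, scrub_text: Callable[[str], str]) -> Any:
    """Recursively apply `scrub_text` to every string in dicts, lists, and tuples."""
    if isinstance(value, str):
        return scrub_text(value)
    if isinstance(value, dict):
        # Keys are left as-is in this sketch; only values are scrubbed.
        return {key: scrub_nested(item, scrub_text) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(scrub_nested(item, scrub_text) for item in value)
    return value  # numbers, booleans, None, etc. pass through unchanged


def mask_email(text: str) -> str:
    """Stand-in scrubber for the demo: masks one known address."""
    return text.replace("john@example.com", "<EMAIL_ADDRESS>")


action = {
    "text": "Email john@example.com",
    "meta": {"recipients": ["john@example.com"], "count": 1},
}
print(scrub_nested(action, mask_email))
```

Passing the scrubber as a plain callable keeps the walk independent of any particular detection backend.
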
## Supported Entity Types

| Entity | Input | Output |
|--------|-------|--------|
| PERSON | John Smith | `<PERSON>` |
| EMAIL_ADDRESS | john@example.com | `<EMAIL_ADDRESS>` |
| PHONE_NUMBER | 555-123-4567 | `<PHONE_NUMBER>` |
| US_SSN | 923-45-6789 | `<US_SSN>` |
| CREDIT_CARD | 4532-1234-5678-9012 | `<CREDIT_CARD>` |
| DATE_TIME | 01/15/1985 | `<DATE_TIME>` |
| LOCATION | Toronto, ON | `<LOCATION>` |

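
The Input-to-Output mapping in the table amounts to replacing each detected span with its entity type in angle brackets. Here is a small sketch of that substitution step, assuming detection has already produced non-overlapping `(start, end, entity_type)` spans; `apply_placeholders` and the span tuple layout are hypothetical and not taken from the package or from Presidio.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, entity_type), end exclusive


def apply_placeholders(text: str, spans: List[Span]) -> str:
    """Replace each span with `<ENTITY_TYPE>`, assuming spans do not overlap."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, entity in sorted(spans, reverse=True):
        text = text[:start] + f"<{entity}>" + text[end:]
    return text


text = "Contact John Smith at john@example.com"
name, email = "John Smith", "john@example.com"
# Spans are located by hand here; a real detector would supply them.
spans = [
    (text.index(name), text.index(name) + len(name), "PERSON"),
    (text.index(email), text.index(email) + len(email), "EMAIL_ADDRESS"),
]
print(apply_placeholders(text, spans))
```

Replacing from the end of the string backwards avoids having to recompute offsets as the text changes length.
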
## Testing

```bash
uv run pytest tests/ -v
```

## Related Projects

- [openadapt-capture](https://github.com/OpenAdaptAI/openadapt-capture) - GUI interaction capture
- [openadapt-ml](https://github.com/OpenAdaptAI/openadapt-ml) - Train models on captures
- [openadapt-viewer](https://github.com/OpenAdaptAI/openadapt-viewer) - Visualization components
- [Presidio](https://github.com/microsoft/presidio) - PII detection and redaction