Commit a3e64f5

Add CLAUDE.md and IMPROVEMENT_PLAN.md project docs
1 parent 0d0929c commit a3e64f5

2 files changed

Lines changed: 341 additions & 0 deletions

CLAUDE.md

Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

TextAttack (v0.3.10) is a Python framework for adversarial attacks, data augmentation, and model training in NLP. It provides a modular system where attacks are composed of four pluggable components: goal functions, constraints, transformations, and search methods. The project is maintained by UVA QData Lab.

## Common Commands

### Installation (dev mode)
```bash
# Quoted so that shells such as zsh do not treat the brackets as a glob pattern
pip install -e ".[dev]"
```

### Testing
```bash
make test                          # Run full test suite (pytest --dist=loadfile -n auto)
pytest tests -v                    # Verbose test run
pytest tests/test_augment_api.py   # Run a single test file
pytest --lf                        # Re-run only last failed tests
```

### Formatting & Linting
```bash
make format   # Auto-format with black, isort, docformatter
make lint     # Check formatting (black --check, isort --check-only, flake8)
```

### Building Docs
```bash
make docs        # Build HTML docs with Sphinx
make docs-auto   # Hot-reload docs server on port 8765
```

### CLI Usage
```bash
textattack attack --recipe textfooler --model bert-base-uncased-mr --num-examples 100
textattack augment --input-csv examples.csv --output-csv output.csv --input-column text --recipe embedding
textattack train --model-name-or-path lstm --dataset yelp_polarity --epochs 50
textattack list attack-recipes
textattack peek-dataset --dataset-from-huggingface snli
```

## Architecture

### Core Attack Pipeline (`textattack/attack.py`, `textattack/attacker.py`)

An `Attack` is composed of exactly four components:
1. **GoalFunction** (`textattack/goal_functions/`) - Determines if an attack succeeded. Categories: `classification/` (untargeted, targeted), `text/` (BLEU, translation overlap), `custom/`.
2. **Constraints** (`textattack/constraints/`) - Filter invalid perturbations. Categories: `semantics/` (sentence encoders, word embeddings), `grammaticality/` (POS, language models, grammar tools), `overlap/` (edit distance, BLEU), `pre_transformation/` (restrict search space before transforming).
3. **Transformation** (`textattack/transformations/`) - Generate candidate perturbations. Types: `word_swaps/` (embedding, gradient, homoglyph, WordNet), `word_insertions/`, `word_merges/`, `sentence_transformations/`, `WordDeletion`, `CompositeTransformation`.
4. **SearchMethod** (`textattack/search_methods/`) - Traverse the perturbation space. Includes: `BeamSearch`, `GreedySearch`, `GreedyWordSwapWIR`, `GeneticAlgorithm`, `ParticleSwarmOptimization`, `DifferentialEvolution`.

The `Attacker` class orchestrates running attacks on datasets with parallel processing, checkpointing, and logging.
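
To make the four-part composition concrete, here is a minimal pure-Python sketch. It does not import TextAttack and is not how the library implements anything: `toy_model`, `goal_reached`, `within_edit_budget`, `swap_synonyms`, `naive_search`, and the `SYNONYMS` table are all hypothetical stand-ins for a model, a goal function, a constraint, a transformation, and a search method.

```python
# Toy stand-ins for the four attack components. None of these names are
# TextAttack APIs; they only illustrate how the pieces fit together.

SYNONYMS = {"great": ["fine", "decent"], "love": ["like"]}  # hypothetical table
POSITIVE_WORDS = {"great", "love"}


def toy_model(text):
    """Stand-in classifier: 1 (positive) if any positive word appears, else 0."""
    return int(any(word in POSITIVE_WORDS for word in text.split()))


def goal_reached(original_label, candidate):
    """Goal function: an untargeted attack succeeds when the label flips."""
    return toy_model(candidate) != original_label


def within_edit_budget(original, candidate, max_changed=2):
    """Constraint: reject candidates that change more than `max_changed` words."""
    changed = sum(a != b for a, b in zip(original.split(), candidate.split()))
    return changed <= max_changed


def swap_synonyms(text):
    """Transformation: yield every single-word synonym substitution."""
    words = text.split()
    for i, word in enumerate(words):
        for replacement in SYNONYMS.get(word, []):
            yield " ".join(words[:i] + [replacement] + words[i + 1:])


def naive_search(original, max_steps=5):
    """Search method: try each valid candidate; if none succeeds, continue
    from the first valid candidate. Deliberately simplistic."""
    original_label = toy_model(original)
    current = original
    for _ in range(max_steps):
        candidates = [
            c for c in swap_synonyms(current) if within_edit_budget(original, c)
        ]
        if not candidates:
            return None  # search space exhausted
        for candidate in candidates:
            if goal_reached(original_label, candidate):
                return candidate
        current = candidates[0]
    return None


adversarial = naive_search("i love this great movie")
```

The search only ever sees candidates that the transformation produced and the constraint accepted, and it stops as soon as the goal function reports success. That division of labour is what lets each component be swapped independently.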

### Attack Recipes (`textattack/attack_recipes/`)

Pre-built attack configurations from the literature (e.g., TextFooler, DeepWordBug, BAE, BERT-Attack, CLARE, CheckList). Each recipe subclasses `AttackRecipe` and implements a static `build(model_wrapper)` method that returns a configured `Attack` object. Includes multilingual recipes for French, Spanish, and Chinese.
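
The recipe pattern is essentially a factory: `build(model_wrapper)` is called on the class itself and returns a fully configured attack. The sketch below shows that shape with plain Python stand-ins. `ToyAttack`, `ToyRecipe`, `ShoutingRecipe`, and `case_sensitive_model` are hypothetical names, and declaring `build` with `@staticmethod` is an assumption of this sketch based on the signature taking only `model_wrapper`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyAttack:
    """Hypothetical stand-in for an attack: bundles the four components."""
    goal_function: Callable
    constraints: list
    transformation: Callable
    search_method: str


class ToyRecipe:
    """Hypothetical stand-in for a recipe base class."""

    @staticmethod
    def build(model_wrapper):
        raise NotImplementedError


class ShoutingRecipe(ToyRecipe):
    """Made-up recipe: perturb by upper-casing, succeed on any label change."""

    @staticmethod
    def build(model_wrapper):
        def label_changed(original_text, candidate_text):
            # The model wrapper takes a list of strings and returns one label each.
            before, after = model_wrapper([original_text, candidate_text])
            return before != after

        return ToyAttack(
            goal_function=label_changed,
            constraints=[lambda text: len(text) < 200],
            transformation=str.upper,
            search_method="greedy",
        )


def case_sensitive_model(texts):
    """Toy model that labels all-uppercase text as class 1."""
    return [int(text.isupper()) for text in texts]


# No recipe instance is needed: build() is called on the class.
attack = ShoutingRecipe.build(case_sensitive_model)
candidate = attack.transformation("nice film")
succeeded = attack.goal_function("nice film", candidate)
```

Keeping all configuration inside `build` means a caller only has to supply a model to reproduce a published attack setup.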

### Key Abstractions

- **`AttackedText`** (`textattack/shared/attacked_text.py`) - Central text representation that maintains both token list and original text with punctuation. Used throughout the pipeline instead of raw strings.
- **`ModelWrapper`** (`textattack/models/wrappers/`) - Abstract interface for models. Implementations for PyTorch, HuggingFace, TensorFlow, sklearn. Models must accept string input and return predictions.
- **`Dataset`** (`textattack/datasets/`) - Iterable of `(input, output)` pairs. Supports HuggingFace datasets and custom files.
- **`Augmenter`** (`textattack/augmentation/`) - Uses transformations and constraints for data augmentation (not adversarial attacks). Built-in recipes: wordnet, embedding, charswap, eda, checklist, clare, back_trans.
- **`PromptAugmentationPipeline`** (`textattack/prompt_augmentation/`) - Augments prompts and generates LLM responses.
- **LLM Wrappers** (`textattack/llms/`) - Wrappers for using LLMs (HuggingFace, ChatGPT) with prompt augmentation.
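
The model-wrapper and dataset contracts above (strings in, predictions out; an iterable of `(input, output)` pairs) can be illustrated without installing anything. In the sketch below, `KeywordSentimentWrapper`, `toy_dataset`, and `accuracy` are hypothetical. A real wrapper would subclass `ModelWrapper` and delegate to an actual model; the plain callable class here only mimics the "list of strings in, one score row per string out" shape.

```python
class KeywordSentimentWrapper:
    """Stand-in for a model wrapper: callable on a list of strings."""

    def __init__(self, positive_words):
        self.positive_words = set(positive_words)

    def __call__(self, text_input_list):
        """Return one [negative_score, positive_score] row per input string."""
        outputs = []
        for text in text_input_list:
            hits = sum(
                word in self.positive_words for word in text.lower().split()
            )
            positive = 0.9 if hits else 0.1
            outputs.append([1.0 - positive, positive])
        return outputs


# A dataset is just an iterable of (input, output) pairs.
toy_dataset = [
    ("A great film", 1),
    ("Dull and slow", 0),
    ("Great cast, great score", 1),
]


def accuracy(model_wrapper, dataset):
    """Fraction of examples whose highest-scoring class matches the label."""
    texts = [text for text, _ in dataset]
    labels = [label for _, label in dataset]
    scores = model_wrapper(texts)
    predictions = [max(range(len(row)), key=row.__getitem__) for row in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


wrapper = KeywordSentimentWrapper(["great"])
toy_accuracy = accuracy(wrapper, toy_dataset)
```

Because the rest of the pipeline only relies on this call shape, any framework can sit behind the wrapper.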

### CLI Commands (`textattack/commands/`)

Entry point: `textattack/commands/textattack_cli.py`. Each command (attack, augment, train, eval-model, list, peek-dataset, benchmark-recipe, attack-resume) is a subclass of `TextAttackCommand` with `register_subcommand()` and `run()` methods.
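
A rough sketch of this register-then-dispatch pattern, using only `argparse` from the standard library, might look as follows. `ToyCommand`, `EchoCommand`, `main`, and the `toycli` program name are hypothetical; the exact signatures in `TextAttackCommand` may differ, so treat this as an illustration of the idea, not a copy of the real base class.

```python
import argparse
from abc import ABC, abstractmethod


class ToyCommand(ABC):
    """Hypothetical stand-in for the command base class."""

    @staticmethod
    @abstractmethod
    def register_subcommand(subparsers):
        """Attach this command's parser and arguments to the main CLI."""

    @abstractmethod
    def run(self, args):
        """Execute the command with the parsed arguments."""


class EchoCommand(ToyCommand):
    """Made-up command that returns its message, optionally upper-cased."""

    @staticmethod
    def register_subcommand(subparsers):
        parser = subparsers.add_parser("echo", help="echo a message")
        parser.add_argument("message")
        parser.add_argument("--upper", action="store_true")
        # Store a command instance so the entry point can dispatch to it.
        parser.set_defaults(func=EchoCommand())

    def run(self, args):
        return args.message.upper() if args.upper else args.message


def main(argv):
    """Entry point: let each command register itself, then dispatch."""
    parser = argparse.ArgumentParser(prog="toycli")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for command in (EchoCommand,):
        command.register_subcommand(subparsers)
    args = parser.parse_args(argv)
    return args.func.run(args)


result = main(["echo", "hello", "--upper"])
```

With this layout, adding a command means writing one new subclass and adding it to the registration loop; the entry point itself does not need to know about any command's arguments.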

### Configuration

- Version tracked in `docs/conf.py` (imported by `setup.py`)
- Cache directory: `~/.cache/textattack/` (override with `TA_CACHE_DIR` env var)
- Formatting: black (line length 88), isort (skip `__init__.py`), flake8 (ignores: E203, E266, E501, W503, D203)
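
The cache-directory override amounts to "use the environment variable if set, otherwise fall back to the default path". A minimal sketch of that lookup order is shown below; `resolve_cache_dir` is a hypothetical helper written for illustration, and the library's own lookup code may be structured differently.

```python
import os


def resolve_cache_dir(environ=None):
    """Return TA_CACHE_DIR if it is set, else the default ~/.cache/textattack."""
    environ = os.environ if environ is None else environ
    default = os.path.join(os.path.expanduser("~"), ".cache", "textattack")
    return environ.get("TA_CACHE_DIR", default)


# Passing a mapping explicitly makes the behaviour easy to check in isolation.
overridden = resolve_cache_dir({"TA_CACHE_DIR": "/tmp/ta-cache"})
fallback = resolve_cache_dir({})
```

Setting `TA_CACHE_DIR` before running tests or attacks is a convenient way to keep downloaded models and embeddings off the home directory, for example on shared CI runners.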

### CI Workflows (`.github/workflows/`)

- `check-formatting.yml` - Runs `make lint` on Python 3.9
- `run-pytest.yml` - Sets up Python 3.8/3.9 (pytest currently skipped in CI)
- `publish-to-pypi.yml` - PyPI publishing
- `make-docs.yml` - Documentation build
- `codeql-analysis.yml` - Security analysis

### Test Structure

Tests are in `tests/` organized by feature:
- `test_command_line/` - CLI command integration tests (attack, augment, train, eval, list, loggers)
- `test_constraints/` - Constraint unit tests
- `test_augment_api.py`, `test_transformations.py`, `test_attacked_text.py`, `test_tokenizers.py`, `test_word_embedding.py`, `test_metric_api.py`, `test_prompt_augmentation.py`
- `test_command_line/update_test_outputs.py` - Script to regenerate expected test outputs

### Adding New Components

- **Attack recipe**: Subclass `AttackRecipe` in `textattack/attack_recipes/`, implement `build(model_wrapper)`, add import to `__init__.py`, add doc reference in `docs/attack_recipes.rst`.
- **Transformation**: Subclass `Transformation` in appropriate subfolder under `textattack/transformations/`.
- **Constraint**: Subclass `Constraint` or `PreTransformationConstraint` in appropriate subfolder under `textattack/constraints/`.
- **Search method**: Subclass `SearchMethod` in `textattack/search_methods/`.
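
All four extension points follow the same subclass-and-override shape: a base class owns the public entry point, and a new component supplies one hook method. The sketch below illustrates that shape with simplified stand-in base classes (`ToyTransformation`, `ToyConstraint`) and made-up components (`AdjacentWordSwap`, `KeepFirstWord`). The hook names `_get_transformations` and `_check_constraint` are assumed to mirror TextAttack's conventions and should be checked against the installed version; the real hooks operate on `AttackedText` objects, not word lists.

```python
from abc import ABC, abstractmethod


class ToyTransformation(ABC):
    """Stand-in base class: the public call delegates to a subclass hook."""

    def __call__(self, words):
        return self._get_transformations(words, range(len(words)))

    @abstractmethod
    def _get_transformations(self, words, indices_to_modify):
        """Return a list of perturbed copies of `words`."""


class ToyConstraint(ABC):
    """Stand-in base class: filters candidates through a subclass hook."""

    def filter(self, candidates, reference):
        return [c for c in candidates if self._check_constraint(c, reference)]

    @abstractmethod
    def _check_constraint(self, transformed, reference):
        """Return True if `transformed` is an acceptable perturbation."""


class AdjacentWordSwap(ToyTransformation):
    """Made-up transformation: swap each word with its right-hand neighbour."""

    def _get_transformations(self, words, indices_to_modify):
        results = []
        for i in indices_to_modify:
            if i + 1 < len(words):
                swapped = list(words)
                swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
                results.append(swapped)
        return results


class KeepFirstWord(ToyConstraint):
    """Made-up constraint: the first word must stay unchanged."""

    def _check_constraint(self, transformed, reference):
        return transformed[0] == reference[0]


words = ["the", "plot", "was", "thin"]
candidates = AdjacentWordSwap()(words)
allowed = KeepFirstWord().filter(candidates, words)
```

Here the transformation proposes three swaps and the constraint discards the one that moves the first word, leaving two candidates for a search method to explore.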
