| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Project Overview |
| 6 | + |
| 7 | +TextAttack (v0.3.10) is a Python framework for adversarial attacks, data augmentation, and model training in NLP. Attacks are modular: each one is composed of four pluggable components (a goal function, constraints, a transformation, and a search method). The project is maintained by the UVA QData Lab. |
| 8 | + |
| 9 | +## Common Commands |
| 10 | + |
| 11 | +### Installation (dev mode) |
| 12 | +```bash |
| 13 | +pip install -e ".[dev]" # quotes keep shells such as zsh from globbing the brackets |
| 14 | +``` |
| 15 | + |
| 16 | +### Testing |
| 17 | +```bash |
| 18 | +make test # Run full test suite (pytest --dist=loadfile -n auto) |
| 19 | +pytest tests -v # Verbose test run |
| 20 | +pytest tests/test_augment_api.py # Run a single test file |
| 21 | +pytest --lf # Re-run only last failed tests |
| 22 | +``` |
| 23 | + |
| 24 | +### Formatting & Linting |
| 25 | +```bash |
| 26 | +make format # Auto-format with black, isort, docformatter |
| 27 | +make lint # Check formatting (black --check, isort --check-only, flake8) |
| 28 | +``` |
| 29 | + |
| 30 | +### Building Docs |
| 31 | +```bash |
| 32 | +make docs # Build HTML docs with Sphinx |
| 33 | +make docs-auto # Hot-reload docs server on port 8765 |
| 34 | +``` |
| 35 | + |
| 36 | +### CLI Usage |
| 37 | +```bash |
| 38 | +textattack attack --recipe textfooler --model bert-base-uncased-mr --num-examples 100 |
| 39 | +textattack augment --input-csv examples.csv --output-csv output.csv --input-column text --recipe embedding |
| 40 | +textattack train --model-name-or-path lstm --dataset yelp_polarity --epochs 50 |
| 41 | +textattack list attack-recipes |
| 42 | +textattack peek-dataset --dataset-from-huggingface snli |
| 43 | +``` |
| 44 | + |
| 45 | +## Architecture |
| 46 | + |
| 47 | +### Core Attack Pipeline (`textattack/attack.py`, `textattack/attacker.py`) |
| 48 | + |
| 49 | +An `Attack` is composed of exactly four components: |
| 50 | +1. **GoalFunction** (`textattack/goal_functions/`) - Determines if an attack succeeded. Categories: `classification/` (untargeted, targeted), `text/` (BLEU, translation overlap), `custom/`. |
| 51 | +2. **Constraints** (`textattack/constraints/`) - Filter invalid perturbations. Categories: `semantics/` (sentence encoders, word embeddings), `grammaticality/` (POS, language models, grammar tools), `overlap/` (edit distance, BLEU), `pre_transformation/` (restrict search space before transforming). |
| 52 | +3. **Transformation** (`textattack/transformations/`) - Generate candidate perturbations. Types: `word_swaps/` (embedding, gradient, homoglyph, WordNet), `word_insertions/`, `word_merges/`, `sentence_transformations/`, `WordDeletion`, `CompositeTransformation`. |
| 53 | +4. **SearchMethod** (`textattack/search_methods/`) - Traverse the perturbation space. Includes: `BeamSearch`, `GreedySearch`, `GreedyWordSwapWIR`, `GeneticAlgorithm`, `ParticleSwarmOptimization`, `DifferentialEvolution`. |
| 54 | + |
| 55 | +The `Attacker` class orchestrates running attacks on datasets with parallel processing, checkpointing, and logging. |
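How the four components listed above fit together can be sketched with a toy, standard-library-only pipeline. Every name below (`toy_model`, `goal_reached`, `transform`, `max_words_changed`, `greedy_search`) is a hypothetical stand-in chosen for illustration, not a TextAttack class or function, and the synonym table is invented for the example.

```python
from typing import List, Optional

# All names in this block are hypothetical stand-ins, not TextAttack APIs.


def toy_model(text: str) -> int:
    """Toy classifier: label 1 if the text contains the word 'good', else 0."""
    return 1 if "good" in text.split() else 0


def goal_reached(original_label: int, candidate: str) -> bool:
    """Goal function (untargeted): succeed when the predicted label changes."""
    return toy_model(candidate) != original_label


def transform(text: str) -> List[str]:
    """Transformation: propose candidates that swap one word for a synonym."""
    synonyms = {"good": ["fine", "decent"], "movie": ["film"]}
    words = text.split()
    candidates = []
    for i, word in enumerate(words):
        for replacement in synonyms.get(word, []):
            candidates.append(" ".join(words[:i] + [replacement] + words[i + 1:]))
    return candidates


def max_words_changed(original: str, candidate: str, limit: int = 1) -> bool:
    """Constraint: reject candidates that change more than `limit` words."""
    changed = sum(a != b for a, b in zip(original.split(), candidate.split()))
    return changed <= limit


def greedy_search(text: str) -> Optional[str]:
    """Search method: return the first valid candidate that meets the goal."""
    original_label = toy_model(text)
    for candidate in transform(text):
        if max_words_changed(text, candidate) and goal_reached(original_label, candidate):
            return candidate
    return None


adversarial = greedy_search("a good movie")
print(adversarial)
```

The search method drives the loop, the transformation supplies candidates, the constraint filters them, and the goal function decides when to stop. Swapping any one of the four functions leaves the other three untouched, which is the property the modular design is after.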
| 56 | + |
| 57 | +### Attack Recipes (`textattack/attack_recipes/`) |
| 58 | + |
| 59 | +Pre-built attack configurations from the literature (e.g., TextFooler, DeepWordBug, BAE, BERT-Attack, CLARE, CheckList). Each recipe subclasses `AttackRecipe` and implements a static `build(model_wrapper)` method that returns a configured `Attack` object. The package also includes multilingual recipes for French, Spanish, and Chinese. |
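The recipe pattern described above is essentially a factory: a subclass bundles one fixed choice of components behind a single `build` call. A minimal sketch, using only the standard library, might look like this. `ToyAttack`, `ToyRecipe`, and `ToyWordSwapRecipe` are hypothetical names, the component labels are plain strings rather than real objects, and declaring `build` with `@staticmethod` is an assumption of this sketch.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class ToyAttack:
    """Hypothetical stand-in for a configured attack (not TextAttack's Attack)."""

    goal_function: str
    constraints: List[str]
    transformation: str
    search_method: str


class ToyRecipe(ABC):
    """Hypothetical base class: every recipe must provide `build`."""

    @staticmethod
    @abstractmethod
    def build(model_wrapper) -> ToyAttack:
        ...


class ToyWordSwapRecipe(ToyRecipe):
    """One fixed combination of the four components."""

    @staticmethod
    def build(model_wrapper) -> ToyAttack:
        return ToyAttack(
            goal_function=f"untargeted-classification({model_wrapper})",
            constraints=["max-words-perturbed", "stopword-modification"],
            transformation="word-swap-embedding",
            search_method="greedy-word-swap-wir",
        )


attack = ToyWordSwapRecipe.build("my-model")
print(attack)
```

Because the recipe only needs the model wrapper as input, callers can switch between published attacks by changing a single class name.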
| 60 | + |
| 61 | +### Key Abstractions |
| 62 | + |
| 63 | +- **`AttackedText`** (`textattack/shared/attacked_text.py`) - Central text representation that keeps both the token list and the original text, including punctuation. Used throughout the pipeline instead of raw strings. |
| 64 | +- **`ModelWrapper`** (`textattack/models/wrappers/`) - Abstract interface for models, with implementations for PyTorch, HuggingFace, TensorFlow, and sklearn. A wrapped model must accept string input and return predictions. |
| 65 | +- **`Dataset`** (`textattack/datasets/`) - Iterable of `(input, output)` pairs. Supports HuggingFace datasets and custom files. |
| 66 | +- **`Augmenter`** (`textattack/augmentation/`) - Uses transformations and constraints for data augmentation (not adversarial attacks). Built-in recipes: wordnet, embedding, charswap, eda, checklist, clare, back_trans. |
| 67 | +- **`PromptAugmentationPipeline`** (`textattack/prompt_augmentation/`) - Augments prompts and generates LLM responses. |
| 68 | +- **LLM Wrappers** (`textattack/llms/`) - Wrappers for using LLMs (HuggingFace, ChatGPT) with prompt augmentation. |
| 69 | + |
| 70 | +### CLI Commands (`textattack/commands/`) |
| 71 | + |
| 72 | +Entry point: `textattack/commands/textattack_cli.py`. Each command (attack, augment, train, eval-model, list, peek-dataset, benchmark-recipe, attack-resume) is a subclass of `TextAttackCommand` with `register_subcommand()` and `run()` methods. |
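The command layout described above (one class per subcommand, each with `register_subcommand()` and `run()`) can be sketched with `argparse` alone. `ToyCommand`, `GreetCommand`, `main`, and the `toycli` program name are hypothetical. Passing the parsed `args` into `run` and storing the command instance via `set_defaults(func=...)` are assumptions of this sketch, not a description of TextAttack's actual entry point.

```python
import argparse
from abc import ABC, abstractmethod
from typing import List


class ToyCommand(ABC):
    """Hypothetical base class mirroring the register/run split."""

    @staticmethod
    @abstractmethod
    def register_subcommand(subparsers) -> None:
        ...

    @abstractmethod
    def run(self, args: argparse.Namespace) -> str:
        ...


class GreetCommand(ToyCommand):
    @staticmethod
    def register_subcommand(subparsers) -> None:
        # Each command declares its own flags and binds itself as the handler.
        parser = subparsers.add_parser("greet", help="print a greeting")
        parser.add_argument("--name", default="world")
        parser.set_defaults(func=GreetCommand())

    def run(self, args: argparse.Namespace) -> str:
        return f"hello, {args.name}"


def main(argv: List[str]) -> str:
    parser = argparse.ArgumentParser(prog="toycli")
    subparsers = parser.add_subparsers(dest="command", required=True)
    GreetCommand.register_subcommand(subparsers)
    args = parser.parse_args(argv)
    # Dispatch to whichever command registered itself for this subcommand.
    return args.func.run(args)


result = main(["greet", "--name", "textattack"])
print(result)
```

Adding a new subcommand in this layout means writing one more class and one more `register_subcommand` call in `main`; the dispatch line does not change.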
| 73 | + |
| 74 | +### Configuration |
| 75 | + |
| 76 | +- Version tracked in `docs/conf.py` (imported by `setup.py`) |
| 77 | +- Cache directory: `~/.cache/textattack/` (override with `TA_CACHE_DIR` env var) |
| 78 | +- Formatting: black (line length 88), isort (skip `__init__.py`), flake8 (ignores: E203, E266, E501, W503, D203) |
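For the cache directory entry above, an environment-variable override of this kind usually resolves as shown below. This is a sketch under that assumption rather than a copy of TextAttack's code. `resolve_cache_dir` is a hypothetical helper, and only the `TA_CACHE_DIR` name and the `~/.cache/textattack/` default come from the list.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_cache_dir(environ: Mapping[str, str] = os.environ) -> Path:
    """Return the override from TA_CACHE_DIR if set, else the default path."""
    override = environ.get("TA_CACHE_DIR")
    if override:
        return Path(override).expanduser()
    return Path("~/.cache/textattack").expanduser()


print(resolve_cache_dir())
```

Taking the environment as a parameter keeps the helper easy to test, for example `resolve_cache_dir({"TA_CACHE_DIR": "/tmp/ta"})`.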
| 79 | + |
| 80 | +### CI Workflows (`.github/workflows/`) |
| 81 | + |
| 82 | +- `check-formatting.yml` - Runs `make lint` on Python 3.9 |
| 83 | +- `run-pytest.yml` - Sets up Python 3.8/3.9 (pytest currently skipped in CI) |
| 84 | +- `publish-to-pypi.yml` - PyPI publishing |
| 85 | +- `make-docs.yml` - Documentation build |
| 86 | +- `codeql-analysis.yml` - Security analysis |
| 87 | + |
| 88 | +### Test Structure |
| 89 | + |
| 90 | +Tests are in `tests/` organized by feature: |
| 91 | +- `test_command_line/` - CLI command integration tests (attack, augment, train, eval, list, loggers) |
| 92 | +- `test_constraints/` - Constraint unit tests |
| 93 | +- `test_augment_api.py`, `test_transformations.py`, `test_attacked_text.py`, `test_tokenizers.py`, `test_word_embedding.py`, `test_metric_api.py`, `test_prompt_augmentation.py` |
| 94 | +- `test_command_line/update_test_outputs.py` - Script to regenerate expected test outputs |
| 95 | + |
| 96 | +### Adding New Components |
| 97 | + |
| 98 | +- **Attack recipe**: Subclass `AttackRecipe` in `textattack/attack_recipes/`, implement `build(model_wrapper)`, add import to `__init__.py`, add doc reference in `docs/attack_recipes.rst`. |
| 99 | +- **Transformation**: Subclass `Transformation` in appropriate subfolder under `textattack/transformations/`. |
| 100 | +- **Constraint**: Subclass `Constraint` or `PreTransformationConstraint` in appropriate subfolder under `textattack/constraints/`. |
| 101 | +- **Search method**: Subclass `SearchMethod` in `textattack/search_methods/`. |