A unified, extensible framework for text classification built on [PyTorch](https://pytorch.org/) and [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/).

## 🚀 Features

- **Unified API**: Consistent interface for different classifier wrappers
classifier.build(X_train, y_train)
```

## 🔧 Advanced Usage

### Custom Configuration

```python
from torchTextClassifiers import torchTextClassifiers
from torchTextClassifiers.classifiers.fasttext.config import FastTextConfig
from torchTextClassifiers.classifiers.fasttext.wrapper import FastTextWrapper

# Create a custom configuration
config = FastTextConfig(
    embedding_dim=200,
    sparse=True,
    num_tokens=20000,
    min_count=3,
    min_n=2,
    max_n=8,
    len_word_ngrams=3,
    num_classes=5,
    direct_bagging=False,  # Custom FastText parameter
)

# Create a classifier with the custom config
wrapper = FastTextWrapper(config)
classifier = torchTextClassifiers(wrapper)
```
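As a quick illustration of what `len_word_ngrams` controls: word n-grams are contiguous runs of tokens. A minimal sketch in plain Python (not the library's actual tokenizer):

```python
def word_ngrams(text: str, n: int) -> list[str]:
    """Return all contiguous word n-grams of length n."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

print(word_ngrams("the cat sat down", 2))
# ['the cat', 'cat sat', 'sat down']
```

With `len_word_ngrams=3`, a FastText-style model would consider runs of up to three words as additional features.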

### Using Pre-trained Tokenizers

```python
from torchTextClassifiers import build_fasttext_from_tokenizer

# Assume you have a pre-trained tokenizer
# my_tokenizer = ... (previously trained NGramTokenizer)

classifier = build_fasttext_from_tokenizer(
    tokenizer=my_tokenizer,
    embedding_dim=100,
    num_classes=3,
    sparse=False,
)

# Model and tokenizer are already built, ready for training
classifier.train(X_train, y_train, X_val, y_val, ...)
```

### Training Customization

```python
classifier.train(
    X_train, y_train, X_val, y_val,
    # ... additional training options ...
)
```

## 📊 API Reference

### Main Classes

#### `torchTextClassifiers`

The main classifier class providing a unified interface.

**Key Methods:**

- `build(X_train, y_train)`: Build text preprocessing and model
- `train(X_train, y_train, X_val, y_val, ...)`: Train the model
- `predict(X)`: Make predictions
- `validate(X, Y)`: Evaluate on test data
- `to_json(filepath)`: Save configuration
- `from_json(filepath)`: Load configuration

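Note that `to_json`/`from_json` persist the configuration rather than trained weights. A self-contained sketch of that round-trip pattern, using a toy config class rather than the library's actual implementation:

```python
import json
import os
import tempfile
from dataclasses import dataclass, asdict

@dataclass
class ToyConfig:
    """Stand-in for a classifier config; fields are illustrative."""
    embedding_dim: int = 100
    num_classes: int = 3

def to_json(cfg: ToyConfig, filepath: str) -> None:
    """Serialize the config fields to a JSON file."""
    with open(filepath, "w") as f:
        json.dump(asdict(cfg), f)

def from_json(filepath: str) -> ToyConfig:
    """Rebuild a config object from a JSON file."""
    with open(filepath) as f:
        return ToyConfig(**json.load(f))

path = os.path.join(tempfile.mkdtemp(), "config.json")
to_json(ToyConfig(embedding_dim=200, num_classes=5), path)
print(from_json(path))
# ToyConfig(embedding_dim=200, num_classes=5)
```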
#### `BaseClassifierWrapper`

Base class for all classifier wrappers. Each classifier implementation extends this class.

#### `FastTextWrapper`

Wrapper for the FastText classifier implementation with tokenization-based preprocessing.

### FastText Specific

#### `create_fasttext(**kwargs)`

Convenience function to create FastText classifiers.

**Parameters:**

- `embedding_dim`: Embedding dimension
- `sparse`: Use sparse embeddings
- `num_tokens`: Vocabulary size
- `min_count`: Minimum token frequency
- `min_n`, `max_n`: Character n-gram range
- `len_word_ngrams`: Word n-gram length
- `num_classes`: Number of output classes

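To build intuition for `min_n`/`max_n`: FastText-style models decompose each word into character n-grams within that range, conventionally padding the word with `<` and `>` boundary markers. A self-contained sketch (not the library's actual tokenizer):

```python
def char_ngrams(word: str, min_n: int = 2, max_n: int = 4) -> list[str]:
    """Return all character n-grams of length min_n..max_n,
    using FastText's angle-bracket word-boundary convention."""
    token = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(token) - n + 1):
            grams.append(token[i:i + n])
    return grams

print(char_ngrams("cat", min_n=2, max_n=3))
# ['<c', 'ca', 'at', 't>', '<ca', 'cat', 'at>']
```

Because embeddings are looked up per n-gram, rare or unseen words still share subword features with known words.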

#### `build_fasttext_from_tokenizer(tokenizer, **kwargs)`

Create a FastText classifier from an existing tokenizer.
## 🏗️ Architecture

The framework follows a wrapper-based architecture:

```
torchTextClassifiers/
├── torchTextClassifiers.py          # Main classifier interface
├── classifiers/
│   ├── base.py                      # Abstract base wrapper classes
│   ├── fasttext/                    # FastText implementation
│   │   ├── config.py                # Configuration
│   │   ├── wrapper.py               # FastText wrapper (tokenization)
│   │   ├── factory.py               # Convenience methods
│   │   ├── tokenizer.py             # N-gram tokenizer
│   │   ├── pytorch_model.py         # PyTorch model
│   │   ├── lightning_module.py      # Lightning module
│   │   └── dataset.py               # Dataset implementation
│   └── simple_text_classifier.py    # Example TF-IDF wrapper
├── utilities/
│   └── checkers.py                  # Input validation utilities
└── factories.py                     # Convenience factory functions
```

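The wrapper contract behind this layout can be sketched as a small abstract base class; the names below are illustrative stand-ins, not the library's exact API:

```python
from abc import ABC, abstractmethod

class BaseWrapper(ABC):
    """Toy version of the wrapper contract: each backend supplies
    its own preprocessing/model building and prediction."""

    @abstractmethod
    def build(self, X: list[str], y: list[int]) -> None: ...

    @abstractmethod
    def predict(self, X: list[str]) -> list[int]: ...

class KeywordWrapper(BaseWrapper):
    """Trivial backend: predicts class 1 if a trigger word appears."""

    def build(self, X: list[str], y: list[int]) -> None:
        # Use the first word of the first positive example as the trigger.
        positives = [x for x, label in zip(X, y) if label == 1]
        self.trigger = positives[0].split()[0]

    def predict(self, X: list[str]) -> list[int]:
        return [1 if self.trigger in text.split() else 0 for text in X]

wrapper = KeywordWrapper()
wrapper.build(["great movie", "dull plot"], [1, 0])
print(wrapper.predict(["great acting", "dull pacing"]))
# [1, 0]
```

The unified interface only ever calls the abstract methods, so a new backend plugs in by subclassing and implementing them.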
## 🔬 Testing

Run the test suite:

```bash
uv run pytest --cov=torchTextClassifiers
uv run pytest tests/test_torchTextClassifiers.py -v
```

## 🤝 Contributing

We welcome contributions! See our [Developer Guide](docs/developer_guide.md) for information on:

- Adding new classifier types
- Code organization and patterns
- Testing requirements
- Documentation standards

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- Built with [PyTorch](https://pytorch.org/) and [PyTorch Lightning](https://lightning.ai/)
- Inspired by [FastText](https://fasttext.cc/) for efficient text classification
- Uses [uv](https://github.com/astral-sh/uv) for dependency management

## 📚 Examples

See the [examples/](examples/) directory for:

- Custom classifier implementation
- Advanced training configurations

## 🐛 Support

If you encounter any issues:

1. Check the [examples](examples/) for similar use cases
2. Review the API documentation above
3. Open an issue on GitHub with:
   - Python version
   - Package versions (`uv tree` or `pip list`)
   - Minimal reproduction code
   - Error messages/stack traces