This project demonstrates a basic Named Entity Recognition (NER) algorithm using Python and the spaCy library. The goal is to identify named entities in text and classify them into predefined categories.
```
ner_project/
├── data/
│   └── ner_data.txt
├── models/
│   └── ner_model/
├── preprocess.py
├── train.py
├── recognize.py
└── README.md
```
- data/ner_data.txt: Contains the dataset used for training the NER model.
- models/ner_model: Stores the trained NER model.
- preprocess.py: Contains the code for preprocessing the text data.
- train.py: Script for training the NER model.
- recognize.py: Script for recognizing named entities in new text using the trained model.
- README.md: Project documentation.
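The dataset (ner_data.txt) contains sentences and their corresponding entity labels in IOB format: the B- prefix marks the beginning of an entity, I- marks a word inside an entity, and O marks a word outside any entity. Each line contains a word and its label, separated by a space, and sentences are separated by blank lines.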
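A minimal sketch of reading this format in Python is shown below. The function name `parse_iob` and the sample sentences and labels (ORG, LOC, PER) are hypothetical; the project's actual data and label set are not specified here.

```python
def parse_iob(lines):
    """Group "word label" lines into sentences (hypothetical helper).

    Blank lines separate sentences. Returns a list of (words, labels) tuples.
    """
    sentences = []
    words, labels = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if one is open.
            if words:
                sentences.append((words, labels))
                words, labels = [], []
            continue
        # Split off the label from the right; raises ValueError if it is missing.
        word, label = line.rsplit(maxsplit=1)
        words.append(word)
        labels.append(label)
    if words:  # the file may not end with a blank line
        sentences.append((words, labels))
    return sentences


# Hypothetical sample in the ner_data.txt layout.
sample = """Apple B-ORG
is O
based O
in O
Cupertino B-LOC

Tim B-PER
Cook I-PER
"""
sentences = parse_iob(sample.splitlines())
```

For the real file, `parse_iob` can be given the open file object directly, e.g. `with open("data/ner_data.txt", encoding="utf-8") as f: sentences = parse_iob(f)`.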
The preprocess.py file contains functions to preprocess the text data. It reads the dataset and converts it into a format suitable for training with spaCy.
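The README does not say exactly which format preprocess.py produces. One common choice, assumed in the sketch below, is a `(text, {"entities": [(start, end, label), ...]})` pair per sentence with character offsets, which is what spaCy's `Example.from_dict` accepts. The function name `iob_to_spacy` is hypothetical. Words are joined with single spaces so each entity's offsets fall on word boundaries.

```python
def iob_to_spacy(words, labels):
    """Convert one IOB-tagged sentence to (text, {"entities": [...]}) (hypothetical helper)."""
    entities = []
    offset = 0
    ent_start = ent_end = ent_label = None
    for word, label in zip(words, labels):
        start, end = offset, offset + len(word)
        if label.startswith("B-") or (label.startswith("I-") and ent_label != label[2:]):
            # Close any open entity, then open a new one. An I- tag that does not
            # continue the current entity type is treated as a new beginning.
            if ent_label is not None:
                entities.append((ent_start, ent_end, ent_label))
            ent_start, ent_end, ent_label = start, end, label[2:]
        elif label.startswith("I-"):
            ent_end = end  # extend the current entity over this word
        else:  # "O": close any open entity
            if ent_label is not None:
                entities.append((ent_start, ent_end, ent_label))
            ent_start = ent_end = ent_label = None
        offset = end + 1  # skip the single joining space
    if ent_label is not None:
        entities.append((ent_start, ent_end, ent_label))
    return " ".join(words), {"entities": entities}


example = iob_to_spacy(["Tim", "Cook", "visited", "Cupertino"], ["B-PER", "I-PER", "O", "B-LOC"])
# → ("Tim Cook visited Cupertino", {"entities": [(0, 8, "PER"), (17, 26, "LOC")]})
```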
The train.py script is used to train the NER model. It performs the following steps:
- Load a blank English model.
- Create the NER pipeline component and add it to the pipeline.
- Add labels to the NER component.
- Load the training data.
- Train the model using the training data.
- Save the trained model to models/ner_model.
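The steps above could look roughly like the following sketch. It assumes spaCy 3.x (in spaCy 2.x the component was created with `create_pipe` and training started with `begin_training`) and training data as `(text, {"entities": [(start, end, label), ...]})` pairs. The function name `train_ner` and its parameters (`n_iter`, `drop`, `seed`) are hypothetical choices, not necessarily what train.py uses.

```python
# Sketch assuming spaCy 3.x; train_ner and its parameters are hypothetical.
from pathlib import Path
import random

import spacy
from spacy.training import Example
from spacy.util import fix_random_seed


def train_ner(train_data, output_dir, n_iter=20, drop=0.2, seed=0):
    """Train a blank English NER pipeline and save it to output_dir."""
    fix_random_seed(seed)                  # make shuffling/initialization repeatable
    nlp = spacy.blank("en")                # load a blank English model
    ner = nlp.add_pipe("ner")              # create the NER component and add it
    for _text, annotations in train_data:  # add every label to the NER component
        for _start, _end, label in annotations["entities"]:
            ner.add_label(label)

    # Load the training data as spaCy Example objects.
    examples = [
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in train_data
    ]
    optimizer = nlp.initialize(get_examples=lambda: examples)

    # Train: shuffle each epoch and update on one example at a time.
    for epoch in range(n_iter):
        random.shuffle(examples)
        losses = {}
        for example in examples:
            nlp.update([example], drop=drop, sgd=optimizer, losses=losses)
        print(f"epoch {epoch + 1}: ner loss {losses.get('ner', 0.0):.3f}")

    # Save the trained model (this README uses models/ner_model).
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    nlp.to_disk(output_dir)
    return nlp
```

Updating on one example at a time keeps the loop easy to follow; for larger datasets, batching with `spacy.util.minibatch` is a common alternative.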
To train the model, run:
```bash
python train.py
```

The recognize.py script is used to recognize named entities in new text using the trained model. It performs the following steps:
- Load the trained model.
- Process the input text.
- Print the recognized entities and their labels.
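A minimal sketch of these three steps follows, assuming spaCy 3.x and a pipeline saved with `nlp.to_disk()`. The function name matches the `recognize_entities(text)` call in this project's example usage, but the `model_dir` parameter and the tab-separated print format are assumptions.

```python
# Sketch assuming spaCy 3.x; model_dir and the output format are assumptions.
import spacy


def recognize_entities(text, model_dir="models/ner_model"):
    """Return (entity text, label) pairs found in text, printing each one."""
    nlp = spacy.load(model_dir)  # load the trained model
    doc = nlp(text)              # process the input text
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    for ent_text, label in entities:  # print recognized entities and labels
        print(f"{ent_text}\t{label}")
    return entities
```

Loading the model inside the function keeps the sketch short, but when processing many texts it is cheaper to call `spacy.load` once and reuse the `nlp` object.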
To recognize named entities in new text, run:
```bash
python recognize.py
```

The project requires the following Python libraries:
- spacy
You can install the dependencies using:
```bash
pip install spacy
```

```python
# Example usage of the recognize.py script
# (recognize_entities is defined earlier in recognize.py)
if __name__ == "__main__":
    text = "I love programming in Python. Machine learning is fascinating. Spacy is a useful library."
    recognize_entities(text)
```

This project provides a basic implementation of Named Entity Recognition using the spaCy library. You can extend it with more advanced models or preprocessing techniques, depending on your requirements.