NER Tutorial: Named Entity Recognition in NLP

Named Entity Recognition (NER) is a subtask of information extraction that identifies named entities in unstructured text, such as the names of people, organizations, and locations, as well as time and numeric expressions.

Key Concepts

Named Entities: Real-world objects or expressions mentioned in text that can be assigned to predefined classes, such as people, organizations, locations, medical codes, time expressions, quantities, monetary values, and percentages.
NER Systems: These systems can be rule-based, machine learning-based, or a hybrid of both. The hybrid approach is often preferred because it combines the precision of hand-written rules with the flexibility of learned models.
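To make the rule-based option concrete, here is a minimal sketch that combines a dictionary lookup (a gazetteer) with one regular-expression rule. The gazetteer entries, the money pattern, and the function name rule_based_ner are all hypothetical, chosen for illustration only; a real system would use far larger resources and handle overlapping matches.

```python
import re

# Hypothetical gazetteer: a tiny lookup table of known entity names.
GAZETTEER = {
    "Apple Inc.": "ORG",
    "Cupertino": "GPE",
    "California": "GPE",
}

# Hypothetical pattern rule: dollar amounts such as "$3.5 million".
MONEY_PATTERN = re.compile(r"\$\d+(?:\.\d+)?(?: (?:million|billion))?")


def rule_based_ner(text):
    """Return (start, end, surface_text, label) tuples sorted by start offset."""
    entities = []
    # Rule 1: exact string matches against the gazetteer.
    for name, label in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            entities.append((match.start(), match.end(), match.group(), label))
    # Rule 2: anything matching the money pattern is labeled MONEY.
    for match in MONEY_PATTERN.finditer(text):
        entities.append((match.start(), match.end(), match.group(), "MONEY"))
    return sorted(entities)


text = "Apple Inc. of Cupertino, California reported $3.5 million in sales."
for start, end, surface, label in rule_based_ner(text):
    print(f"{surface} ({label})")
```

Rules like these are precise on the cases they cover but miss anything outside the dictionary, which is one reason they are often paired with a learned model.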

How NER Works

Preprocessing: Text is tokenized and, depending on the system, normalized or lemmatized.
Feature Extraction: Features like word shape, part of speech, and contextual information are extracted.
Modeling: In a machine learning system, the features are used to train a classifier; in a rule-based system, hand-written patterns and dictionaries take the place of a trained model.
Tagging: The model is used to predict the labels for each token in the text.
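The preprocessing and feature extraction steps above can be sketched with the standard library alone. The tokenizer regex, the feature names, and the helper functions below are illustrative assumptions, not the method of any particular NER library. Part-of-speech features are omitted because they would require a separate tagger.

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation marks (a simplification)."""
    return re.findall(r"\w+|[^\w\s]", text)


def word_shape(token):
    """Map characters to X/x/d and collapse repeated runs, e.g. 'Apple' -> 'Xx'."""
    shape = re.sub(r"[A-Z]", "X", token)
    shape = re.sub(r"[a-z]", "x", shape)
    shape = re.sub(r"[0-9]", "d", shape)
    return re.sub(r"(.)\1+", r"\1", shape)


def token_features(tokens, i):
    """Build a feature dictionary for the token at position i."""
    token = tokens[i]
    return {
        "lower": token.lower(),
        "shape": word_shape(token),
        "is_title": token.istitle(),
        # Contextual information: the neighboring tokens, with boundary markers.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


tokens = tokenize("Apple Inc. is in Cupertino")
for i in range(len(tokens)):
    print(tokens[i], token_features(tokens, i))
```

Feature dictionaries of this kind are a typical input to a sequence classifier, which then predicts one label per token in the tagging step.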

Examples

Input: "Apple Inc. is an American multinational technology company headquartered in Cupertino, California."
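Output: "Apple Inc. (ORG), American (NORP), Cupertino (GPE), California (GPE)"

The labels follow the OntoNotes-style scheme used by many pretrained models, in which NORP covers nationalities and religious or political groups and GPE covers countries, cities, and states. Common nouns such as "multinational", "technology", and "company" are not named entities and receive no label.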
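Because tagging assigns a label to every token, entity spans are commonly encoded per token with the BIO scheme: B- marks the first token of an entity, I- a continuation, and O a token outside any entity. The sketch below applies this to the example sentence. The tokenization, the token-index spans, and the to_bio helper are assumptions made for illustration, and the labels assume an OntoNotes-style scheme in which a nationality such as "American" is tagged NORP.

```python
def to_bio(tokens, entities):
    """Convert (start_token, end_token_exclusive, label) spans to per-token BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Hand-tokenized version of the example sentence (an assumed tokenization).
tokens = [
    "Apple", "Inc.", "is", "an", "American", "multinational", "technology",
    "company", "headquartered", "in", "Cupertino", ",", "California", ".",
]

# Entity spans given as token indices, with the end index exclusive.
entities = [(0, 2, "ORG"), (4, 5, "NORP"), (10, 11, "GPE"), (12, 13, "GPE")]

for token, tag in zip(tokens, to_bio(tokens, entities)):
    print(f"{token}\t{tag}")
```

In this encoding "Apple" is tagged B-ORG and "Inc." is tagged I-ORG, so the two tokens are recovered as a single organization entity.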

Resources

For more information on NER and related topics, check out our NER Basics course.

Useful Tools

spaCy: An open-source natural language processing library for Python that ships pretrained pipelines with an NER component.
Stanford NER: A Java-based NER tool developed by Stanford University, built on conditional random field (CRF) sequence models.