Named Entity Recognition (NER) is a subtask of information extraction that locates named entities mentioned in unstructured text and classifies them into categories such as names of people, organizations, locations, and time expressions.

Key Concepts

  • Named Entities: Entities that have a special significance and can be categorized into predefined classes, such as people, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.
  • NER Systems: These systems can be rule-based, machine learning-based, or a combination of both. Hybrid systems are often preferred: hand-written rules give precise matches on known patterns, while a learned model generalizes to names the rules do not cover.
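
To make the rule-based approach concrete, here is a minimal sketch that combines a lookup table of known names (a gazetteer) with one regular-expression rule. The gazetteer entries, the money pattern, and the function name rule_based_ner are all made up for illustration; a real system would use far larger lists and many more patterns.

```python
import re

# Hypothetical gazetteer: a tiny lookup table of known entity names.
GAZETTEER = {
    "Apple Inc.": "ORG",
    "Cupertino": "GPE",
    "California": "GPE",
}

# Hypothetical pattern rule: dollar amounts such as "$3.5 million".
MONEY_PATTERN = re.compile(r"\$\d+(?:\.\d+)?(?: (?:million|billion))?")


def rule_based_ner(text):
    """Return (start, end, surface_text, label) tuples sorted by position."""
    matches = []
    # Rule 1: exact string match against the gazetteer.
    for name, label in GAZETTEER.items():
        for m in re.finditer(re.escape(name), text):
            matches.append((m.start(), m.end(), m.group(), label))
    # Rule 2: regular expression for monetary values.
    for m in MONEY_PATTERN.finditer(text):
        matches.append((m.start(), m.end(), m.group(), "MONEY"))
    return sorted(matches)


text = "Apple Inc. of Cupertino, California reported $3.5 million in sales."
for start, end, surface, label in rule_based_ner(text):
    print(f"{surface!r} -> {label}")
```

A sketch like this only finds names it already knows or patterns it was given, which is the gap a machine learning model is meant to fill.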

How NER Works

  1. Preprocessing: Text is tokenized, normalized, and lemmatized.
  2. Feature Extraction: Features like word shape, part of speech, and contextual information are extracted.
  3. Modeling: In a machine learning system, the features are used to train a classifier. A rule-based system instead applies hand-written patterns and needs no training.
  4. Tagging: The model is used to predict the labels for each token in the text.
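
The feature extraction step above can be sketched in a few lines. This is an illustrative example, not a production pipeline: it splits on whitespace instead of using a real tokenizer, it leaves out part-of-speech features because those require a tagger, and the helper names word_shape and token_features are chosen here for illustration.

```python
import re


def word_shape(token):
    """Map uppercase letters to X, lowercase to x, digits to d; keep the rest."""
    shape = re.sub(r"[A-Z]", "X", token)
    shape = re.sub(r"[a-z]", "x", shape)
    return re.sub(r"[0-9]", "d", shape)


def token_features(tokens, i):
    """Build a feature dictionary for the token at position i."""
    token = tokens[i]
    return {
        "lower": token.lower(),
        "shape": word_shape(token),
        "is_title": token.istitle(),
        # Contextual information: the neighbouring tokens, with
        # placeholder markers at the sentence boundaries.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


# Simplified tokenization: whitespace split only.
tokens = "Apple Inc. is headquartered in Cupertino".split()
features = [token_features(tokens, i) for i in range(len(tokens))]
print(features[5])
```

Feature dictionaries of this kind are what a classical sequence classifier would consume, one per token, to predict a label for each position.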

Examples

  • Input: "Apple Inc. is an American multinational technology company headquartered in Cupertino, California."
  • Output: "Apple Inc. (ORG), American (NORP), Cupertino (GPE), California (GPE)"
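
Because a tagger predicts one label per token, multi-token names such as "Apple Inc." have to be reassembled afterwards. A common convention is the BIO scheme, where B- marks the first token of an entity, I- a continuation, and O a token outside any entity. The sketch below assumes that scheme; the tags are hand-written for the example sentence using OntoNotes-style labels (ORG, NORP, GPE) rather than produced by a model, and the function name bio_to_entities is chosen here for illustration.

```python
def bio_to_entities(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, label) pairs."""
    entities = []
    current_tokens, current_label = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity that is still open.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O" or a stray I- tag: close the open entity, skip the token.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_label))
    return entities


tokens = ["Apple", "Inc.", "is", "an", "American", "multinational",
          "technology", "company", "headquartered", "in", "Cupertino",
          ",", "California", "."]
# Hand-written tags for illustration, not model output.
tags = ["B-ORG", "I-ORG", "O", "O", "B-NORP", "O",
        "O", "O", "O", "O", "B-GPE",
        "O", "B-GPE", "O"]

for text, label in bio_to_entities(tokens, tags):
    print(f"{text} ({label})")
```

Ordinary words like "multinational", "technology", and "company" carry the O tag, so they never appear in the entity list.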

Resources

For more information on NER and related topics, check out our NER Basics course.

Useful Tools

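  • spaCy: An open-source natural language processing library whose pretrained pipelines include an NER component.
  • Stanford NER: A Java-based NER tool developed by Stanford University that uses a machine-learned conditional random field (CRF) sequence model.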

Further Reading

NER Example