Welcome to the documentation for the Natural Language Toolkit (NLTK), a leading platform for building Python programs to work with human language data.

Overview

NLTK is a powerful library for processing textual data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.

Features

  • Corpora and Lexical Resources: Access to a wide range of linguistic datasets and lexical resources.
  • Text Processing Libraries: Functions for tokenization, stemming, tagging, parsing, and semantic reasoning.
  • Interactive Interface: NLTK can be explored interactively from a Python session, and the nltk.app package bundles graphical demos such as a concordance viewer.
  • Extensibility: NLTK is designed to be easy to extend with your own data and algorithms.
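
As a small illustration of the text processing libraries listed above, the sketch below stems a few words with NLTK's PorterStemmer. The word list is an arbitrary example chosen here, not taken from the NLTK documentation; the stemmer is rule-based, so this should run without downloading any corpus data.

```python
from nltk.stem import PorterStemmer

# PorterStemmer applies suffix-stripping rules and needs no downloaded data.
stemmer = PorterStemmer()

# Example words chosen for illustration only.
words = ["running", "runs", "tagged"]
stems = [stemmer.stem(word) for word in words]

print(stems)
```

Stemming reduces inflected forms to a shared stem, which is often useful before counting or classifying words.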

Getting Started

If you are new to NLTK, we recommend starting with the quickstart guide. This guide will help you get up and running with NLTK in just a few minutes.

Examples

Here are a few examples of what you can do with NLTK:

  • Tokenization: Splitting text into words, sentences, or other tokens.
  • Part-of-Speech Tagging: Identifying parts of speech for each word in a sentence.
  • Named Entity Recognition: Identifying named entities in text, such as people, places, and organizations.

Tokenization

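from nltk.tokenize import word_tokenize

# word_tokenize relies on the Punkt models, fetched once with nltk.download();
# the resource name varies by NLTK version ('punkt' or 'punkt_tab').
text = "NLTK is a leading platform for building Python programs to work with human language data."
tokens = word_tokenize(text)
print(tokens)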
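
Tokenizers can also produce "other tokens" by rule rather than by a trained model. The following is a minimal sketch, assuming only that NLTK is installed: it contrasts wordpunct_tokenize, which separates punctuation into its own tokens, with a RegexpTokenizer whose pattern (chosen here for illustration) keeps only runs of word characters. Neither should need downloaded data.

```python
from nltk.tokenize import RegexpTokenizer, wordpunct_tokenize

text = "NLTK is a leading platform for building Python programs to work with human language data."

# Splits into alphanumeric runs and punctuation runs, so the final "." is its own token.
punct_tokens = wordpunct_tokenize(text)

# Illustrative pattern: keep only runs of word characters, dropping punctuation.
word_only = RegexpTokenizer(r"\w+")
word_tokens = word_only.tokenize(text)

print(punct_tokens)
print(word_tokens)
```

The two results differ only in the trailing period, which shows how the choice of tokenizer decides what counts as a token.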

Part-of-Speech Tagging

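from nltk import pos_tag
from nltk.tokenize import word_tokenize

# pos_tag needs a tagger model in addition to the tokenizer data; the resource
# name varies by NLTK version (e.g. 'averaged_perceptron_tagger').
text = "NLTK is a leading platform for building Python programs to work with human language data."
tokens = word_tokenize(text)
tags = pos_tag(tokens)  # list of (word, tag) pairs
print(tags)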

Named Entity Recognition

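from nltk import ne_chunk, pos_tag
from nltk.tokenize import word_tokenize

# ne_chunk needs the NE chunker model and the 'words' corpus on top of the
# tokenizer and tagger data; resource names vary by NLTK version.
text = "NLTK is a leading platform for building Python programs to work with human language data."
tokens = word_tokenize(text)
tags = pos_tag(tokens)
entities = ne_chunk(tags)  # an nltk.Tree whose subtrees are the named entities
print(entities)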
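
The result of ne_chunk is a tree in which each named entity is a labeled subtree over (word, tag) pairs. The sketch below shows one way to pull the entities out of such a tree. To stay runnable without any model downloads it uses a hand-built tree as a stand-in for real ne_chunk output; the sentence, the tree contents, and the helper name extract_entities are illustrative assumptions, not part of NLTK.

```python
from nltk.tree import Tree

# Hand-built stand-in for the kind of tree ne_chunk returns (illustrative data).
chunked = Tree("S", [
    Tree("PERSON", [("Ada", "NNP"), ("Lovelace", "NNP")]),
    ("lived", "VBD"),
    ("in", "IN"),
    Tree("GPE", [("London", "NNP")]),
    (".", "."),
])


def extract_entities(tree):
    """Return (label, text) pairs for each entity subtree directly under the root."""
    found = []
    for child in tree:
        # Plain (word, tag) tuples are ordinary tokens; Tree children are entities.
        if isinstance(child, Tree):
            name = " ".join(word for word, _tag in child.leaves())
            found.append((child.label(), name))
    return found


print(extract_entities(chunked))
```

The same helper can be applied to the entities variable from the example above once the required models are installed.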

Resources

For more information, please visit the following resources:

Community

NLTK has a vibrant community of users and developers. You can get help and share your experiences on the NLTK mailing list and the NLTK GitHub repository.
