NLP (Natural Language Processing) datasets are essential for training and evaluating language models. Here are some popular ones:

🧠 General-purpose Datasets

  • GLUE Benchmark 📦 - A widely used benchmark of natural language understanding tasks, including sentiment analysis and natural language inference.
  • SQuAD 📖 - Designed for evaluating reading comprehension models.
  • WikiText 📚 - A dataset for language modeling and text understanding tasks.

🧩 Task-specific Datasets

  • CoNLL-2003 🧾 - Focuses on named entity recognition in news text.
  • IMDB 📈 - For binary sentiment classification of movie reviews.
  • SNLI 🧩 - A dataset for natural language inference (NLI) tasks.
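
As a rough illustration of working with one of these formats: CoNLL-2003 is commonly distributed as plain text with one token per line, whitespace-separated columns (token, part-of-speech tag, chunk tag, named-entity tag), blank lines between sentences, and `-DOCSTART-` lines between documents. The sketch below assumes that layout and BIO-style entity tags; the tagging scheme varies between distributions. The sample text and the helper names `parse_conll` and `extract_entities` are made up for this example and are not part of any library.

```python
from typing import List, Optional, Tuple

# Illustrative sample in the assumed column layout:
# token, POS tag, chunk tag, NER tag (not copied from the real dataset files).
SAMPLE = """\
-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O
. . O O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER
"""


def parse_conll(text: str) -> List[List[Tuple[str, str]]]:
    """Split CoNLL-style text into sentences of (token, ner_tag) pairs."""
    sentences: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Blank lines and document markers both end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        # First column is the token, last column is the NER tag.
        current.append((columns[0], columns[-1]))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(sentence: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Group consecutive tagged tokens into (entity_text, entity_type) pairs."""
    entities: List[Tuple[str, str]] = []
    words: List[str] = []
    label: Optional[str] = None
    for token, tag in sentence:
        if tag == "O":
            prefix, kind = "O", None
        else:
            prefix, _, kind = tag.partition("-")
        # A new span starts on a B- tag or when the entity type changes.
        starts_new = prefix == "B" or (prefix == "I" and kind != label)
        if words and (prefix == "O" or starts_new):
            entities.append((" ".join(words), label))
            words, label = [], None
        if prefix != "O":
            words.append(token)
            label = kind
    if words:
        entities.append((" ".join(words), label))
    return entities


if __name__ == "__main__":
    for sentence in parse_conll(SAMPLE):
        print(extract_entities(sentence))
```

Keeping only the first and last columns is a deliberate simplification for named entity recognition; the part-of-speech and chunk columns can be retained the same way if a task needs them.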

For more details on dataset formats and usage, visit our documentation page. 📁