NLP (Natural Language Processing) datasets are essential for training and evaluating language models. Here are some popular ones:
🧠 General-purpose Datasets
- GLUE_Benchmark 📦 - A widely used benchmark of natural language understanding tasks, including sentiment analysis and natural language inference.
- SQuAD 📖 - The Stanford Question Answering Dataset, designed for evaluating reading comprehension models on questions about Wikipedia passages.
- WikiText 📚 - A collection of Wikipedia articles for language modeling and text understanding tasks.
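To give a feel for what a reading comprehension record looks like, here is a minimal sketch that walks a SQuAD-style JSON structure. It assumes the nested `data` → `paragraphs` → `qas` → `answers` layout of the SQuAD v1.1 files; the sample record, its id, and the helper names `flatten` and `answer_spans_match` are made up for illustration and are not taken from the dataset.

```python
import json

# Hypothetical miniature record that follows the SQuAD v1.1 nesting.
SAMPLE = json.loads("""
{
  "version": "1.1",
  "data": [
    {
      "title": "Example_Article",
      "paragraphs": [
        {
          "context": "The Eiffel Tower is located in Paris.",
          "qas": [
            {
              "id": "example-0001",
              "question": "Where is the Eiffel Tower located?",
              "answers": [{"text": "Paris", "answer_start": 31}]
            }
          ]
        }
      ]
    }
  ]
}
""")


def flatten(squad):
    """Yield one (id, question, context, answers) tuple per question."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                yield qa["id"], qa["question"], context, qa["answers"]


def answer_spans_match(context, answers):
    """Check that each answer_start offset really points at the answer text."""
    return all(
        context[a["answer_start"]:a["answer_start"] + len(a["text"])] == a["text"]
        for a in answers
    )


examples = list(flatten(SAMPLE))
for qid, question, context, answers in examples:
    print(qid, question, answer_spans_match(context, answers))
```

Checking the character offsets like this is a cheap sanity test to run after any preprocessing step that might alter the context text.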
🧩 Task-specific Datasets
- CoNLL-2003 🧾 - Focuses on named entity recognition in news text.
- IMDB 📈 - Movie reviews labeled for binary (positive/negative) sentiment classification.
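- SNLI 🧩 - The Stanford Natural Language Inference corpus, with sentence pairs labeled for natural language inference (NLI) as entailment, contradiction, or neutral.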
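Task-specific datasets often ship in simple column formats. As an illustration, the sketch below reads CoNLL-2003-style text, assuming the common layout of one token per line with four space-separated columns (word, POS tag, chunk tag, NER tag), blank lines between sentences, and `-DOCSTART-` lines marking document boundaries. The sample sentences and the helper name `read_sentences` are invented for this example.

```python
# Made-up lines in the four-column layout: word, POS tag, chunk tag, NER tag.
SAMPLE = """\
-DOCSTART- -X- -X- O

Alice NNP B-NP B-PER
visited VBD B-VP O
Paris NNP B-NP B-LOC
. . O O

Bob NNP B-NP B-PER
slept VBD B-VP O
"""


def read_sentences(text):
    """Return a list of sentences, each a list of (word, ner_tag) pairs."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        # Blank lines and document markers both end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        # Keep the first column (word) and the last column (NER tag).
        current.append((columns[0], columns[-1]))
    if current:
        sentences.append(current)
    return sentences


sentences = read_sentences(SAMPLE)
for sentence in sentences:
    print(sentence)
```

Keeping only the first and last columns is a design choice for NER; the POS and chunk columns can be kept the same way if a model needs them.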
For more details on dataset formats and usage, visit our documentation page. 📁