This section walks through examples of using the scikit-learn library in Python for common machine learning tasks. Scikit-learn provides simple and efficient tools for predictive data analysis.
Common Tasks
- Classification: Categorizing data into predefined classes.
- Regression: Predicting a continuous value.
- Clustering: Grouping data into clusters.
- Dimensionality Reduction: Reducing the number of variables under consideration.
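As a rough illustration of the tasks listed above, the sketch below runs one small example each of regression, clustering, and dimensionality reduction (classification is worked through in the Iris example). The choice of estimators (LinearRegression, KMeans, PCA), the bundled diabetes dataset, and the parameter values are assumptions made for illustration, not recommendations from this tutorial.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_diabetes, load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

# Regression: fit a linear model that predicts a continuous target.
X_reg, y_reg = load_diabetes(return_X_y=True)
reg = LinearRegression().fit(X_reg, y_reg)
print(f"Regression R^2 on the training data: {reg.score(X_reg, y_reg):.2f}")

# Clustering: group the iris measurements without looking at the labels.
X_iris, _ = load_iris(return_X_y=True)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_iris)
print("Number of clusters found:", len(set(kmeans.labels_.tolist())))

# Dimensionality reduction: project the 4 iris features onto 2 components.
X_2d = PCA(n_components=2).fit_transform(X_iris)
print("Shape after PCA:", X_2d.shape)
```

All three estimators follow the same pattern of constructing an object and calling fit, which is why switching between tasks in scikit-learn usually changes only a line or two.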
Example: Iris Dataset Classification
The Iris dataset is a classic benchmark for classification. It contains four measurements (sepal length, sepal width, petal length, and petal width) for each of 150 iris flowers, 50 from each of three species.
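Before training anything, it can help to look at what the dataset actually holds. The following is a minimal sketch that prints the shape, feature names, species names, and class counts using the attributes returned by load_iris.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Rows are flowers, columns are measurements.
print("Samples, features:", iris.data.shape)
print("Feature names:", iris.feature_names)

# target holds integer class ids; target_names maps them to species.
print("Species:", iris.target_names.tolist())
print("Samples per species:", np.bincount(iris.target).tolist())
```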
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# Load the dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create a Random Forest classifier (fixed seed so the results are reproducible)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
# Train the classifier
clf.fit(X_train, y_train)
# Make predictions
y_pred = clf.predict(X_test)
# Calculate the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
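A single 80/20 split leaves only 30 test flowers, so the reported accuracy can shift noticeably with a different split. One common way to get a steadier estimate is k-fold cross-validation. The sketch below assumes 5 folds and a fixed random_state; both are illustrative choices rather than part of the example above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=100, random_state=42)

# Train and evaluate on 5 different train/test partitions of the data.
# For classifiers, an integer cv uses stratified folds by default.
scores = cross_val_score(clf, X, y, cv=5)

print("Fold accuracies:", scores.round(2).tolist())
print(f"Mean accuracy: {scores.mean():.2f} (std {scores.std():.2f})")
```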
Further Reading
For more detailed tutorials and examples, check out our Python Machine Learning Tutorials.
Visualizing the Results
To visualize the results, we can use the matplotlib library to plot the confusion matrix.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix
# Calculate the confusion matrix
cm = confusion_matrix(y_test, y_pred)
# Plot the confusion matrix
plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
plt.title('Confusion Matrix')
plt.colorbar()
tick_marks = np.arange(len(iris.target_names))
plt.xticks(tick_marks, iris.target_names, rotation=45)
plt.yticks(tick_marks, iris.target_names)
plt.ylabel('True label')
plt.xlabel('Predicted label')
# Adjust the layout after the labels are set so they are not clipped
plt.tight_layout()
plt.show()
Confusion Matrix