Outlier Detection in Python 📊

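Outlier detection is a critical step in data preprocessing. Here are common methods and how to implement each in Python. The snippets build on one another: later sections reuse the imports and the data array defined in the first one.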

1. Z-Score Method 📈

import numpy as np
from scipy import stats

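# Generate sample data: 1,000 draws from a standard normal distribution
data = np.random.normal(0, 1, 1000)
# Flag points more than 3 standard deviations from the mean
outliers = np.abs(stats.zscore(data)) > 3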
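
The threshold of 3 flags points that lie more than three standard deviations from the mean. As a minimal sketch, assuming a fixed seed and two hand-planted extreme values (both are illustrative choices, not part of the example above), the boolean mask can be used to pull out or drop the flagged values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility
sample = np.append(rng.normal(0, 1, 1000), [8.0, -9.0])  # two planted extremes

z = stats.zscore(sample)   # (x - mean) / std, using the population std (ddof=0)
mask = np.abs(z) > 3       # True where |z| exceeds the threshold
flagged = sample[mask]     # the outlying values themselves
cleaned = sample[~mask]    # the data with flagged points removed

print(f"{mask.sum()} of {sample.size} points flagged")
```

Because the mean and standard deviation are themselves pulled by extreme values, this method tends to work best on roughly normal data with only a few outliers.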

2. IQR Method ✅

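# First and third quartiles of the sample
Q1 = np.percentile(data, 25)
Q3 = np.percentile(data, 75)
IQR = Q3 - Q1
# Flag points that fall more than 1.5 * IQR outside the quartiles
outliers = (data < (Q1 - 1.5 * IQR)) | (data > (Q3 + 1.5 * IQR))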
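
To make the fences concrete, here is a small sketch on a hand-made array. The helper name iqr_outliers and its k parameter are hypothetical, introduced only for this illustration:

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return a boolean mask of points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

sample = np.array([10, 12, 11, 13, 12, 11, 10, 95])  # 95 is a planted outlier
mask = iqr_outliers(sample)
print(sample[mask])  # prints [95]
```

Here Q1 is 10.75 and Q3 is 12.25, so the fences sit at 8.5 and 14.5 and only 95 falls outside them.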

3. DBSCAN Clustering 🌐

from sklearn.cluster import DBSCAN

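# eps is the neighborhood radius in data units; min_samples is the density threshold
dbscan = DBSCAN(eps=0.5, min_samples=5)
clusters = dbscan.fit_predict(data.reshape(-1, 1))
# DBSCAN labels points that belong to no cluster as -1 (noise)
outliers = clusters == -1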
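
DBSCAN treats any point without enough neighbors within eps as noise and gives it the label -1. The following sketch, assuming a fixed seed and two planted far-away values (illustrative choices), shows isolated points ending up in the noise group:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility
sample = np.append(rng.normal(0, 1, 1000), [10.0, -12.0])  # two isolated points

# scikit-learn expects a 2-D array of shape (n_samples, n_features)
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(sample.reshape(-1, 1))
noise = labels == -1

n_clusters = len(set(labels) - {-1})
print(f"clusters found: {n_clusters}, noise points: {noise.sum()}")
```

Since eps is expressed in the units of the data, features on different scales usually need to be standardized before clustering.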

4. Isolation Forest 🌲

from sklearn.ensemble import IsolationForest

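# contamination=0.1 assumes about 10% of the points are outliers; tune it to your data
iso_forest = IsolationForest(contamination=0.1, random_state=42)
# fit_predict returns -1 for outliers and 1 for inliers
outliers = iso_forest.fit_predict(data.reshape(-1, 1)) == -1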
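
The contamination parameter sets the share of training points the model will flag, so it is worth checking what it does on clean data. A minimal sketch, assuming a fixed seed and random_state (both added here for reproducibility):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
sample = rng.normal(0, 1, 1000).reshape(-1, 1)  # clean data, no planted outliers

model = IsolationForest(contamination=0.1, random_state=0)
pred = model.fit_predict(sample)        # 1 = inlier, -1 = outlier
flagged_fraction = np.mean(pred == -1)  # share of points labeled as outliers

print(f"fraction flagged: {flagged_fraction:.3f}")
```

Roughly a tenth of the points are flagged even though the sample contains no real anomalies, which is why contamination should reflect an actual estimate of the outlier rate rather than a default.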

5. PCA-Based Detection 🔄

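from sklearn.decomposition import PCA

# PCA needs 2-D multivariate input, so this example uses its own sample matrix
X = np.random.normal(0, 1, (1000, 5))
# Keep enough components to explain 95% of the variance
pca = PCA(n_components=0.95)
transformed = pca.fit_transform(X)
# Scale each component to unit variance, then flag rows with any score beyond 3
scores = transformed / np.sqrt(pca.explained_variance_)
outliers = (np.abs(scores) > 3).any(axis=1)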

6. Visualization Tools 🖼️

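Use matplotlib scatter plots to see where flagged points fall relative to the rest of the data
Apply seaborn boxplots to compare distributions and spot points beyond the whiskers
Combine with plotly when you need interactive dashboards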
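
As a minimal matplotlib sketch of the first two ideas, the code below plots inliers and IQR-flagged outliers in different colors next to a boxplot of the same data. The seed, the planted values, and the output filename outliers.png are assumptions made for this illustration:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
sample = np.append(rng.normal(0, 1, 1000), [8.0, -9.0])  # two planted extremes

# IQR rule, the same fences a default boxplot uses for its whiskers
q1, q3 = np.percentile(sample, [25, 75])
iqr = q3 - q1
mask = (sample < q1 - 1.5 * iqr) | (sample > q3 + 1.5 * iqr)

fig, (ax_scatter, ax_box) = plt.subplots(1, 2, figsize=(10, 4))
idx = np.arange(sample.size)
ax_scatter.scatter(idx[~mask], sample[~mask], s=8, label="inliers")
ax_scatter.scatter(idx[mask], sample[mask], s=20, color="red", label="outliers")
ax_scatter.set_xlabel("index")
ax_scatter.set_ylabel("value")
ax_scatter.legend()

ax_box.boxplot(sample)  # points drawn beyond the whiskers are the outliers
ax_box.set_ylabel("value")

fig.tight_layout()
fig.savefig("outliers.png")
```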

For advanced techniques like autoencoders or isolation trees, check our guide on Advanced Outlier Detection Methods 🔍