Outlier detection is a critical step in data preprocessing. Here are common methods and their implementation in Python:

1. Z-Score Method 📈

import numpy as np
from scipy import stats

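# Generate sample data: 1,000 draws from a standard normal distribution
data = np.random.normal(0, 1, 1000)
# Flag points more than 3 standard deviations from the mean
outliers = np.abs(stats.zscore(data)) > 3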

2. IQR Method ✅

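# First and third quartiles and the interquartile range
Q1 = np.percentile(data, 25)
Q3 = np.percentile(data, 75)
IQR = Q3 - Q1
# Flag points more than 1.5 * IQR below Q1 or above Q3
outliers = (data < (Q1 - 1.5 * IQR)) | (data > (Q3 + 1.5 * IQR))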

3. DBSCAN Clustering 🌐

from sklearn.cluster import DBSCAN

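# eps is the neighborhood radius; min_samples is the neighborhood size needed for a core point
dbscan = DBSCAN(eps=0.5, min_samples=5)
# scikit-learn expects a 2-D array, so reshape the 1-D data into a single column
clusters = dbscan.fit_predict(data.reshape(-1, 1))
# DBSCAN assigns the label -1 to noise points
outliers = clusters == -1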

4. Isolation Forest 🌲

from sklearn.ensemble import IsolationForest

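# contamination is the expected fraction of outliers (10% here); tune it to your data
iso_forest = IsolationForest(contamination=0.1)
# fit_predict returns -1 for outliers and 1 for inliers
outliers = iso_forest.fit_predict(data.reshape(-1, 1)) == -1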

5. PCA-Based Detection 🔄

from sklearn.decomposition import PCA

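# Keep the components that explain 95% of the variance
pca = PCA(n_components=0.95)
# PCA expects a 2-D array, so reshape the 1-D data into a single column
transformed = pca.fit_transform(data.reshape(-1, 1))
# Scale each component to unit variance, then flag rows beyond 3 standard deviations
outliers = (np.abs(transformed / transformed.std(axis=0)) > 3).any(axis=1)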
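
PCA is most useful on data with several correlated features, where an outlier may look normal in each feature taken alone. The sketch below is one possible variant, not the only way to do PCA-based detection: it flags rows by reconstruction error. The synthetic data, the choice of 2 components, and the 99th-percentile cutoff are all assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical data: the third feature is almost exactly the sum of the first two
a = rng.normal(size=500)
b = rng.normal(size=500)
X = np.column_stack([a, b, a + b + rng.normal(scale=0.05, size=500)])
X[:5, 2] += 4.0  # inject 5 rows that break the correlation structure

# Project onto 2 components, then map back to the original 3-D space
pca = PCA(n_components=2)
X_reconstructed = pca.inverse_transform(pca.fit_transform(X))

# Rows that the low-dimensional model reconstructs poorly are outlier candidates
reconstruction_error = np.sum((X - X_reconstructed) ** 2, axis=1)
threshold = np.percentile(reconstruction_error, 99)  # assumed cutoff: top 1% of errors
pca_outliers = reconstruction_error > threshold

print(np.flatnonzero(pca_outliers))  # indices of the flagged rows
```

A percentile cutoff always flags a fixed share of the rows, so it suits ranking candidates for review more than deciding whether any outliers exist at all.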

6. Visualization Tools 🖼️

  • Use matplotlib for scatter plots
  • Apply seaborn for boxplots
  • Combine with plotly for interactive dashboards
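
As a minimal sketch of the first two options, the example below draws a scatter plot with the flagged points highlighted next to a boxplot. It uses only matplotlib so that it stays self-contained (`seaborn.boxplot` could replace the `boxplot` call); the injected values, the z-score rule used for flagging, and the output filename are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show() instead
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.normal(0, 1, 1000)
data[:3] = [6.0, -7.0, 8.0]  # hypothetical extreme values injected for the demo
mask = np.abs(stats.zscore(data)) > 3  # same z-score rule as in method 1

fig, (ax_scatter, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot: value against index, flagged points drawn in red
idx = np.arange(data.size)
ax_scatter.scatter(idx[~mask], data[~mask], s=8, label="inliers")
ax_scatter.scatter(idx[mask], data[mask], s=24, color="red", label="flagged")
ax_scatter.set_xlabel("index")
ax_scatter.set_ylabel("value")
ax_scatter.legend()

# Boxplot: by default the whiskers extend 1.5 * IQR beyond the quartiles
ax_box.boxplot(data)
ax_box.set_ylabel("value")

fig.tight_layout()
fig.savefig("outliers.png")  # assumed output path
```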

For advanced techniques such as autoencoders or isolation trees, see our guide on Advanced Outlier Detection Methods 🔍