Logistic regression is a fundamental algorithm in machine learning, widely used for binary classification tasks. Unlike linear regression, which predicts continuous values, logistic regression outputs the probability that an instance belongs to a particular class.
🔍 Core Concepts
Sigmoid Function
The core of logistic regression is the sigmoid function:
$$ \sigma(z) = \frac{1}{1 + e^{-z}} $$
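As a minimal sketch (assuming NumPy), the formula translates directly into code:

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array of them) into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))                          # exactly 0.5 at z = 0
print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # large |z| saturates toward 0 or 1
```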
This function maps any real number to a value between 0 and 1, representing the probability of a positive outcome.

Log Loss
During training, the model's weights are fit by minimizing the log loss (binary cross-entropy):
$$ \text{Loss} = -\frac{1}{N} \sum_{i=1}^{N} [y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)] $$
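A direct NumPy translation of this formula (a sketch; predictions are clipped away from 0 and 1 so the logarithms stay finite, as production implementations also do):

```python
import numpy as np

def log_loss(y_true, y_pred, eps=1e-15):
    """Average binary cross-entropy between true labels and predicted probabilities."""
    # Clip so log(0) never occurs.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.1, 0.8, 0.6])
print(log_loss(y_true, y_pred))  # confident, correct predictions yield a low loss
```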
where $ y_i $ is the true label and $ \hat{y}_i $ is the predicted probability for example $ i $.

Decision Boundary
A threshold (typically 0.5) separates classes. If the predicted probability exceeds this, the instance is classified as positive.
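This thresholding step can be sketched by hand (in scikit-learn it happens inside `model.predict`, on top of the probabilities from `predict_proba`):

```python
import numpy as np

def classify(probabilities, threshold=0.5):
    """Label an instance positive (1) when its probability exceeds the threshold."""
    return (np.asarray(probabilities) > threshold).astype(int)

probs = [0.2, 0.5, 0.7, 0.95]
print(classify(probs))                 # default 0.5 cutoff -> [0 0 1 1]
print(classify(probs, threshold=0.9))  # stricter cutoff   -> [0 0 0 1]
```

Raising the threshold trades recall for precision: fewer instances are called positive, but with higher confidence.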
🧪 Example Use Case
- Dataset: Iris flowers (classification of species)
- Goal: Predict whether a flower is "Setosa" based on petal dimensions
- Code Snippet:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Petal length and width (columns 2 and 3); binary target: is the flower Setosa (class 0)?
X, y = load_iris(return_X_y=True)
X, y = X[:, 2:4], (y == 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```
📚 Further Reading
For a deeper dive into related topics, check out our Linear Regression Tutorial.