Activation functions are crucial in neural networks as they introduce non-linearity, enabling models to learn complex patterns. Here's a breakdown of common types:

1. ReLU (Rectified Linear Unit)

  • Formula: $ f(x) = \max(0, x) $
  • Use Case: Widely used in hidden layers for deep learning.
  • Pros: Computationally efficient; mitigates the vanishing gradient problem for positive inputs.
  • Cons: Can produce "dead neurons" that get stuck outputting zero when their inputs stay negative (see the sketch after this list).
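To make the formula concrete, here is a minimal NumPy sketch; the `relu` helper and the sample inputs are illustrative, not part of any particular library:

```python
import numpy as np

def relu(x):
    # Element-wise max(0, x): negative inputs are zeroed out.
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]
```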

2. Sigmoid

  • Formula: $ f(x) = \frac{1}{1 + e^{-x}} $
  • Use Case: Common in binary classification outputs.
  • Pros: Smooth output bounded between 0 and 1, so it can be read as a probability.
  • Cons: Saturates for large-magnitude inputs (positive or negative), leading to vanishing gradients (sketch below).
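A similarly minimal sketch, again assuming NumPy and an illustrative `sigmoid` helper:

```python
import numpy as np

def sigmoid(x):
    # 1 / (1 + e^{-x}): squashes any real input into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-4.0, 0.0, 4.0])))  # ~ [0.018  0.5  0.982]
```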

3. Tanh (Hyperbolic Tangent)

  • Formula: $ f(x) = \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} $
  • Use Case: Preferred for hidden layers in some architectures.
  • Pros: Output ranges from -1 to 1 and is zero-centered, giving better gradient flow than Sigmoid.
  • Cons: Still saturates, so vanishing gradients remain an issue for large-magnitude inputs (sketch below).
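A quick sketch matching the formula above; in practice `np.tanh` is the numerically safer choice, and the explicit form is shown only to mirror the definition:

```python
import numpy as np

def tanh(x):
    # (e^x - e^{-x}) / (e^x + e^{-x}); equivalent to np.tanh(x) for moderate inputs.
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

print(tanh(np.array([-2.0, 0.0, 2.0])))  # ~ [-0.964  0.  0.964]
```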

4. Softmax

  • Formula: $ f(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}} $
  • Use Case: Used in multi-class classification outputs.
  • Pros: Converts raw scores (logits) into a probability distribution that sums to 1.
  • Cons: Can overflow numerically for large logits unless the maximum is subtracted before exponentiation (see the sketch after this list).
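The stability point from the Cons bullet is easy to show in code: subtracting the maximum logit before exponentiating leaves the result unchanged (the factor cancels in the ratio) but keeps the exponentials from overflowing. A minimal NumPy sketch with illustrative names and values:

```python
import numpy as np

def softmax(x):
    # Shift by the max logit for numerical stability, then normalize.
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

print(softmax(np.array([2.0, 1.0, 0.1])))  # ~ [0.659  0.242  0.099]
```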

5. Leaky ReLU

  • Formula: $ f(x) = \max(\alpha x, x) $, where $\alpha$ is a small slope (commonly 0.01)
  • Use Case: Addresses the dead-neuron problem of ReLU.
  • Pros: Keeps a small, non-zero gradient for negative inputs, so neurons are less likely to die.
  • Cons: The negative slope $\alpha$ is an extra hyperparameter to tune (see the sketch after this list).
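A minimal NumPy sketch with the slope exposed as a parameter; `alpha=0.01` here is just the conventional default, not a tuned value:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Negative inputs are scaled by alpha instead of being zeroed out.
    return np.where(x >= 0, x, alpha * x)

print(leaky_relu(np.array([-3.0, 0.0, 2.0])))  # [-0.03  0.  2.]
```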

For deeper insights, explore our tutorial on activation functions. 📚