activation function

An activation function is a nonlinear mapping applied to a neuron's weighted sum. Without it, stacked layers would collapse into a single linear transformation, so the nonlinearity is what lets a neural network model complex, nonlinear relationships.
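
To make this concrete, here's a minimal sketch of a single neuron in NumPy, using made-up inputs, weights, and bias, with a ReLU nonlinearity applied to the weighted sum:

```python
import numpy as np

def relu(z):
    """ReLU activation: returns z for positive inputs, 0 otherwise."""
    return np.maximum(0.0, z)

# Hypothetical inputs, weights, and bias for a single neuron
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, 0.1, -0.4])
b = 0.2

z = np.dot(w, x) + b  # weighted sum (a purely linear step)
a = relu(z)           # nonlinear activation applied to that sum

print(f"weighted sum: {z:.2f}, activation: {a:.2f}")
```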

The choice of activation affects gradient propagation, output range, sparsity of neuron responses, and training stability.

Common hidden-layer functions include ReLU and its variants, which are cheap to compute and produce sparse activations; the classic sigmoid and tanh; and newer smooth functions such as GELU and SiLU.
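
As a rough sketch, here's how a few of these functions can be written with NumPy so you can compare their shapes and output ranges. The GELU here uses the common tanh approximation:

```python
import numpy as np

def sigmoid(z):
    """Squashes inputs into (0, 1); saturates for large |z|."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Zero for negative inputs, identity for positive ones (sparse outputs)."""
    return np.maximum(0.0, z)

def silu(z):
    """SiLU (also called swish): z * sigmoid(z), a smooth relative of ReLU."""
    return z * sigmoid(z)

def gelu(z):
    """GELU via the widely used tanh approximation."""
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))

z = np.linspace(-3, 3, 7)
for name, fn in [("relu", relu), ("sigmoid", sigmoid), ("tanh", np.tanh), ("gelu", gelu), ("silu", silu)]:
    print(name, np.round(fn(z), 3))
```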

For multiclass output, softmax converts logits into a probability distribution. In practice, it's important to avoid issues such as dead ReLU units and saturation in bounded activations, and to select activations that suit the task and interact well with normalization or regularization schemes.
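
For illustration, here's a small softmax sketch in NumPy, using hypothetical logits for a three-class classifier and subtracting the maximum logit for numerical stability:

```python
import numpy as np

def softmax(logits):
    """Converts raw logits into a probability distribution that sums to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical logits from a three-class classifier
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs, probs.sum())  # probabilities sum to 1.0
```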


By Leodanis Pozo Ramos • Updated Oct. 21, 2025