training
Training is the process of fitting a model's parameters to data by minimizing a carefully chosen objective, either the loss itself or a surrogate for it, using gradient-based (or other) optimization. With gradient-based methods, each update alternates a forward pass, which computes predictions and the loss, with a backward pass, which computes the gradients of the loss with respect to the parameters.
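As a minimal illustration of this forward/backward cycle, the sketch below fits a tiny linear model with plain gradient descent in PyTorch. The data, parameters, and learning rate are all hypothetical choices, not prescriptions:

```python
import torch

# Hypothetical data for y = 2x; w and b are the trainable parameters.
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 6.0])
w = torch.randn(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

lr = 0.1  # learning rate (illustrative value)
for step in range(100):
    y_hat = w * x + b                 # forward pass: compute predictions
    loss = ((y_hat - y) ** 2).mean()  # evaluate the objective (MSE)
    loss.backward()                   # backward pass: compute gradients
    with torch.no_grad():             # gradient-descent parameter update
        w -= lr * w.grad
        b -= lr * b.grad
    w.grad.zero_()                    # clear gradients for the next step
    b.grad.zero_()
```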
In practice, training workflows include the following steps, several of which are sketched in code after the list:
- Specifying the objective (loss) and evaluation metrics
- Splitting the data into training, validation, and test sets
- Iterating over mini-batches across multiple epochs or streaming passes
- Updating the weights with an optimizer such as SGD or Adam
- Applying regularization techniques such as L1/L2 weight decay, dropout, data augmentation, and early stopping, alongside training aids such as normalization, checkpointing, gradient clipping, and learning rate scheduling
- Optionally starting from pretrained weights and fine-tuning them on task-specific data
- Enhancing efficiency and stability through batching, normalization, mixed-precision arithmetic, gradient accumulation, distributed/parallel training, and accelerated hardware (GPUs, TPUs)
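The sketches below walk through these steps in PyTorch. First, specifying the objective and an evaluation metric, and splitting a dataset into training, validation, and test sets; the dataset, split sizes, loss, and metric here are illustrative assumptions:

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical dataset: 1,000 examples, 20 features, binary labels.
X = torch.randn(1000, 20)
y = torch.randint(0, 2, (1000,)).float()
dataset = TensorDataset(X, y)

# 80/10/10 split into training, validation, and test sets.
train_set, val_set, test_set = random_split(
    dataset, [800, 100, 100], generator=torch.Generator().manual_seed(0)
)

# The objective drives optimization; the metric is for evaluation only.
loss_fn = torch.nn.BCEWithLogitsLoss()   # training objective

def accuracy(logits, targets):           # evaluation metric
    preds = (logits > 0).float()
    return (preds == targets).float().mean().item()
```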
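Next, a basic mini-batch training loop that reuses `train_set` and `loss_fn` from the previous sketch. The architecture, batch size, learning rate, and epoch count are placeholders:

```python
from torch.utils.data import DataLoader

model = torch.nn.Sequential(
    torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

for epoch in range(10):                           # multiple passes (epochs)
    for xb, yb in train_loader:                   # one mini-batch at a time
        optimizer.zero_grad()                     # clear old gradients
        loss = loss_fn(model(xb).squeeze(1), yb)  # forward pass + objective
        loss.backward()                           # backward pass
        optimizer.step()                          # optimizer updates weights
```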
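The same loop can be extended with the regularization and stability techniques from the list. Every knob below (dropout rate, weight decay, clipping norm, schedule, patience) is an illustrative assumption:

```python
# Dropout lives inside the model; weight decay inside the optimizer.
model = torch.nn.Sequential(
    torch.nn.Linear(20, 64), torch.nn.ReLU(),
    torch.nn.Dropout(p=0.5),                  # dropout regularization
    torch.nn.Linear(64, 1),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

best_val, patience_left = float("inf"), 3     # early-stopping state
for epoch in range(50):
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb).squeeze(1), yb)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
    scheduler.step()                          # decay the learning rate

    model.eval()                              # validate after each epoch
    with torch.no_grad():
        val_loss = sum(loss_fn(model(xb).squeeze(1), yb).item()
                       for xb, yb in DataLoader(val_set, batch_size=100))
    if val_loss < best_val:
        best_val, patience_left = val_loss, 3
        torch.save(model.state_dict(), "best.pt")  # checkpoint the best model
    else:
        patience_left -= 1
        if patience_left == 0:
            break                             # early stopping
```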
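Fine-tuning typically means loading pretrained weights and updating only part of the model. This sketch assumes a hypothetical checkpoint file, `pretrained_backbone.pt`, and reuses the `model` from the previous sketch:

```python
# Hypothetical fine-tuning: reuse pretrained weights, freeze early layers,
# and train only the remaining parameters at a lower learning rate.
pretrained = torch.load("pretrained_backbone.pt")  # hypothetical checkpoint
model.load_state_dict(pretrained, strict=False)    # load matching weights

for param in model[0].parameters():   # freeze the first layer
    param.requires_grad = False

# Hand the optimizer only the parameters that remain trainable.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```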
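Finally, a sketch of mixed-precision arithmetic combined with gradient accumulation, assuming a CUDA-capable GPU. Accumulating gradients over several mini-batches before stepping simulates a larger effective batch without extra memory:

```python
scaler = torch.cuda.amp.GradScaler()          # rescales fp16 gradients
accum_steps = 4                               # accumulate 4 mini-batches

model = model.cuda()
optimizer.zero_grad()
for i, (xb, yb) in enumerate(train_loader):
    xb, yb = xb.cuda(), yb.cuda()
    with torch.cuda.amp.autocast():           # forward pass in mixed precision
        loss = loss_fn(model(xb).squeeze(1), yb) / accum_steps
    scaler.scale(loss).backward()             # scaled backward pass
    if (i + 1) % accum_steps == 0:            # step once per 4 batches,
        scaler.step(optimizer)                # simulating a 4x larger batch
        scaler.update()
        optimizer.zero_grad()
```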
Modern training often operates in overparameterized regimes, where a model has more parameters than training examples. In these regimes, the implicit biases of the optimization dynamics, together with regularization and early stopping, can influence how well the model generalizes.
By Leodanis Pozo Ramos • Updated Oct. 15, 2025