transformer
A transformer is a neural network model that processes sequences using self-attention to capture long-range dependencies without recurrence or convolutions.
The standard design stacks attention and feed-forward layers with residual connections, layer normalization, and positional encodings. Common variants include encoder-only models for representation learning, decoder-only models for autoregressive generation, and encoder–decoder models for sequence-to-sequence tasks.
Transformers form the foundation of modern language and vision models thanks to efficient parallel training, strong scaling behavior, and flexible decoding for tasks such as classification, translation, and text or image generation.
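The self-attention mechanism at the heart of the architecture can be sketched in a few lines. The example below is a minimal single-head scaled dot-product attention in NumPy; the function name, matrix shapes, and random weights are illustrative assumptions, not part of any particular library's API:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X has shape (seq_len, d_model); the result mixes information
    across all positions, which is how long-range dependencies
    are captured without recurrence or convolutions.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv  # project inputs to queries, keys, values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise similarity between positions
    # Softmax over each row turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights  # weighted sum of values, plus the weights

# Toy input: a sequence of 4 token vectors with hypothetical sizes.
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Because every position attends to every other position in a single matrix multiplication, all positions can be processed in parallel during training, which is the source of the efficiency mentioned above.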
By Leodanis Pozo Ramos • Updated Oct. 13, 2025