transformer architecture
The transformer architecture is a neural network design that models dependencies between sequence positions using self-attention instead of recurrence or convolutions.
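The core operation is scaled dot-product attention: each position produces a query, key, and value vector, and each output is a weighted mix of every position's values, so any two positions can interact in a single step. The NumPy sketch below illustrates the idea on random toy data; the function name, shapes, and weight matrices are illustrative assumptions, not part of any particular library:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of token vectors."""
    q = x @ w_q  # queries: (seq_len, d_model)
    k = x @ w_k  # keys:    (seq_len, d_model)
    v = x @ w_v  # values:  (seq_len, d_model)
    scores = q @ k.T / np.sqrt(k.shape[-1])         # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ v  # each output mixes information from every position

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))  # toy token embeddings
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # (4, 8)
```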
A standard transformer stacks encoder and decoder blocks composed of multi-head self-attention and position-wise feed-forward layers, each wrapped with residual connections and layer normalization.
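As a rough sketch of what one such block looks like, the following PyTorch module combines nn.MultiheadAttention with a two-layer feed-forward network and wraps each sub-layer in a residual connection plus layer normalization. The class name, dimensions, and post-norm layout are illustrative choices, not a reference implementation:

```python
import torch
from torch import nn

class EncoderBlock(nn.Module):
    """One transformer encoder block: multi-head self-attention and a
    feed-forward network, each with a residual connection and layer norm."""

    def __init__(self, d_model=512, num_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)  # self-attention: q = k = v = x
        x = self.norm1(x + attn_out)      # residual + layer norm
        x = self.norm2(x + self.ff(x))    # residual + layer norm
        return x

block = EncoderBlock()
tokens = torch.randn(2, 10, 512)  # (batch, seq_len, d_model)
print(block(tokens).shape)        # torch.Size([2, 10, 512])
```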
Transformers can be specialized for different goals: encoder-only models for representation learning and classification, decoder-only models for autoregressive generation, and encoder–decoder models for sequence-to-sequence tasks.
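One hedged way to picture the difference between these variants uses PyTorch's stock building blocks: an encoder-only stack produces contextual representations, the same stack with a causal mask approximates decoder-only autoregressive attention (decoder-only models drop cross-attention), and nn.Transformer wires a full encoder–decoder together. The layer counts and tensor shapes below are arbitrary toy values:

```python
import torch
from torch import nn

d_model, num_heads = 512, 8
layer = nn.TransformerEncoderLayer(d_model, num_heads, batch_first=True)

# Encoder-only (BERT-style): contextual representations of the whole input.
encoder = nn.TransformerEncoder(layer, num_layers=6)
tokens = torch.randn(2, 10, d_model)
reps = encoder(tokens)

# Decoder-only (GPT-style), approximated here as self-attention with a causal
# mask so each position attends only to earlier positions.
causal_mask = nn.Transformer.generate_square_subsequent_mask(10)
generated = encoder(tokens, mask=causal_mask)

# Encoder–decoder (original transformer): source feeds the encoder, target
# feeds the decoder, which also attends to the encoder via cross-attention.
seq2seq = nn.Transformer(d_model, num_heads, batch_first=True)
src, tgt = torch.randn(2, 12, d_model), torch.randn(2, 10, d_model)
out = seq2seq(src, tgt)
print(reps.shape, generated.shape, out.shape)
```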
By Leodanis Pozo Ramos • Updated Oct. 21, 2025