Autoregressive Generation

Autoregressive generation is a sequence modeling approach in which a model produces output one token at a time, conditioning each new token on all of the tokens that came before it.
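The core loop is easy to sketch. In the minimal Python example below, a hypothetical next_token_probs() function stands in for a trained model: it returns a probability distribution over a toy vocabulary given the tokens generated so far.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["<eos>", "the", "cat", "sat", "on", "mat"]

def next_token_probs(prefix):
    """Stand-in for a trained model: return a probability
    distribution over VOCAB given the tokens generated so far."""
    logits = rng.normal(size=len(VOCAB))
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def generate(max_new_tokens=10):
    tokens = []
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)           # condition on the prefix
        next_id = rng.choice(len(VOCAB), p=probs)  # pick one token
        if VOCAB[next_id] == "<eos>":              # stop at end-of-sequence
            break
        tokens.append(VOCAB[next_id])
    return " ".join(tokens)

print(generate())
```

A real system would replace next_token_probs() with a forward pass through the model, but the structure is the same: condition on the prefix, pick a token, append it, and repeat until an end-of-sequence token or a length limit is reached.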

In language modeling, these systems are typically trained with next-token prediction under teacher forcing: at each position, the model learns to predict the next token given the ground-truth prefix from the training data rather than its own earlier predictions.
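To make that objective concrete, here is a rough sketch of the next-token cross-entropy under teacher forcing, with random logits standing in for a real model's output. The important detail is the shift: the input at position t is paired with the target at position t + 1 of the same ground-truth sequence.

```python
import numpy as np

vocab_size = 6
sequence = np.array([1, 2, 3, 4, 5, 0])  # ground-truth token IDs

inputs = sequence[:-1]   # prefix fed to the model (teacher forcing)
targets = sequence[1:]   # the next token at each position

# A real model would map each prefix to logits; random values here.
rng = np.random.default_rng(0)
logits = rng.normal(size=(len(inputs), vocab_size))

# Average cross-entropy of the ground-truth next token at every position
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -log_probs[np.arange(len(targets)), targets].mean()
print(f"next-token loss: {loss:.3f}")
```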

At inference time, common decoding strategies include greedy search, beam search, top-k sampling, and nucleus (top-p) sampling, often combined with a temperature parameter to control how random or deterministic the outputs are.
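The sketch below rolls several of these strategies into a single hypothetical sample() helper: logits are scaled by the temperature, and the resulting distribution is optionally truncated with top-k or nucleus (top-p) filtering before a token is drawn. Greedy search corresponds to always taking the argmax, which temperature scaling approaches as the temperature goes to zero.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(logits, temperature=1.0, top_k=None, top_p=None):
    """Sample a token ID with temperature scaling and optional
    top-k or nucleus (top-p) filtering."""
    logits = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    if top_k is not None:
        # Zero out everything below the k-th largest probability.
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
        probs /= probs.sum()

    if top_p is not None:
        # Keep the smallest set of tokens whose total mass is >= top_p.
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cumulative, top_p) + 1]
        filtered = np.zeros_like(probs)
        filtered[keep] = probs[keep]
        probs = filtered / filtered.sum()

    return rng.choice(len(probs), p=probs)

logits = [2.0, 1.0, 0.5, 0.1, -1.0]
print(sample(logits, temperature=0.7, top_k=3))    # sharper, from the 3 best tokens
print(sample(logits, temperature=1.0, top_p=0.9))  # nucleus sampling
```

Lower temperatures concentrate probability on the most likely tokens, while higher ones flatten the distribution. Top-k always keeps a fixed number of candidates; top-p instead keeps the smallest set whose cumulative probability reaches p, so the candidate pool adapts to how peaked the distribution is.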


By Leodanis Pozo Ramos • Updated Nov. 17, 2025