temperature
Temperature is a decoding parameter that rescales the model’s logits before softmax, thereby controlling how deterministic or diverse the output is.
Lowering the temperature sharpens the distribution, pushing probability mass toward the top tokens and approaching greedy decoding as the temperature approaches zero. Raising the temperature flattens the distribution, allowing more exploration and novelty at the cost of coherence or correctness.
Temperature is implemented by dividing each logit z_i by the temperature T before softmax, so the sampling probabilities become p_i ∝ exp(z_i / T). It is often used together with top-k or nucleus (top-p) sampling. In practice, one chooses the temperature per task: low for factual or precise generation, and higher for creative or open-ended generation.
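The scaling step is simple enough to show directly. The sketch below (the function name `sample_with_temperature` is illustrative, not from any particular library) divides the logits by the temperature, applies a numerically stable softmax, and samples a token index from the result:

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample a token index from logits rescaled by temperature.

    Temperature near 0 approaches greedy decoding (argmax);
    temperature above 1 flattens the distribution.
    """
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)
    # Divide logits by the temperature before softmax.
    scaled = logits / temperature
    # Numerically stable softmax: subtract the max before exponentiating.
    scaled -= scaled.max()
    probs = np.exp(scaled)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

logits = [2.0, 1.0, 0.1]
print(sample_with_temperature(logits, temperature=0.2))  # almost always index 0
print(sample_with_temperature(logits, temperature=2.0))  # more varied picks
```

Running the two calls repeatedly makes the effect visible: at T = 0.2 the top logit dominates almost every draw, while at T = 2.0 the lower-ranked tokens are sampled noticeably more often.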
By Leodanis Pozo Ramos • Updated Oct. 15, 2025