guardrails

Guardrails are application-level policies and controls that constrain how a model or agent behaves: what it may say, which actions it may take, and which tools it may invoke.

In practice, guardrails combine mechanisms such as input filtering and prompt hardening, output validation against schemas or policies, content moderation, topic control, tool allow/deny lists, and monitoring and evaluation.
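
As a rough illustration of how a few of these mechanisms can be layered, the sketch below implements three of them in plain Python: input filtering, a tool allow list, and output validation against a simple schema. Every name in it (`GuardrailViolation`, `check_input`, `check_tool_call`, `check_output`, and the policy constants) is hypothetical and chosen for this example only. It does not come from a specific guardrails library, and a real system would likely use much richer policies.

```python
import json
import re

# Hypothetical policy values, chosen only for illustration.
BLOCKED_INPUT_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
]
ALLOWED_TOOLS = {"search_docs", "get_weather"}
REQUIRED_OUTPUT_KEYS = {"answer", "sources"}


class GuardrailViolation(Exception):
    """Raised when a request, tool call, or response breaks a policy."""


def check_input(user_text: str) -> str:
    """Input filtering: reject text that matches a blocked pattern."""
    for pattern in BLOCKED_INPUT_PATTERNS:
        if pattern.search(user_text):
            raise GuardrailViolation("input matched a blocked pattern")
    return user_text


def check_tool_call(tool_name: str) -> str:
    """Tool allow list: permit only explicitly approved tools."""
    if tool_name not in ALLOWED_TOOLS:
        raise GuardrailViolation(f"tool {tool_name!r} is not on the allow list")
    return tool_name


def check_output(raw_output: str) -> dict:
    """Output validation: require a JSON object with the expected keys."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise GuardrailViolation("output is not valid JSON") from exc
    if not isinstance(data, dict) or not REQUIRED_OUTPUT_KEYS.issubset(data):
        raise GuardrailViolation("output is missing required keys")
    return data
```

An application would typically call `check_input()` before sending text to the model, `check_tool_call()` before running any tool the model requests, and `check_output()` before returning a response. A simple pattern list like this one is easy to bypass, which is one reason such checks are combined with moderation, monitoring, and evaluation rather than used alone.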

These controls reduce risk but don’t guarantee perfect safety. They must be maintained, evaluated, and layered with broader governance and monitoring.


By Leodanis Pozo Ramos • Updated Oct. 24, 2025