reinforcement learning from human feedback (RLHF)

Reinforcement learning from human feedback (RLHF) is a training technique that aligns a large language model with human preferences by using human comparisons of model outputs to shape the model’s behavior.

A typical RLHF pipeline runs in three stages:

1. Supervised fine-tuning on human-written prompts and responses to give the base model instruction-following behavior.
2. Reward model training, where annotators rank pairs of model outputs and a separate model learns to predict those preferences as a scalar reward.
3. Policy optimization with a reinforcement learning algorithm, typically proximal policy optimization (PPO), that maximizes the reward while a Kullback-Leibler divergence penalty discourages drift from the supervised baseline.
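
To make stages two and three more concrete, here is a minimal sketch in plain Python. It assumes a Bradley-Terry style pairwise loss for the reward model and a per-sample log-probability difference as the KL estimate, which is one common formulation rather than the method of any particular system. The function names and the `beta` default are hypothetical and chosen only for illustration.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Reward-model loss for one comparison: -log(sigmoid(r_chosen - r_rejected)).

    The loss shrinks as the reward model scores the preferred output
    higher than the rejected one.
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable softplus(-margin), which equals -log(sigmoid(margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def kl_penalized_reward(
    reward: float,
    logprob_policy: float,
    logprob_reference: float,
    beta: float = 0.1,  # assumed penalty strength, not a recommended value
) -> float:
    """Reward the policy optimizer sees: the scalar reward minus a KL penalty.

    The penalty uses log pi(y|x) - log pi_ref(y|x) as a single-sample
    estimate of the divergence from the supervised baseline.
    """
    return reward - beta * (logprob_policy - logprob_reference)
```

In this sketch, a response that the policy makes much more likely than the supervised baseline does (a large positive log-probability gap) has its reward reduced, which is how the penalty discourages drift.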

RLHF was popularized by InstructGPT and is used to align assistants such as ChatGPT, Claude, and Gemini. Variants include direct preference optimization (DPO), which trains the policy directly on preference pairs and skips the explicit reward model, and reinforcement learning from AI feedback (RLAIF), which replaces human annotators with another model. Known limitations include reward hacking, where the policy exploits flaws in the reward model instead of improving its outputs, sensitivity to disagreement among annotators, and bias inherited from the preference data.
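
As a rough illustration of how DPO can skip the reward model, the sketch below computes the standard DPO loss for a single preference pair from summed sequence log-probabilities. The function name, argument names, and the `beta` default are assumptions made for this example, and a real implementation would work on batches of tensors rather than floats.

```python
import math


def dpo_loss(
    policy_logprob_chosen: float,
    policy_logprob_rejected: float,
    reference_logprob_chosen: float,
    reference_logprob_rejected: float,
    beta: float = 0.1,  # assumed scaling factor, not a recommended value
) -> float:
    """DPO loss for one pair: -log(sigmoid(beta * (chosen_gain - rejected_gain))).

    Each gain is how much more likely the policy makes a response than the
    frozen reference model does, measured in log-probability.
    """
    chosen_gain = policy_logprob_chosen - reference_logprob_chosen
    rejected_gain = policy_logprob_rejected - reference_logprob_rejected
    logits = beta * (chosen_gain - rejected_gain)
    # Numerically stable softplus(-logits), which equals -log(sigmoid(logits)).
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))
```

The log-probability gains act as an implicit reward, so the preference data updates the policy directly and no separate reward model is trained.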

Tutorial

Build an LLM RAG Chatbot With LangChain

Large language models (LLMs) have taken the world by storm, demonstrating unprecedented capabilities in natural language tasks. In this step-by-step tutorial, you'll leverage LLMs to build your own retrieval-augmented generation (RAG) chatbot using synthetic data with LangChain and Neo4j.


For additional information on related topics, take a look at the following resources:

How to Use Ollama to Run Large Language Models Locally (Tutorial)
How to Integrate ChatGPT's API With Python Projects (Tutorial)
Use ChatGPT to Learn Python Programming (Tutorial)
Document Your Python Code and Projects With ChatGPT (Tutorial)
Python MarkItDown: Convert Documents Into LLM-Ready Markdown (Tutorial)
First Steps With LangChain (Course)
Build an LLM RAG Chatbot With LangChain (Quiz)
How to Use Ollama to Run Large Language Models Locally (Quiz)
Leverage OpenAI's API in Your Python Projects (Course)
How to Integrate ChatGPT's API With Python Projects (Quiz)
Python MarkItDown: Convert Documents Into LLM-Ready Markdown (Quiz)

By Martin Breuss • Updated July 7, 2026
