Applied AI @ OpenAI • AI Advisor to Startups • On Deck Fellow • Proud Son • Duke + Wisconsin Alum • Building for impact • Venture Scout • Neo Mentor • Duke AI Advisory Board
by Shyamal Anadkat
RFT is still in its early days, but it’s a powerful tool - making it surprisingly easy to create expert models for specific domains with less training data.
In traditional RLHF, the reward function approximates a human preference distribution - ranking completions or trajectories. In agentic workflows, the nature of the task shifts: the model must solve structured, compositional, verifiable problems like tool selection, planning, or executing multi-step logic (e.g., function calling, parsing, querying APIs, or coordinating sub-agents). In these cases, verifiability becomes critical - either the tool was called correctly or it wasn’t; either the database returned the correct result or it didn’t.
Agentic workflows are designed to make decisions that are both correct and verifiable. RFT can help by providing explicit rubrics and using code-based or model-based graders to measure functional success, factual accuracy, or policy compliance.
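As a rough illustration, here is what a code-based grader for a tool-calling task might look like - a binary, verifiable reward rather than a preference score. This is a minimal sketch; the function and field names are hypothetical, not any particular API:

```python
# Hypothetical code-based grader for a tool-calling task: the reward is 1.0
# only if the model picked the right tool and passed the required arguments.
import json

def grade_tool_call(model_output: str, expected: dict) -> float:
    """Return a binary, verifiable reward for a single tool-call completion."""
    try:
        call = json.loads(model_output)  # e.g. {"name": "get_weather", "arguments": {...}}
    except json.JSONDecodeError:
        return 0.0  # malformed output is simply wrong

    if call.get("name") != expected["name"]:
        return 0.0  # wrong tool selected

    args = call.get("arguments", {})
    # every required argument must match exactly; extra arguments are ignored here
    if all(args.get(k) == v for k, v in expected["arguments"].items()):
        return 1.0
    return 0.0


# The grader returns 1.0 only for the correct call:
print(grade_tool_call(
    '{"name": "get_weather", "arguments": {"city": "Madison"}}',
    {"name": "get_weather", "arguments": {"city": "Madison"}},
))
```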
Evals are the foundation here.
An effective eval reveals opportunities where human experts consistently agree but current frontier models struggle - presenting a valuable gap for RFT to close. In the knowledge worker economy, workflows that transform standardized inputs into standardized outputs through defined processing steps are well-suited for conversion into evals.
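Converting such a workflow into an eval can be as simple as pairing standardized inputs with expected outputs and scoring the model against them. A minimal sketch, where `call_model` and the cases are placeholders for your own inference client and data:

```python
# Minimal eval harness: run the model on each standardized case and score
# against the expected answer. Cases and client are illustrative only.
from typing import Callable

eval_cases = [
    {"input": "Extract the invoice total from: 'Total due: $1,240.50'", "expected": "1240.50"},
    {"input": "Extract the invoice total from: 'Amount payable USD 87.00'", "expected": "87.00"},
]

def run_eval(call_model: Callable[[str], str]) -> float:
    """Return the fraction of cases where the model output exactly matches."""
    correct = 0
    for case in eval_cases:
        output = call_model(case["input"]).strip()
        correct += int(output == case["expected"])
    return correct / len(eval_cases)

# Usage with a stub model, just to show the plumbing:
print(run_eval(lambda prompt: "1240.50"))  # -> 0.5
```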
We should expect our mainline models to keep improving rapidly, potentially reaching a point where in-context learning alone achieves top-tier performance on most tasks. But raw eval scores aren’t everything - efficiency and cost matter too. When inference volumes are high, approaches like distillation or RFT become the most efficient way to compress capability into cheaper, faster models. We’re already seeing that RFT can yield both performance gains and latency improvements by refining task-specific behavior through incremental training.
The road to AGI involves multiple systems, and reinforcement fine-tuning appears to be a crucial component. By creating systems that can reason effectively about objective reality, learn efficiently from limited examples, and transfer knowledge across domains, RFT addresses many of the core challenges in developing general intelligence.
The field should witness a convergence of reward engineering, interpretability research, and formal verification, thereby redefining what it means to optimize agents in production-scale deployments. I also agree that reward and rubric engineering will be one of the most important skills for steering RL agents.
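To make that concrete, here is a minimal sketch of what rubric engineering can look like in practice: a reward composed of weighted, verifiable checks rather than a single pass/fail signal. The weights, criteria, and field names below are hypothetical:

```python
# Illustrative rubric: combine several verifiable checks into one reward.
def rubric_reward(response: dict) -> float:
    """Score a response like {"tool_ok": bool, "cited_sources": int, "policy_violation": bool}."""
    rubric = [
        # (weight, criterion score in [0, 1])
        (0.6, 1.0 if response["tool_ok"] else 0.0),           # functional success
        (0.3, min(response["cited_sources"], 2) / 2),          # factual grounding
        (0.1, 0.0 if response["policy_violation"] else 1.0),   # policy compliance
    ]
    return sum(weight * score for weight, score in rubric)

print(rubric_reward({"tool_ok": True, "cited_sources": 1, "policy_violation": False}))  # 0.85
```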
tags: Reward Engineering - Evals - OpenAI - AI - RFT