Optimizing Agentic Tool-Calling Accuracy with SFT and DPO on Amazon SageMaker AI

AWS ML Blog explores how combining Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) on Amazon SageMaker AI can significantly improve the tool-calling reliability of Small Language Models (SLMs) for production environments.

The Hook

In a recent post, the AWS ML Blog presents a methodology for improving the tool-calling accuracy of Small Language Models (SLMs). The post focuses on combining Supervised Fine-Tuning (SFT) with Direct Preference Optimization (DPO), both run on Amazon SageMaker AI.

The Context

As enterprises transition AI agents from experimental pilots to production environments, ensuring reliable tool execution has emerged as a critical bottleneck. The effectiveness of an AI agent depends heavily on its ability to perform accurate tool selection, format parameters precisely, and execute workflow chains without hallucinating non-existent functions. While massive, proprietary Large Language Models (LLMs) often handle these complex routing and execution tasks well out of the box, they can be cost-prohibitive and introduce unacceptable latency for real-time applications. Lightweight Small Language Models (SLMs) offer a highly compelling alternative. However, out-of-the-box SLMs frequently struggle with the rigid syntax and logical reasoning required for consistent tool calling. To make SLMs viable for enterprise agentic workflows, they must be specialized and rigorously optimized.
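To make the three failure modes above concrete (wrong tool, malformed parameters, hallucinated functions), here is a minimal sketch of a check that could be run on a model's raw output. The tool registry, the tool names, and the JSON shape with "name" and "arguments" keys are all assumptions made for illustration; the original post does not specify a tool-call format.

```python
import json

# Hypothetical tool registry: tool name -> {parameter name: expected Python type}.
# These tools are invented for illustration and do not come from the original post.
TOOLS = {
    "get_weather": {"city": str, "unit": str},
    "create_ticket": {"title": str, "priority": int},
}


def check_tool_call(raw_output: str) -> list[str]:
    """Return a list of problems found in a model's JSON tool call (empty if none)."""
    try:
        call = json.loads(raw_output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(call, dict):
        return ["output is not a JSON object"]

    # A hallucinated function shows up as a name missing from the registry.
    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        return [f"unknown tool: {name!r}"]

    args = call.get("arguments")
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]

    schema = TOOLS[name]
    problems = []
    # Every declared parameter must be present with the expected type.
    for param, expected in schema.items():
        if param not in args:
            problems.append(f"missing parameter: {param}")
        elif not isinstance(args[param], expected):
            problems.append(f"wrong type for {param}")
    # Parameters the tool does not declare are also treated as errors.
    for param in args:
        if param not in schema:
            problems.append(f"unexpected parameter: {param}")
    return problems


if __name__ == "__main__":
    good = '{"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}'
    bad = '{"name": "get_forecast", "arguments": {"city": "Paris"}}'
    print(check_tool_call(good))
    print(check_tool_call(bad))
```

A check like this could also serve as a simple accuracy metric: the share of model outputs for which the returned list is empty.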

The Gist

The AWS ML Blog post outlines how developers can bridge this performance gap with a structured, two-step optimization process. First, Supervised Fine-Tuning (SFT) establishes a baseline of competence: it teaches the model the basic grammar of the tools it needs to use, including tool-oriented language, API commands, and strict formatting constraints. Next, Direct Preference Optimization (DPO) further aligns the model's outputs with the desired outcomes. DPO trains on preference data, essentially pairs of correct and incorrect behavior for the same input, which refines the model's decision-making without the computational overhead of training a separate reward model. Finally, running the pipeline on Amazon SageMaker AI lets engineering teams execute the entire optimization efficiently. SageMaker abstracts the underlying infrastructure complexity, so developers can focus on data quality and model evaluation rather than managing GPU clusters.
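The difference between the two stages can be sketched in a few lines of Python. The record layouts below (prompt/completion for SFT, prompt/chosen/rejected for DPO) are assumptions chosen for illustration, since the original post does not specify its dataset formats, and the log-probabilities in the usage example are made-up numbers. The loss function follows the standard DPO objective for a single preference pair, with beta as the usual scaling hyperparameter.

```python
import math

# Hypothetical SFT record: a prompt paired with the exact tool call to imitate.
sft_example = {
    "prompt": "What is the weather in Paris?",
    "completion": '{"name": "get_weather", "arguments": {"city": "Paris"}}',
}

# Hypothetical DPO record: the same prompt with a preferred and a rejected response.
# The rejected response calls a tool name that does not exist.
dpo_example = {
    "prompt": sft_example["prompt"],
    "chosen": sft_example["completion"],
    "rejected": '{"name": "weather_lookup", "arguments": {"location": "Paris"}}',
}


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under
    either the model being trained (policy) or the frozen reference model.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written in a numerically stable form.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


if __name__ == "__main__":
    # Made-up log-probabilities, for illustration only.
    untrained = dpo_loss(-12.0, -12.0, -12.0, -12.0)
    improved = dpo_loss(-8.0, -20.0, -12.0, -12.0)
    print(f"loss when policy equals reference: {untrained:.4f}")
    print(f"loss when policy prefers the chosen call: {improved:.4f}")
```

No reward model appears anywhere in the sketch: the preference signal comes directly from comparing the policy's log-probabilities with those of the reference model, which is why DPO avoids that extra training step.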

Conclusion

For engineering teams looking to deploy cost-effective, low-latency AI agents at scale, SLM fine-tuning is a valuable capability. The original post omits certain specifics, such as the exact dataset formats, the SLM architecture used in the example, and quantitative benchmark results comparing the base model to the fine-tuned variants, but it still provides a strong conceptual framework for infrastructure-managed optimization. Understanding how to layer SFT and DPO effectively is essential for building reliable, production-ready agents. Read the full post to explore the methodology and consider how these techniques can be applied to your own agentic workflows.

Key Takeaways

AI agent reliability hinges on accurate tool selection, parameter formatting, and workflow execution.
Supervised Fine-Tuning (SFT) equips Small Language Models with the ability to understand tool-specific commands and constraints.
Direct Preference Optimization (DPO) refines model behavior using preference data, bypassing the need for a separate reward model.
Amazon SageMaker AI abstracts the infrastructure complexity required to run SFT and DPO pipelines.
Optimized SLMs present a cost-effective, low-latency alternative to massive proprietary LLMs for production agentic workflows.

Read the original post on the AWS ML Blog.
