OpenAI has provided developers with new tools with which to customize your models and create artificial intelligence (AI) applications that include real-time conversations with natural speech and incorporate and improve image understanding.
The technology company held its DevDay 2024 developer event this Tuesday in San Francisco (United States), in which it announced new tools for customizing its AI models.
Developers can access a new process model distillation that integrates into the OpenAI platform so they can use the results of larger capacity models, such as o1-preview and GPT-4o, to fine-tune smaller, more cost-effective ones, such as GPT-4o mini.
This process is found in a new ‘suite’ that allows developers to generate data sets for distillation, create and run custom evaluations to measure model performance on specific tasks. Both tools are integrated into OpenAI’s tuning offering.
Developers can also fine-tune GPT-4o with images, in addition to text, with the new vision fine-tuning tool. In this way, they can incorporate image understanding capabilities to offer visual search or object detection functions.
‘Prompt Caching’ is a tool designed to save developers time and cost by caching frequently used context across multiple API calls. It is automatically applied in the latest versions of GPT-4o, GPT-4o mini, o1-preview and o1-mini, and their optimized versions.
“The API caches the longest prefix of a request that has been previously calculated, starting with 1024 tokens and increasing in increments of 128 tokens. If you reuse requests with common prefixes, we will automatically apply the request caching discount without needing that you make any changes to its API integration,” the company explains on the official blog.
A final addition announced at DevDay is the ‘Realtime API’, a resource with which developers can create fast speech-to-speech experiences in their applications. It is currently in a public beta phase, and is similar to ChatGPT’s advanced voice mode, supporting natural conversations with one of six predefined voices.