OpenAI this week launched a new image-generation capability inside the ChatGPT chat interface. The new feature, called Images in ChatGPT, is available today to all users, including those on the free tier. According to the company, the feature lets you create images directly within a conversation with the AI, without switching to a separate tool.
The new capability is based on GPT-4o, an “omni” model capable of understanding and generating text, images, audio and video. According to Gabriel Goh, a research lead at OpenAI, this is a significant leap over the previous generation, especially in the model’s ability to keep objects, colors and attributes correctly matched to one another, a property known as “binding”.
For example, most existing image generators struggle to create a picture containing several objects with different colors and shapes without deviating from the instructions. The new system, he says, stays accurate when creating images with 15-20 different objects, without confusing their details.
One of the most prominent enhancements presented is improved rendering of text within images, an area where existing tools such as DALL-E or Midjourney tend to produce incorrect or meaningless text. Goh explained that the team spent many months improving this capability, and that today you can get readable, correct text in most images, except for very small text.
The new system uses a different technique from most image generators: instead of creating the whole image at once, it works autoregressively, as in writing, that is, from left to right and from top to bottom. This may be what contributes to the improved accuracy and the ability to understand complex relationships. In a demonstration for journalists, the company showed images such as an accurate illustration of Newton’s experiment, posters with error-free text, comics with consistent characters, and stickers with a transparent background for graphic use.
According to Jackie Shannon, the multimodal product lead at OpenAI, the system “brings with it all the cumulative knowledge of the world”, so when a user asks for a picture of Newton’s experiment, there is no need to explain what it is: the model already knows. Shannon added that although the system takes longer to produce images than existing tools, the quality and accuracy make up for the extra wait. “Quality, world knowledge and capability are worth waiting a few more seconds for,” she said.
At the launch, OpenAI representatives were asked about the safeguards built into the system, in light of cases in which other tools were used to create sexual deepfakes or fake photos of public figures. According to the company, blocking mechanisms are in place that prevent the creation of watermarks, pornography, and violent or illegal images. Although the images carry no clear visual marking that they were created by artificial intelligence, they include hidden information (C2PA metadata) that identifies their origin, and OpenAI maintains internal tools for identifying images the system has created.
Finally, the company emphasizes that the images created belong to the user and can be used subject to the Terms of Use. Shannon concluded: “No system is perfect, but we are constantly improving the safeguards. This is just a first step.”