Google claims that its Gemini AI trains on Docs files only if they are public

Google has said that it collects information from files in its services, such as Google Docs or Google Sheets, to train its artificial intelligence (AI) model Gemini only if these documents are “publicly available” on the Internet, that is, if they have been shared through social networks or published on websites.

Companies dedicated to developing the most powerful AI models today, such as OpenAI, Meta and Google, increasingly need to collect more data to continue training and improving the capabilities of these technologies.

This has led to the search for new data sources on the Internet, which, in turn, has caused technology companies to risk potential copyright violations. In fact, as recently reported in an article in The New York Times, these companies have been using publicly available data on the Internet to train their AI models.

OpenAI and Google itself reportedly used videos published on YouTube to train their AI models, such as OpenAI’s GPT-4, a practice that, according to Neal Mohan, chief executive of the Google-owned platform, goes against YouTube’s policies.

In addition, the same outlet reported that, according to sources familiar with Google’s practices, the company also accessed files from Google Docs, Google Sheets, restaurant reviews on Google Maps and other “publicly available” online materials to train its AI products. This followed a change to the terms of use that the technology company introduced last year to allow this type of data use.

Specifically, in the case of files from services such as Google Docs, the technology company offers several options when it comes to sharing documents. As explained on Google’s support page, one of these alternatives is to enter the email addresses of the users in question in the file sharing option, so that only these people can open the document.

On the other hand, Google also allows you to share the file through a link. In this way, the document is configured as public so that any user who has the link can open it.

However, the technology company has clarified that these types of documents are not necessarily “publicly available” files, so they are not used to train Google’s AI model, and their contents remain private to the users who have access.

This was stated by a Google representative to Business Insider, who clarified that sharing a document with the “anyone with link” setting does not make the file public and that, therefore, it will not be used for AI training.

Specifically, the representative explained that for a document to count as publicly available, and therefore be eligible for training Google’s AI, it must be published on a website or shared through social networks.

For example, a Google Docs document would become public if its link were shared in a post on X (formerly Twitter) or Threads. Once the link is published on these platforms, web crawlers can find the document more easily and, therefore, it becomes a public file.

However, the Google representative also stressed that if the document is shared through a private channel, such as through a link sent by email, the file remains restricted exclusively to those users who have the link.

By Editor
