Amazon Web Services investigates whether Perplexity uses ‘web scraping’ to train its AI

Amazon Web Services (AWS) has announced that it has begun an investigation into the operation of Perplexity – which uses its servers – to determine whether this company uses the ‘web scraping’ technique to train its Artificial Intelligence (AI) models.

Also known as data scraping, web scraping is a process in which ‘software’ collects content from web pages by extracting their HTML code, filtering the information and storing it. It is often compared to an automated copy and paste.
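The extract-and-filter step described above can be sketched in a few lines of Python. This is only an illustration using the standard library's html.parser module. The sample page and the names TextExtractor and extract_text are hypothetical, and nothing here reflects how Perplexity or any other company actually collects data.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content from HTML, skipping <script> and <style> blocks."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the text of an HTML document as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page used only for illustration.
SAMPLE_HTML = """
<html><head><style>p { color: red; }</style></head>
<body><h1>Headline</h1><p>First paragraph.</p><script>var x = 1;</script></body></html>
"""

print(extract_text(SAMPLE_HTML))
```

A real scraper would first download the page over HTTP and then store the extracted text. Both steps are omitted here to keep the sketch self-contained.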

Developer Robb Knight and Wired have recently discovered that AI search startup Perplexity has violated the so-called Robots Exclusion Protocol for certain websites and used this technique to train its AI models.

This protocol is a web standard that consists of placing a plain text file (robots.txt) on a domain to indicate which pages robots and automated crawlers should not access, as Wired explained.
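As a rough illustration of how a crawler can consult such a file, the sketch below uses Python's standard urllib.robotparser module. The robots.txt contents, the bot name ExampleBot, the example.com URLs and the helper is_allowed are all assumptions made for this example. Compliance with the protocol is voluntary, so a check like this only works if the crawler chooses to run it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks ExampleBot everywhere, and blocks
# every other crawler from the /private/ section only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent, url in [
    ("ExampleBot", "https://example.com/news/story.html"),
    ("OtherBot", "https://example.com/news/story.html"),
    ("OtherBot", "https://example.com/private/notes.html"),
]:
    print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

In practice a crawler would load the file from the site's /robots.txt path (RobotFileParser offers set_url and read for this) rather than from a string.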

Based on these allegations, Amazon Web Services has launched an investigation to determine whether Perplexity, which uses AWS to train its AI, is violating the rules by carrying out ‘web scraping’ on websites that tried to prevent it.

This was confirmed to Wired by an AWS spokesperson, who noted that its terms prohibit its customers from using its services for any illegal activity and that they are responsible for complying with its conditions “and all applicable laws.”

The ‘startup’ has stated that Perplexity “respects robots.txt” and that the services it controls “do not crawl in any way that violates AWS’s terms of service,” in the words of spokesperson Sara Platnick.

However, the company explained that its bot will ignore the robots.txt file when a user enters a URL in their query, a “rare” use case. “When a user enters a specific URL, it does not trigger a crawling behavior” but rather “the agent acts on behalf of the user to retrieve the URL. It works the same as if the user were to go to a page, copy the text of the article and then paste it into the system,” it said.

In this regard, Wired has stressed that the spokesperson’s description confirms the findings of its investigation: the ‘chatbot’ ignores robots.txt in certain cases to collect information in an unauthorized manner.

By Editor
