ByteDance has a ‘bot’ that extracts data from the Internet and does it 25 times faster than OpenAI

By Editor

Oct 8, 2024

ByteDance has been extracting data from the Internet for months with a ‘bot’ called Bytespider, and it does so faster than the ‘bots’ of other leading companies in the large language model (LLM) market.

Large language models need enormous amounts of data for their training, and such data can only be found on the Internet, where several ‘bots’ already operate to ‘scrape’, or extract, information from websites.

Firms such as Google, Meta, Amazon, OpenAI and Anthropic use their own ‘bots’, but they are not the only ones. ByteDance also has its own, called Bytespider, which appeared sometime in April, as Kasada and Dark Visitors, firms that specialize in this type of automation, confirmed to Fortune.

Bytespider stands out because it has quickly become very aggressive in its data collection, as Kasada’s reports show. According to the firm’s CEO, Sam Crowther, it extracts data 25 times faster than GPTbot (OpenAI) and 300 times faster than ClaudeBot (Anthropic).

The ByteDance ‘bot’, moreover, does not respect the robots.txt instructions that media publishers can place on their websites to tell bots not to extract data. GPTbot and ClaudeBot do not respect them either.
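To illustrate how robots.txt works, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The policy text, the `is_allowed` helper and the example.com URL are assumptions made up for this illustration, not taken from any real publisher. The sketch only shows what a compliant crawler would conclude; nothing in it can force a bot to obey.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt policy: block Bytespider, allow everyone else.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/news/article"  # placeholder URL
    for bot in ("Bytespider", "GPTBot", "ClaudeBot"):
        print(bot, is_allowed(bot, page))
```

A well-behaved crawler runs a check like this before each request and skips disallowed pages. A crawler that ignores the file simply never performs the check.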

Behind this massive data extraction appears to be ByteDance’s development of a new LLM, a source familiar with the matter told Fortune. According to another source, the model would be used for TikTok’s search function.
