The download of the Spotify catalog opened a new debate over who controls the world’s music

An activist group known as Anna’s Archive claimed this week that it had extracted content from Spotify en masse and posted a torrent for downloading 86 million files. Along with the songs, the group claims to have 256 million metadata records, including artist, album and song names, amounting to 300 terabytes of information.

Anna’s Archive is what is known as a shadow library and defines itself as an “archivist” collective. It is the best-known book piracy site in the world, and this time it carried out the download using what is known as scraping: an automated extraction of information, a practice that is also used systematically by generative artificial intelligence systems such as ChatGPT to feed their chatbots. In this case, the procedure was applied to songs.

The conflict over copyright and digital piracy has been going on for decades. Since the late 90s, with the emergence of file-sharing platforms such as Napster, eMule, Kazaa or Ares, the cultural industry has faced recurring disputes over the unauthorized circulation of content. One of the most emblematic milestones of that period was the lawsuit that the band Metallica filed against Napster in 2000, which became a paradigmatic case of the legal battles between artists, record labels and digital distribution services.

Today, however, the problem is different: it is no longer just about users who want to avoid paying, but about megacorporations that use the information to train their models. It happened with Meta, Mark Zuckerberg’s multimillion-dollar company, which used pirated books without paying royalties to power Llama, its artificial intelligence model.

In the world of music, Suno is a platform that generates complete songs, from the lyrics and melody to the harmony, based on a user’s “prompt”. Making this amount of music available was celebrated by many users who do not want to pay for Spotify, but the case also opened a debate about who benefits most from this download: the artificial intelligence companies that feed their models with large volumes of information.

Who benefits most from this download? How do companies like Meta, OpenAI, Google and Amazon actually extract all the information with which they build their models?

“Scraping”, download and how a musical AI is trained

Data mining, a fundamental practice of AI companies. Photo: Shutterstock

Scraping (or “data scraping”) is a technique that consists of automatically extracting large volumes of information from a digital platform, generally without the express authorization of the affected service. It is done with programs that simulate a user’s behavior and systematically go through websites or databases to copy content, metadata or complete records. In an emblematic case of this era, the American newspaper The New York Times is in court with OpenAI because Sam Altman’s company uses journalistic articles to train ChatGPT.

In the case of Spotify, this type of procedure can be used to collect not only songs, but also associated information such as titles, artists, playlists, release dates and other data that is part of its digital infrastructure.
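As a rough illustration of what “automatically extracting” metadata means, the sketch below parses a small page and turns it into structured records. It is only a minimal example under stated assumptions: the markup, the `track` class and the `data-artist` and `data-year` attributes are invented for this illustration and do not describe Spotify or the method Anna’s Archive used. It works on an inline string and makes no network requests.

```python
from html.parser import HTMLParser

# Hypothetical page markup, invented for illustration only.
SAMPLE_PAGE = """
<ul>
  <li class="track" data-artist="Artist A" data-year="2001">Song One</li>
  <li class="track" data-artist="Artist B" data-year="2015">Song Two</li>
</ul>
"""


class TrackParser(HTMLParser):
    """Collects title, artist and year from every <li class="track"> element."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None  # record being filled while inside a track element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "track":
            self._current = {
                "artist": attrs.get("data-artist"),
                "year": int(attrs.get("data-year", 0)),
                "title": "",
            }

    def handle_data(self, data):
        # Text between the opening and closing tags is the song title.
        if self._current is not None:
            self._current["title"] += data.strip()

    def handle_endtag(self, tag):
        if tag == "li" and self._current is not None:
            self.records.append(self._current)
            self._current = None


def extract_records(html):
    """Return one metadata record (a dict) per track found in the HTML."""
    parser = TrackParser()
    parser.feed(html)
    return parser.records


records = extract_records(SAMPLE_PAGE)
for record in records:
    print(record)
```

A large-scale scraper repeats this kind of extraction over millions of pages or API responses, which is how a catalog of metadata records can be assembled.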

“First we must clarify that most of the leaked material is protected by copyright, which legally restricts its copying, reproduction and use; it is neither trivial nor legally simple to have access to such a quantity and variety of commercial music without licenses or permissions from the rights holders,” Hernán Ordiales, an engineer, teacher and specialist in audio and AI, explained to Clarín.

The particularity of this case is that “the volume of data involved is unprecedented and, from a technical point of view, it could serve as a basis for training generative music models.” These are “models of this type, like the ones used by Suno or Udio, which have already been sued by recording industry associations on suspicion of having used this type of material illegally, and which are in the midst of negotiations,” continues the specialist, a member of the Open Artificial Intelligence Laboratory (LAIA).

Suno, AI song generator. Photo: Suno

The songs serve as the basis for these models. “These types of models ‘learn’ from examples, extracting patterns of structure, rhythm, harmony and even timbre directly from large volumes of real audio. The greater the number and diversity of examples, the greater the model’s ability to capture and reproduce complex structures. This approach is called, in its most basic form, ‘Machine Learning’, and when it is supported by architectures based on neural networks trained with large-scale data, it falls specifically within Deep Learning,” Ordiales continues.

To do this, the models work with audio fragments translated into “tokens”, which are something like small basic units of sound information. “There are different architectures to develop music generation models. They all start from having representative examples of what the model is expected to generate. Transformer-based language models separate texts into ‘tokens’, which are words or parts of words. In the case of music generation, similar processes can be used, not at the level of words but at the level of audio fragments,” adds David Coronel, also from LAIA.
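The idea of turning audio fragments into tokens can be sketched in a few lines. This is a toy example, not how Suno, Udio or any production system works: real models typically rely on learned neural audio codecs, whereas here each fixed-size frame is simply bucketed by loudness. The names `sine_wave` and `frame_tokens`, the frame size and the number of levels are all assumptions made for the illustration.

```python
import math


def sine_wave(freq_hz, seconds, sample_rate=8000, amplitude=1.0):
    """Generate a pure tone as a list of samples in [-amplitude, amplitude]."""
    n = int(seconds * sample_rate)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n)
    ]


def frame_tokens(samples, frame_size=400, levels=16):
    """Map each fixed-size frame to one integer token in [0, levels - 1]."""
    tokens = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        # Root-mean-square loudness of the frame (between 0 and 1 here).
        rms = math.sqrt(sum(x * x for x in frame) / frame_size)
        # Toy quantizer: bucket the loudness into `levels` bins.
        tokens.append(min(levels - 1, int(rms * levels)))
    return tokens


# Half a second of a loud tone followed by half a second of a quiet one.
loud = sine_wave(440, 0.5, amplitude=1.0)
quiet = sine_wave(440, 0.5, amplitude=0.2)
tokens = frame_tokens(loud + quiet)
print(tokens)
```

The loud and quiet halves end up as two different runs of integers. A sequence of this kind, rather than raw samples, is what a Transformer-style model would be trained to continue.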

For this reason, in addition to the songs, Anna’s Archive uploaded what is known as “metadata”, that is, a type of labeling with information about the artist, song, album, year of publication and so on. “A fundamental step is labeling. Each audio example should be accompanied by clear descriptions that tell the model what type of music it is: the genre, the instruments, the mood. Without these quality labels, the model would not be able to relate the audio patterns to the instructions that users will give it later,” he continues.

On a technical level, Coronel explains how AI generates music from other music: “In diffusion-type models the process consists of ‘breaking’ the examples by adding noise to them; then the model tries to predict what the added noise is like. The great revelation of this type of model is that when the model is able to correctly predict that noise, it is also capable of performing the reverse process: producing clean music from random noise. In short, what the training stage does is process many examples to extract patterns that make musical sense and coherence, and then use those patterns to compose similar pieces,” he concludes.
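The noising and denoising steps Coronel describes can be sketched as follows. The blending formula is the one commonly used in DDPM-style diffusion models, which is an assumption here since the article does not name a specific formulation. The “model” is replaced by a perfect noise predictor, because training a real network is beyond a short example; `add_noise`, `remove_noise` and `alpha_bar` are illustrative names.

```python
import math
import random


def add_noise(clean, alpha_bar, rng):
    """Forward step: blend the clean signal with Gaussian noise.

    alpha_bar in (0, 1] is the share of the signal that is kept;
    1 means no noise is added.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    noisy = [
        math.sqrt(alpha_bar) * x + math.sqrt(1 - alpha_bar) * e
        for x, e in zip(clean, noise)
    ]
    return noisy, noise


def remove_noise(noisy, predicted_noise, alpha_bar):
    """Reverse step: subtract the predicted noise and rescale the signal."""
    return [
        (y - math.sqrt(1 - alpha_bar) * e) / math.sqrt(alpha_bar)
        for y, e in zip(noisy, predicted_noise)
    ]


rng = random.Random(0)
clean = [math.sin(2 * math.pi * i / 32) for i in range(64)]  # a clean "tone"
noisy, true_noise = add_noise(clean, alpha_bar=0.5, rng=rng)

# Stand-in for a trained network: assume the noise is predicted perfectly.
# In training, the model's loss is the gap between its guess and true_noise.
predicted_noise = true_noise
recovered = remove_noise(noisy, predicted_noise, alpha_bar=0.5)

max_error = max(abs(a - b) for a, b in zip(clean, recovered))
print(f"largest difference between clean and recovered signal: {max_error:.2e}")
```

With a perfect prediction the clean signal comes back almost exactly. A real model only approximates the noise, so generation repeats the reverse step many times, starting from pure random noise.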

Spotify defends itself, Anna’s Archive fights back: the debate over cultural preservation

Dispute over the Spotify catalog. Photo: Shutterstock

Spotify, which has more than 700 million users around the world, confirmed that it is investigating the incident and said that it has already taken action against the accounts involved. “We have identified and disabled malicious accounts that participated in illegal scraping activities,” the company said. In a statement, it added that the investigation detected that “a third party collected public metadata and used illicit tactics to circumvent DRM [digital rights management] and access some audio files on the platform.”

For its part, Anna’s Archive, known for offering links to books and texts protected by copyright, defended the initiative as a cultural preservation project. In a blog post, the group stated that the files represent “99.6% of all music listened to by Spotify users” and that they would be distributed through torrents.

“Of course, Spotify doesn’t have all the music in the world, but it’s a great start,” said the collective, which defines itself as dedicated to “preserving the knowledge and culture of humanity.” It added: “With your help, humanity’s musical heritage will be protected forever against natural disasters, wars, budget cuts and other catastrophes.”

Anna’s Archive, the preservationist collective that made Spotify music available for download. Photo: Anna’s Archive

On the surface, then, this dispute is a commercial fight with economic motivations. But the underlying issue runs deeper: “From the perspective of cultural preservation, the problem of closed platforms is evident. It already happened with Kindle and it happens today with streaming: works that were available disappear from one day to the next and cannot be found anywhere else. When a few private platforms become the only access route to the cultural heritage (books, music, movies), culture is subject to commercial decisions and not to a criterion of preservation or public access,” Carolina Martínez Elebi, a graduate in Communication Sciences and a professor at the UBA, told this outlet.

“In this context, many copyright laws seem to have stopped fulfilling their original objective. Instead of encouraging creation and guaranteeing the conservation of works, they function as tools of blocking and persecution that prevent this cultural heritage from circulating through other channels. The action of some activist groups can be read as a provocation: a way of putting on the agenda that, if culture is locked in walled gardens, it runs the risk of disappearing when the owner decides to close the door,” concludes the specialist, who is also the author of the DHyTecno site.

The case of Spotify and Anna’s Archive thus exposes a tension that runs through the entire contemporary digital economy: who controls access to culture, under what rules and for what purposes. In a scenario dominated by closed platforms such as Spotify, Apple Music and YouTube, and by artificial intelligence models that swallow terabytes of data, music also becomes a strategic resource.

It is no longer just about not paying for music, but about defining the future of cultural heritage in the digital age. The discussion is no longer just technical: it is cultural.


By Editor


The Observatorial