MADRID, 3 Aug. –
The National Library of Spain (BNE) has saved more than 1,970,000 web domains in 25 days, amounting to almost 68 TB of information. The number of ‘.es’ domains has grown by 180,000 since 2016, and the technological infrastructure used has become more efficient, considerably reducing the time needed to download the information, from 92 days in 2016 to 25 days in 2021, the institution reported.
One of the BNE's most prominent functions is preserving documentary heritage on the internet. To that end, for the sixth consecutive year, it has carried out a mass collection of websites under the ‘.es’ domain as part of Spain's collective memory.
In total, the National Library of Spain already preserves 87 percent of ‘.es’ domains. To save the content, it works with automated collection software, NetarchiveSuite, which uses 71 spiders that crawl the web and save content by following the links they find and downloading the information.
The content is stored in a specific format known as WARC (Web ARChive), which makes it possible to consult the archived websites as one would browse the live internet. A download size limit is set for each website in a collection to avoid overloading and saturating the collection system.
The BNE sets a limit of 150 megabytes per domain: once that limit is reached, collection of the domain stops and the crawl moves on to the next one. This year, with this configuration, 87% of all domains were saved in full.
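The per-domain limit described above can be illustrated with a short simulation. This is only a sketch under stated assumptions, not the configuration NetarchiveSuite actually uses: the function names `harvest` and `share_complete` are hypothetical, the limit is taken as 150 decimal megabytes (the article does not say whether decimal or binary units apply), and the sketch assumes a download already in progress is allowed to finish before the limit is checked.

```python
from typing import Dict, Iterable, Tuple

# Assumption: 150 MB read as decimal megabytes; the article does not specify.
LIMIT_BYTES = 150 * 1000 * 1000


def harvest(
    domains: Dict[str, Iterable[int]],
    limit_bytes: int = LIMIT_BYTES,
) -> Dict[str, Tuple[int, bool]]:
    """Simulate a crawl with a per-domain byte budget (hypothetical helper).

    `domains` maps a domain name to the sizes, in bytes, of the objects a
    spider would download from it, in order. For each domain the result is
    (bytes_saved, saved_in_full).
    """
    results = {}
    for name, sizes in domains.items():
        saved = 0
        saved_in_full = True
        for size in sizes:
            # Limit reached with objects still pending: stop this domain
            # and continue with the next one.
            if saved >= limit_bytes:
                saved_in_full = False
                break
            saved += size
        results[name] = (saved, saved_in_full)
    return results


def share_complete(results: Dict[str, Tuple[int, bool]]) -> float:
    """Fraction of domains that were saved in full."""
    if not results:
        return 0.0
    return sum(1 for _, full in results.values() if full) / len(results)


if __name__ == "__main__":
    # Made-up sizes, purely for illustration.
    demo = {
        "small.es": [40_000_000, 20_000_000],
        "large.es": [90_000_000, 90_000_000, 90_000_000],
    }
    outcome = harvest(demo)
    print(outcome)
    print(share_complete(outcome))
```

In this made-up example the small site fits within its budget and is saved in full, while the large one is cut off after its second object. A figure such as the 87% quoted above would correspond to what `share_complete` measures over the real crawl.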
Since its creation in 2009, the Spanish Web Archive has complemented the mass collections with selective collections that capture, in greater depth and more frequently, more than 40,000 websites on any major domain (.com, .net, etc.), chosen for their historical, social or cultural value.
This would not be possible without the support of the Library Cooperation Council, which enables the collaboration of more than 30 web curators from different autonomous communities, who select and incorporate content into the Spanish Web Archive. The most recent to join the project are the autonomous city of Ceuta and the Balearic Islands.
Alongside the longest-running collections, such as national politics or the media, there are collections created specifically to address the most current issues, such as climate change, feminism or video games.
Along the same lines, content about the coronavirus pandemic continues to be saved, with more than 6,000 websites archived to date. Events such as the elections in Catalonia and Madrid have also been covered, as have dates of social and advocacy significance such as March 8, International Women’s Day, and LGTBI Pride.
Without this work, much of the content generated massively and continuously on the internet would be lost forever, making it impossible for citizens and researchers, current and future, to study our society.