GitHub buries 21 TB of open source data in an Arctic archive
While it might seem like the internet is leaving a detailed record of history, much of the world’s knowledge is surprisingly vulnerable to being lost in a disaster. To help keep a backup, GitHub has now archived 21 TB of public open source data and buried it in an Arctic vault designed to preserve it for a thousand years.
It’s been said that future civilizations may know more about the ancient Egyptians than they do about our modern culture. That’s because stone carvings are naturally long-lasting, and (language barriers aside) they don’t require any special technology to read. Meanwhile, those same future historians might struggle to glean any useful info out of a long-broken computer. And it’s already happening, as anybody who’s tried to get data off an old floppy disk can attest.
So companies and organizations are working to preserve the world’s information for future generations, protecting it against disaster, or simply against the march of technology. One of the largest of these efforts is the Arctic World Archive, which holds data from the National Archives of Mexico and Brazil, the Vatican Library, the European Space Agency, and other museums and corporations.
The Arctic World Archive is located in a decommissioned coal mine on an island in the Norwegian archipelago of Svalbard. Thanks to its cool, dry conditions, the area is proving popular with archivists. After all, it’s just down the road from another famous storage facility, the Svalbard Global Seed Vault, which safeguards seed samples of many important crops in case of disaster.
On July 8, 2020, GitHub deposited 21 TB of data into the Archive, beneath 250 m (820 ft) of permafrost. This data drop consisted of a snapshot of all active public repositories on GitHub as of February 2, 2020, encoded in the form of tiny QR codes imprinted on 186 archival film reels.
These specially designed film reels are made by a company called Piql. They consist of silver halides on polyester and, according to simulated aging tests conducted by Piql, this material can last for up to 1,000 years.
GitHub says that the next phase of the project is to develop what it calls the Tech Tree. This guide will also be printed on film, but it will be readable by the naked eye, to help people recover the data in the future. The company is currently seeking help from its community to create this document, which will be added to the vault at a later date.