
GitHub buries 21 TB of open source data in an Arctic archive

The Arctic World Archive vault contains backups of information from all over the world, in case of disaster
The GitHub data was stored on an archival film reel made of silver halide on polyester and designed by Piql to last for 1,000 years

While it might seem like the internet is leaving a detailed record of history, much of the world’s knowledge is surprisingly vulnerable to being lost in a disaster. To help keep a backup, GitHub has now archived 21 TB of public open source data and buried it in an Arctic vault designed to preserve it for a thousand years.

It’s been said that future civilizations may know more about the ancient Egyptians than they do about our modern culture. That’s because stone carvings are naturally long-lasting, and (language barriers aside) they don’t require any special technology to read. Meanwhile, those same future historians might struggle to glean any useful info out of a long-broken computer. And it’s already happening, as anybody who’s tried to get data off an old floppy disk can attest.

So, companies and organizations are making efforts to preserve the world’s information for future generations, protecting it against disaster – or just the march of technology. One of the largest is known as the Arctic World Archive, containing data from the National Archives of Mexico and Brazil, the Vatican Library, the European Space Agency, and other museums and corporations.

The Arctic World Archive is located in a decommissioned coal mine on an island in the Norwegian archipelago of Svalbard. Thanks to its cool, dry conditions, the area is proving popular with archivists. After all, it’s just down the road from another famous storage facility, the Svalbard Global Seed Vault, which protects seed samples from many important crops in case of disaster.

On July 8, 2020, GitHub deposited 21 TB of data into the Archive, beneath 250 m (820 ft) of permafrost. This data drop consisted of a snapshot of all active public repositories on GitHub as of February 2, 2020, encoded in the form of tiny QR codes imprinted on 186 archival film reels.
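For a sense of scale, the reported figures work out to roughly 113 GB per reel on average. The short Python sketch below does that arithmetic. It assumes decimal terabytes and an even split of data across the 186 reels. Neither assumption comes from the sources, so treat the result as a back-of-envelope estimate, not as Piql’s actual per-reel capacity.

```python
# Back-of-envelope arithmetic: average data per film reel in GitHub's
# Arctic deposit, using the figures reported in the article.
# Assumption (not from the sources): "TB" means decimal terabytes
# (10**12 bytes) and the data is spread evenly across all reels.

TOTAL_BYTES = 21 * 10**12  # 21 TB deposited
REELS = 186                # number of archival film reels


def average_gb_per_reel(total_bytes: int, reels: int) -> float:
    """Return the mean payload per reel in decimal gigabytes."""
    return total_bytes / reels / 10**9


if __name__ == "__main__":
    per_reel = average_gb_per_reel(TOTAL_BYTES, REELS)
    print(f"~{per_reel:.1f} GB per reel on average")  # ~112.9 GB
```

Real reels may carry uneven loads, plus overhead for error correction and metadata, so the true payload on any single reel could differ from this average.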

These specially designed film reels are produced by a company called Piql. They consist of silver halides on polyester, and according to simulated aging tests conducted by Piql, the material can last for up to 1,000 years.

GitHub says the next phase of the project is to develop what it calls the Tech Tree. This guide will also be printed on film, but in a form readable by the naked eye, to help people recover the data in the future. The company is currently seeking help from its own community to create the document, which will be added to the vault at a later date.

The details of the project can be seen below in a video outlining a previous data drop.

GitHub Arctic Code Vault

Sources: GitHub, Piql, Arctic World Archive

6 comments
MeToo
They need to be sure to include a reader for the data because technology will have moved on. (see cassette tape)
Signguy
Who decides what to save, and is it political or....
paul314
This should only be release code, or someone/something in the future is going to go down a lot of dead ends. (I wonder if my code is in that dump -- it's really lousy.)
ljaques
Let's see, 21TB is roughly one nanosecond of world Internet, right? Or all of Woke Hollywood's stuff? Hmm.
ljaques
OK, saving Open Source code may be a more worthy goal.
Brian M
Depends what they mean by open source data. If it really is just technical source code, then it may be of interest to future historians as a curiosity but not of any technical use, just as the code used for the Apollo missions is interesting but of little technical use today.

But if it’s an archive of science and technical knowledge, then great. If storage space is limited, then what you would learn in the last year or two of school is a great starting point if a reboot of civilisation is required.

For a real survivable archive, perhaps we should be looking off planet!