The DNA in every cell of your body houses an unfathomable amount of information. Harnessing such storage capabilities for the next generation of digital data storage has been the subject of studies for years, and now a team made up of researchers from Microsoft and the University of Washington has broken a new record, managing to store and retrieve 200 MB of data on strands of DNA.
We're getting better at shrinking the physical size of data storage devices while simultaneously increasing the storage capacity, with hundreds of gigabytes of data squeezing onto devices that fit in the palm of a hand. But far more data is produced each year than our current technology will be able to keep up with as the world's total data heads towards an estimated 44 trillion GB by 2020.
Unfortunately, even the best of our current range of devices are only relatively short-term solutions to the problem. Hard drives, and optical storage such as DVDs and Blu-Ray discs, are vulnerable to damage and degradation, with a life expectancy of a few decades at best.
Scientists are increasingly looking to nature's hard drive, DNA, as a potential solution to both the capacity and longevity problems. As our own bodies demonstrate, DNA is an incredibly dense storage medium, potentially squeezing in a mind-boggling 5.5 petabits (about 687,500 GB) of information per cubic millimeter. By that measure, according to University of Washington professor Luis Ceze, all 700 exabytes of today's accessible internet would fit into a space the size of a shoebox.
You could then tuck that shoebox away in a vault for thousands of years, and the DNA-stored data would remain intact. As evidenced by fossilized remains of woolly mammoths, which have been found to still contain traces of the animals' genetic code thousands of years after they died out, DNA is incredibly hardy and capable of storing information for millennia under the right conditions.
While we won't be using DNA-based hard drives to store vacation snaps in the near future, this latest project is a leap towards more efficient archival technologies for organizations that deal with huge amounts of data. The Microsoft/University of Washington team were able to store, among other things, the Universal Declaration of Human Rights in over 100 languages, the top 100 books of Project Gutenberg, the Crop Trust's seed database, and an HD music video (OK Go's This Too Shall Pass). The data, 200 MB in total, took up less physical space than the tip of a pencil.
In a world where digital data is commonly measured in gigabytes and terabytes, 200 MB doesn't sound like a whole lot, but previous research has only managed DNA data storage on the scale of kilobytes. In 2012, for instance, Harvard geneticist George Church managed to encode his e-book onto DNA, preserving 700 kB of HTML text, images and formatting instructions – before making 70 billion copies of it.
The UW and Microsoft team, collaborating with Twist Bioscience, were able to encode the data onto the DNA strands by taking advantage of the similarities between DNA's natural code and the binary language of computer code.
"Interestingly, DNA already has a digital 'flavor,' as it has four bases and molecules that 'stick' to each other in a very programmable way," says Ceze. "So the first step in storing digital data into DNA is to map strings of 1s and 0s into strings of As, Cs, Gs and Ts."
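To make that first step concrete, here is a minimal sketch in Python. It assumes the simplest possible scheme, two bits per base (00 to A, 01 to C, 10 to G, 11 to T). The article does not say which mapping the researchers actually used, so the table and the function names `encode` and `decode` are purely illustrative.

```python
# Hypothetical two-bits-per-base mapping, chosen only for illustration;
# the article does not describe the team's real encoding.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of As, Cs, Gs and Ts (four bases per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Reverse the mapping: bases back to bits, bits back to bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

With four possible bases, each position in a strand can carry two bits, which is why a single byte fits into just four bases in this sketch.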
Using Polymerase Chain Reaction techniques, the team assigns "addresses" to the sequences to help them find the desired data later. From there, DNA sequences are chemically manufactured, using a silicon-based DNA synthesis substrate that is able to make several sequences simultaneously. Once complete, the DNA is put in a test tube and dehydrated. If kept away from light and heat, it can potentially last there for thousands of years.
Reading the data requires a DNA sequencer, which reads the sequence of As, Cs, Gs and Ts, and algorithms which translate that back into the original digital data. Some of that data can be lost in translation, though, and the researchers applied error correction schemes used in computer memory to overcome that hurdle.
"Despite being reliable, DNA writing and reading have errors, just like hard drives and electronic memories have errors, so we needed to develop error-correcting codes to reliably retrieve data," says Ceze. In doing so, not a single byte of information was lost.
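The article does not detail the team's error-correcting codes, but the general idea can be shown with a much simpler stand-in: a repetition code, in which every base is written three times and the reader takes a majority vote. The repeat count and the names `protect` and `recover` below are assumptions made for this sketch, not part of the researchers' system.

```python
from collections import Counter

REPEAT = 3  # assumed redundancy: each base is written three times


def protect(strand: str) -> str:
    """Add redundancy by repeating every base REPEAT times."""
    return "".join(base * REPEAT for base in strand)


def recover(noisy: str) -> str:
    """Split into groups of REPEAT bases and keep the most common base in each."""
    groups = (noisy[i:i + REPEAT] for i in range(0, len(noisy), REPEAT))
    return "".join(Counter(group).most_common(1)[0][0] for group in groups)


protected = protect("CAGA")  # CCCAAAGGGAAA
damaged = "CCCATAGGGAAA"     # the fifth base was misread as T
print(recover(damaged))      # CAGA
```

A scheme like this survives one misread base per group at the cost of tripling the amount of DNA. Practical codes get similar protection with far less overhead.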
The team is also one of only two in the US that are able to perform "random access" on the data, a process which allows them to identify and retrieve the desired sequences from a large pool of random DNA molecules.
As it stands, writing data to DNA strands and reading it back is still a long way from being put to good use storing family snapshots and cat videos, thanks to the equipment required and the associated cost, but research is ongoing.
"There are still many challenges in making DNA storage mainstream," says Ceze. "We will continue to focus on developing an end-to-end system and work with our Microsoft and Twist Bioscience collaborators to reduce the cost and increase the speed of writing and reading DNA."
The team discusses the project in the video below.