Software breakthrough radically boosts the speed of DNA sequencers
A new piece of open source software can radically speed up the process of DNA sequencing, according to researchers at Johns Hopkins University. Using a portable sequencer, the UNCALLED software can reduce a 15-day operation down to three days, or even one.
Full genome sequencing is a torturously long process – but a lot of the time, there are only specific parts of the genome you're interested in. And that's where this new software makes such a huge difference, allowing researchers to sequence only the specific areas of interest in a particular case, throwing the rest out and saving a ton of time.
The process works like this: the DNA sample, perhaps a bit of blood or spit, gets split up from its original chromosomes into smaller and smaller pieces, until there are billions, or even trillions, of fragmented molecules floating around in the sample. This solution is fed into a device called a nanopore sequencer, a small, portable, thousand-dollar peripheral that connects to a PC or laptop.
"The way the sequencers normally work would be for each nanopore – there might be 512 or more nanopores in the device – to grab one of those molecules, read it off completely, then continue with another," Michael C. Schatz, a Bloomberg Distinguished Associate Professor of Computer Science and Biology and senior author of the paper, tells us over a video call. "But what we're trying to do is very quickly work out if a particular molecule is part of the genome we're interested in or not. If it is, we let it go through and read it off completely. If it's not, we stop and eject it.
"There's an electrical process for this, in the Oxford Nanopore sequencers we're working with. DNA is a charged molecule, it's in a charged environment. And the molecule passes through a teeny tiny hole, the nanopore. As that's happening, the specific nucleotides along it will create different electrical forces. The nanopore sequencer reads off those forces and transmits some very raw electrical data back to the computer. Normally there's a very complicated process called base calling, where the raw data is decoded into nucleotide sequences: ACGT, ACGT, ACGT.
"But UNCALLED does a form of that very quickly, and it can decide, oh, this is from a cancer-related gene. Or, oh, this is a disease gene. Or, oh, this is part of the genome I'm not interested in at all. In which case, you can tell the nanopore sequencer to just flip the voltage on a single pore and it'll spit out that molecule. That's what our software does, it lets you pick and choose which molecules get kicked out of the nanopores, so you can use the sequencer's capacity as efficiently and quickly as possible."
Thus, if a doctor or researcher is looking for specific genes associated with particular conditions – certain cancers, autism spectrum disorders, heart disease or any other known gene markers – they can effectively tell the nanopore sequencer to throw out any molecule that's not directly relevant, and get the results they need much, much quicker.
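To make the keep-or-eject decision concrete, here is a minimal Python sketch. It is a toy illustration, not UNCALLED's actual algorithm or API: it assumes the first stretch of each molecule has already been decoded into letters, whereas the real software works from the raw electrical signal. The names `build_target_index` and `decide`, the k-mer length and the hit threshold are all hypothetical choices made for this example.

```python
from typing import Iterable, Set

K = 5  # toy k-mer length (an assumption; chosen only to keep the example small)


def kmers(seq: str, k: int = K) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_target_index(targets: Iterable[str], k: int = K) -> Set[str]:
    """Collect every k-mer that occurs in any region of interest."""
    index: Set[str] = set()
    for target in targets:
        index |= kmers(target, k)
    return index


def decide(read_prefix: str, index: Set[str], min_hits: int = 3, k: int = K) -> str:
    """Decide what to do with a molecule after seeing only its first bases.

    Counts how many k-mer positions in the prefix match the target index.
    Returns "keep" (read the molecule to the end) when there are at least
    min_hits matches, otherwise "eject" (reverse the pore voltage).
    """
    hits = sum(
        1
        for i in range(len(read_prefix) - k + 1)
        if read_prefix[i:i + k] in index
    )
    return "keep" if hits >= min_hits else "eject"


if __name__ == "__main__":
    # Hypothetical stand-in for a gene of interest.
    index = build_target_index(["ACGTTGCAAGGCTTACGATC"])
    print(decide("TTGCAAGGCT", index))    # prefix taken from the target
    print(decide("GGGGGGGGGGGG", index))  # prefix unrelated to the target
```

In this toy version the first prefix is kept and the second is ejected. A real system has to make the same call within the first fraction of a second of a read, for hundreds of pores at once, which is why the speed of the decision matters so much.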
In an initial test, the team did a single typical three-day run on a sample looking for 148 specific cancer genes, and brought back as much information in that run as they'd normally get from five consecutive runs. "Often you get enough information even sooner than that," says Schatz. "You don't really need to run it for three full days. We didn't really break down the analysis at this level yet. We may have had enough information at one day or less. I think this will forever change how DNA sequencing is done."
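The arithmetic behind the "15 days down to three, or even one" figure is simple enough to check. The short Python sketch below uses only the numbers quoted in this article; the one-day case reflects Schatz's estimate rather than a measured result, and the function name `speedup` is just an illustrative choice.

```python
def speedup(baseline_days: float, targeted_days: float) -> float:
    """How many times faster the targeted run is than the baseline."""
    return baseline_days / targeted_days


RUN_DAYS = 3          # length of one typical run, per the article
EQUIVALENT_RUNS = 5   # untargeted runs needed for the same information

baseline = RUN_DAYS * EQUIVALENT_RUNS  # 15 days of conventional sequencing

print(speedup(baseline, 3))  # 5.0: the measured three-day run
print(speedup(baseline, 1))  # 15.0: if one day turns out to be enough (unmeasured)
```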
The research was published in Nature Biotechnology.
Source: Johns Hopkins