Deep learning has already had a huge impact on computer vision and speech recognition, and it's making inroads in areas as computer-unfriendly as cooking. Now a new startup led by University of Toronto professor Brendan Frey wants to cause similar reverberations in genomic medicine. Deep Genomics plans to identify gene variants and mutations never before observed or studied and find how these link to various diseases. And through this work the company believes it can help usher in a new era of personalized medicine.

Genomic research is hard. Scientists still know relatively little about our genes and how they interrelate. But Frey and others in the field now know enough that they can equip machines to do the heavy lifting. And there's an awful lot of this heavy lifting to do. "Genomics is no longer about small datasets," Frey tells Gizmag. "It's now about very, very large datasets."

For context, the first effort to sequence a full human genome took 13 years – running from 1990 to 2003. There are now many companies working to sequence many genomes at a time. The largest of these is called Illumina. "Illumina," Frey says, "expects to sequence one million genomes in the next year. Each genome contains three billion letters. That's a lot of data."
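
To put Frey's figures in perspective, here is a rough back-of-the-envelope sketch. The genome count and letters per genome come from the quote above. The two-bits-per-letter encoding is our own simplifying assumption, not something Illumina or Deep Genomics has stated, and real sequencing output (with quality scores and overlapping reads) is considerably larger.

```python
# Figures quoted by Frey in the article
GENOMES_PER_YEAR = 1_000_000
LETTERS_PER_GENOME = 3_000_000_000

# Assumption (ours): each letter (A, C, G or T) is packed into 2 bits
BITS_PER_LETTER = 2


def total_letters(genomes=GENOMES_PER_YEAR, letters=LETTERS_PER_GENOME):
    """Total number of DNA letters across all sequenced genomes."""
    return genomes * letters


def packed_size_bytes(n_letters, bits_per_letter=BITS_PER_LETTER):
    """Bytes needed to store n_letters at the assumed bit width."""
    return n_letters * bits_per_letter // 8


letters = total_letters()
size = packed_size_bytes(letters)

print(f"{letters:.1e} letters")           # 3.0e+15 letters
print(f"{size / 1e12:.0f} TB if packed")  # 750 TB if packed
```

Even under this idealized encoding, a single year of sequencing comes to hundreds of terabytes of raw letters.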

To store and make sense of all that data, Deep Genomics built Spidex. The name is a portmanteau of "splicing index": Spidex is a database of how a very large number of genetic variants affect (or are likely to affect) RNA splicing – a crucial step in gene expression in which a gene's RNA transcript is cut and rejoined in different ways, so that a single gene can produce different kinds of proteins.

If RNA splicing goes off kilter, the consequences can range from nothing in particular to serious diseases, including cancer. Spidex is meant to help separate the harmless variants from the harmful ones, and to show how they relate to other genetic processes.

Spidex currently includes predictions for around 328 million such variants and their knock-on effects on RNA splicing. That number is set to grow as the company applies its deep learning algorithms to classifying and interpreting more data.

Frey also notes that the bulk of those variants are in the "junk DNA" part of the genetic code. This is the part that scientists had previously written off as irrelevant, even though it forms the vast majority of the genome. "Most medical genetic analysis currently deals only with mutations in what are called 'protein-coding segments' in DNA, or 'exons,'" Frey explains. "This makes up only 1.5 percent of the genome."

The rest – the 26 percent made up of "introns," which are removed during the splicing process, plus other non-coding segments – is now known to be important in regulating the genome's functions. Deep Genomics taps into that 98.5 percent of the genome, which hasn't been studied closely for mutations, and looks for the disease consequences of any mutations it finds.

The idea is to lay the foundation for computers to one day predict the outcomes of lab experiments and treatments, and to guide drug development and personalized medicine – perhaps eventually reaching all the way down to your local doctor's office. This is really just the beginning, but Frey tells us that Deep Genomics technology is already finding commercial use in medical diagnostics and drug development.

A paper describing the research that led to the Spidex database was published in the journal Science in January.