Big data enables algorithm that predicts risk of developing deadly diseases

August 13, 2018

The algorithm examines millions of places in a single genome to calculate a person's overall genomic risk factor for developing certain diseases

Lauren Solomon, Broad Communications


A new kind of genome analysis can reportedly combine a large number of genetic variants into a single polygenic risk score that estimates how likely a person is to develop a number of common diseases, including coronary artery disease and inflammatory bowel disease. The hope is that the tool can identify people at high risk of developing a disease, even when they show no warning signs.

It seems that not a day goes by without headlines announcing the discovery of yet another gene that is partly responsible for a certain disease. However, the hurdle in translating these discoveries into clinical applications is that the majority of diseases are polygenic in nature. This means that risk rarely comes down to a single gene significantly increasing a person's chance of developing a certain disease, and instead reflects a large number of variants that combine to signal a meaningful risk.

A large team of scientists from the Broad Institute of MIT and Harvard, Massachusetts General Hospital (MGH), and Harvard Medical School has now developed a computational algorithm that can calculate a single polygenic risk score based on a variety of small genetic variants. The score is produced by compiling a massive number of small variants across an individual's genome and reducing this mass of data to a single number that represents a person's risk of developing certain diseases.
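The article does not spell out the scoring formula, but a common way to build a polygenic score is as a weighted sum of risk-allele counts. The minimal Python sketch below assumes that form. The function name `polygenic_score`, the three toy variants, and their weights are hypothetical illustrations, not values or methods taken from the study.

```python
def polygenic_score(allele_counts, weights):
    """Collapse many variants into one number.

    allele_counts: copies of the risk allele a person carries at each
                   variant (0, 1, or 2).
    weights:       assumed per-variant effect sizes (hypothetical here).
    """
    if len(allele_counts) != len(weights):
        raise ValueError("need exactly one weight per variant")
    # Each variant contributes (copies carried) x (effect size).
    return sum(count * weight for count, weight in zip(allele_counts, weights))


# Toy example with three made-up variants; a real score would sum
# over millions of genomic locations.
weights = [0.10, -0.05, 0.20]
person = [2, 1, 0]
score = polygenic_score(person, weights)
print(f"polygenic score: {score:.2f}")
```

A raw score like this only becomes meaningful when compared with the scores of many other people, which is why a large reference dataset matters.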

"We've known for a long time that there are people out there at high risk for disease based just on their overall genetic variation," explains Sekar Kathiresan, senior author on the new research. "Now, we're able to measure that risk using genomic data in a meaningful way. From a public health perspective, we need to identify these higher-risk segments of the population so we can provide appropriate care."

The research initially focused on developing algorithms for tracking five common but serious diseases: coronary artery disease, atrial fibrillation, type 2 diabetes, inflammatory bowel disease, and breast cancer. Testing for coronary artery disease, for example, involved an algorithm examining more than 6.6 million locations in an individual genome. The dataset in the study encompassed more than 400,000 people from the UK Biobank database.

From that dataset, the algorithm identified eight percent of people who were three times more likely to develop coronary artery disease based on genetic variation alone. On closer inspection, fewer than one percent of the people with the lowest polygenic risk scores developed coronary artery disease, while 11 percent of those with the top scores developed the disease.
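As a rough illustration of how a top slice of a population might be flagged by score, the Python sketch below marks the highest-scoring eight percent of a group. The helper name `flag_high_risk` and the toy scores are hypothetical, and the study's actual thresholding procedure is not described in this article.

```python
def flag_high_risk(scores, fraction=0.08):
    """Mark the highest-scoring `fraction` of people as high risk.

    Returns one boolean per person, in the same order as `scores`.
    Tied scores at the cutoff are all flagged, so slightly more than
    `fraction` of people can be marked.
    """
    if not scores:
        return []
    # Number of people to flag, with at least one person flagged.
    n_flag = max(1, round(len(scores) * fraction))
    # The cutoff is the lowest score that still falls in the top group.
    cutoff = sorted(scores, reverse=True)[n_flag - 1]
    return [s >= cutoff for s in scores]


# Toy population of 100 people with made-up scores 0..99.
toy_scores = list(range(100))
flags = flag_high_risk(toy_scores)
print(sum(flags), "of", len(toy_scores), "people flagged")
```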

The researchers do note that the current system has limitations that will need to be resolved before any kind of broader implementation. The study used only UK Biobank data, which comes mostly from people of European ancestry, so if the algorithms are to be applied more widely they will need to be optimized for other geographical and ethnic groups using much more data.

"Ultimately, this is a new type of genetic risk factor," says Kathiresan. "We envision polygenic risk scores as a way to identify people at high or low risk for a disease, perhaps as early as birth, and then use that information to target interventions – either lifestyle modifications or treatments – to prevent disease. For heart attack, I foresee that each patient will have the opportunity to know his or her polygenic risk number in the near future, similar to the way they can know their cholesterol number right now."

While the researchers suggest this approach could eventually be implemented in clinical contexts, there are undoubtedly a huge number of legal and ethical issues that would need to be resolved before GPs could safely deliver polygenic risk scores to patients. Alongside the concern, often raised with genetic risk scores, that such results can trigger undue anxiety in patients, security is also a profoundly important factor. It isn't hard to imagine medical insurers being keen to get their hands on polygenic risk scores so that insurance premiums can be personalized based on how likely a person is to develop certain diseases.

Ultimately, this research is a compelling display of how big data can be crunched by modern computing technologies to generate results we could never have dreamed of just one or two decades ago. As these algorithms crunch more and more numbers, their accuracy should improve, meaning we can better tailor personalized medical treatments to prevent serious health issues.

The new research was published in the journal Nature Genetics.

Source: Broad Institute