Computers

AI system focused on finding overlooked links in millions of scientific studies

A new AI system is designed to help scientists find previously undiscovered connections in the mass of already published research

In the age of big data we often seem to be drowning in a constant torrent of research and information. The massive challenge we now face is how to sort through all the work that has been produced. In an exciting collaboration between computer scientists and cancer researchers at the University of Cambridge, a novel AI system has been developed to help sort through millions of scientific studies and help researchers uncover previously missed connections.

Science, by its very nature, is a piecemeal process. Each tiny new discovery or development adds to our greater body of knowledge, but we are now reaching a point where there is such a giant volume of data available on every research topic, no single human mind can reasonably wade through it.

"As a cancer researcher, even if you knew what you were looking for, there are literally thousands of papers appearing every day," says Anna Korhonen, one of the developers of the new AI system.

Called LION LBD, the system is initially focusing on cancer research due to the broad volume of research on the topic spanning a number of different scientific fields. The system incorporates machine learning, natural language processing (NLP) and text mining methods modeled on a technique called literature-based discovery (LBD).

Originally developed in the 1980s by information scientist Don Swanson, the LBD technique was designed to try to help researchers home in on data in studies that could be useful but otherwise remained buried as secondary to the study's overall hypothesis. Swanson developed the technique after noticing how broad and fragmented scientific research had become.

"The fragmentation of science into specialities makes it likely that there exist innumerable pairs of logically related, mutually isolated literatures," Swanson wrote in a study demonstrating the potential of LBD back in 1988.

LBD originally arose as a painstaking manual process, but in recent years it has proven well suited to automation, with 21st century technology allowing machines to help find connections or patterns across different studies that humans would never have been able to detect.

"For example, you may know that a cancer drug affects the behaviour of a certain pathway, but with LION LBD, you may find that a drug developed for a totally different disease affects the same pathway," explains Korhonen, discussing the potential of the new AI system.

At this early stage, the LION LBD system is still relatively limited. It can only produce connections between two keywords or concepts, and has been initially programmed using just publicly available PubMed abstracts. However, these limitations are expected to ease swiftly as the researchers behind it are making the entire system open source and freely accessible.
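
To make the idea of linking two concepts more concrete, the snippet below is a minimal sketch of the general literature-based discovery pattern, applied to the drug-and-pathway example Korhonen describes. It is not LION LBD's actual code or method: the toy "abstracts", the concept names and the function names are all hypothetical, and a real system would first extract concepts from text with NLP rather than start from ready-made sets.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each "abstract" is reduced to the concepts it mentions.
# A real system would derive these from PubMed text; here they are made up.
abstracts = [
    {"cancer_drug", "pathway_X"},
    {"other_drug", "pathway_X"},
    {"other_drug", "unrelated_disease"},
]

def build_cooccurrence(docs):
    """Map each concept to the set of concepts it shares an abstract with."""
    neighbours = defaultdict(set)
    for concepts in docs:
        for a, b in combinations(sorted(concepts), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours

def hidden_links(docs, start):
    """Find concepts never mentioned alongside `start` but reachable through
    one shared intermediate concept. Returns {concept: {intermediates}}."""
    neighbours = build_cooccurrence(docs)
    direct = neighbours[start]
    candidates = defaultdict(set)
    for middle in direct:
        for end in neighbours[middle]:
            # Skip the start concept itself and anything already directly linked.
            if end != start and end not in direct:
                candidates[end].add(middle)
    return dict(candidates)

links = hidden_links(abstracts, "cancer_drug")
for concept, via in links.items():
    print(f"cancer_drug may be related to {concept} via {sorted(via)}")
```

In this toy data the two drugs never appear in the same abstract, yet both are mentioned alongside the same pathway, so the second drug surfaces as a candidate connection. Plain co-occurrence is used here only because it is easy to follow; it says nothing about how strong or meaningful a link is, which is where ranking and machine learning would come in.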

The LION LBD system is currently accessible to all through a web portal, and the entire software code and API are also free to developers keen to collaborate and improve it.

The system is described in a new paper published in the journal Bioinformatics.

Source: University of Cambridge

3 comments
Colt12
Since mankind can't seem to come up with a cure for cancer, AI should be just what we have been waiting for. Compiling all of the known studies that have been documented and generating a solution. I hope it will be this easy.
FabianLamaestra
The artificial intelligence will quickly determine that human beings are trying to kill each other off in the depopulation process, since there are most likely many existing cancer cures that are simply not being allowed to be used by the general public, either because they are made too expensive or are simply hidden from view.
ljaques
What an absolutely wonderful and novel idea! I hope it does retrieve all those lost, potentially complementary pairs. It could also bring up connections others hadn't thought of and spawn new research using that newly gained perspective. This is exciting!
And I hope they also apply it to the research for global warming/climate change/tipping point. It could plug all those gaping holes in the computer models which may then be able to give rational data back to us. And it could point out the errors and such in current theories.