How can we make sense of genomic data?

Modern technology has made sequencing an individual’s DNA a reality. Sequencing of certain sections of the genome is now routine in forensics, and a few hundred genes that help determine disease risk, or the likelihood of responding to a certain medication, are increasingly used in clinical medicine. But the human genome is enormous, and whole-genome sequencing yields an ocean of data with no “map” for interpreting it. How can we make sense of all this information? How can we use it to better define health and wellness for an individual? How can we use it to advance the goals of Precision Medicine?

What we need is a way to identify patterns in DNA, whether they are known genes that have already been studied or not, and compare them to patterns in clinical findings. This phenotype-to-genotype mapping is something that we are on the threshold of unlocking.

Such a mapping of patterns – both in DNA as well as in clinical data – requires deep-learning algorithms which can act on very large data sets in order to identify meaningful insights. It requires the application of Artificial Intelligence (AI) to healthcare.

AI in medicine has become a frequent topic of discussion. Consumer and commodity implementations have led the way, with breakthroughs in speech recognition, natural-language text interpretation, and image recognition, and we can see how this powerful technology can be applied in medicine as well. But for deep-learning algorithms to “learn” what the data shows, they need large data stores to work on. That has been the limitation to date. Health data has historically been fragmented into institution-centered silos, and aggregating it across those silos has been problematic.

Access to VA data: a huge step forward

The recent announcement of Flow Health’s cooperative research agreement with the Department of Veterans Affairs (VA) marks a significant milestone. The scope of the VA data is staggering: it contains the records of 22 million veterans, spanning 20 years of history. In addition to structured data, there are 4 billion chart notes that can be combed using Natural Language Processing and 4.5 billion medical images that can power deep learning on images. It also includes DNA data that is part of the Million Veteran Program. In aggregate, about 30 petabytes of data are becoming part of the Flow Health Medical Knowledge Graph.

With this work, we now have a vast store of data which can “teach” deep learning algorithms about patterns in clinical data, patterns in DNA data, and correlations between them.

The result of this work is the organization of medical information into a truly useful Medical Knowledge Graph. This can be thought of as a collection of context-specific findings and insights about a particular query. It is similar to Google’s Knowledge Graph, which Google introduced into its search capabilities in 2012. With the Knowledge Graph, semantic-search information gathered from a wide variety of sources is presented as structured, detailed information about the topic, so that the user can see everything relevant in a single view without having to navigate to other sites and assemble the information by hand.

The Flow Health Medical Knowledge Graph is like that, but in medicine. The context of the search is the individual patient’s situation – the diagnoses, medications, family history, lab findings, clinical course over time, imaging, and genomic data where available. A search can address a question like “for this diabetic patient, should I prescribe a statin to prevent a possible cardiac event? Will it make a difference in this specific case?”

Putting DNA data to use at the point of care

With a map between clinical phenotype patterns and DNA, many more individualized recommendations can be made in response to clinical questions. Sometimes the clinical data may point to the need to obtain DNA data, if it is not already available, in order to best answer a question – “will this patient respond to this treatment and avoid progression to kidney failure?” or “will this chemotherapy be effective in treating this patient’s cancer?” The tools can help guide intelligent diagnosis and, once the needed data points are fleshed out, intelligent treatment.

Just as Google’s Knowledge Graph is a background service that serves up information viewable in a variety of browsers, apps, and devices, the Flow Health Medical Knowledge Graph is a background tool that can display its results within the systems clinicians and patients already use. It’s akin to the “Intel inside” slogan used by hardware manufacturers, but in the software world. It does not replace the Electronic Health Records systems used by clinicians, but furnishes them with powerful guidelines drawn from a vast store of data. Clinicians remain the executive decision-makers, but they are powered by tools that put the entirety of medical knowledge in their hands.

We are at the threshold of building that future now.
