Unlocking the Power of Health Data

In this day and age, with fewer and fewer exceptions, each of us has a digital footprint as a patient. Part of that footprint is intentional, created directly by us and by the trusted others with whom we interact in the real world and online: electronic health records, personal health records, personal tracker data, blogs, tweets, and so on.

Each of us also leaves a trail of additional information as a by-product of our online existence: our digital exhaust, which is out there to be mined for data. All of these data sources, individually or, more likely, when aggregated with those of others, may turn out to be usable as information, or perhaps even as knowledge.

That is the promise of EHRs, and one key policy argument behind the federal government incentive program promoting their adoption: that health data writ large — big data — when properly analyzed will yield medical insights not otherwise accessible to us; that evidence-based medicine will be advanced immeasurably and that the dissemination of best practices will be tremendously accelerated.

One key bottleneck on the information highway to the future is created by the layers of privacy law that regulate the sharing of personal health data: protected health information, or PHI, in HIPAA-speak.

In a TED talk, Google co-founder Larry Page promoted the notion that health data should be shared for the common good. “Wouldn’t it be amazing to have anonymous medical records available to all research doctors?” Page asked. He added that such sharing of health data would save hundreds of thousands of lives.

Yes, Larry, it would be amazing, but many people worry that even de-identified (anonymized) data can be re-identified, and there are numerous examples of exactly that. On the one hand, the number of examples is finite, which may suggest that the re-identification problem can be fixed if we just put our heads together. On the other hand, the amount of information publicly available online keeps growing rapidly. So even if we solved the problem today, information that is de-identified under the HIPAA safe harbor rule today could well be re-identified tomorrow.

The safe harbor rule requires that 18 categories of identifiers be stripped from a record before it can be considered de-identified. Number 18 is a catch-all: any other unique identifying number, characteristic, or code. In the world of big data, where almost any combination of data points can become identifying, that is not a very useful safe harbor.
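To make the mechanics concrete, here is a minimal Python sketch of safe-harbor-style stripping. It is illustrative only and makes several assumptions: the field names are hypothetical, the list covers only some of the 18 categories, sub-state geographic fields are simply dropped rather than applying the rule's three-digit ZIP code exception, and the special handling of ages over 89 is ignored.

```python
from datetime import date

# Hypothetical field names covering only SOME of the 18 safe harbor
# categories; a real de-identification pipeline must address all of them.
DIRECT_IDENTIFIERS = {
    "name", "street_address", "city", "zip_code",  # sub-state geography, dropped outright
    "phone", "fax", "email", "ssn", "medical_record_number",
    "health_plan_id", "account_number", "license_number",
    "vehicle_id", "device_id", "url", "ip_address",
}
# Dates directly related to the patient; safe harbor permits keeping the year.
DATE_FIELDS = {"birth_date", "admission_date", "discharge_date"}


def strip_identifiers(record: dict) -> dict:
    """Return a copy of `record` with listed identifiers removed and
    dates reduced to their year (ages over 89 are NOT handled here)."""
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # drop the identifier entirely
        if field in DATE_FIELDS and isinstance(value, date):
            cleaned[field] = value.year  # keep only the year
        else:
            cleaned[field] = value  # clinical data passes through unchanged
    return cleaned


record = {
    "name": "Jane Doe",
    "birth_date": date(1980, 5, 17),
    "zip_code": "02139",
    "diagnosis": "E11.9",
}
print(strip_identifiers(record))  # {'birth_date': 1980, 'diagnosis': 'E11.9'}
```

No fixed list like `DIRECT_IDENTIFIERS` can capture the catch-all category, which is exactly the weakness of the safe harbor; and every field dropped is a data point researchers lose.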

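The other approach HIPAA allows is statistical de-identification (the “expert determination” method), in which a qualified expert attests, using a documented methodology, that the risk of re-identification is very small. It sounds very scientific, but these methodologies may also be “cracked” over time.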

More significantly, records de-identified using either method become less useful to researchers. The fewer the data points in an individual patient record, the less it can tell us — and the less knowledge we are likely to gain about disease, injury, and their prevention and treatment.

Let me suggest a third path: patient donation of information, de-identified only as much as each individual patient desires, and delivered through a data layer that merges clinical data with patient-generated health data and with the other parts of our digital exhaust that add color to each of our data doppelgangers.

Under HIPAA, each patient has the absolute right to have his or her complete electronic health record sent to the patient or to any third party at the patient’s direction. Third-party repositories may be architected to permit views of records by researchers, other patients, clinicians, or anyone else, and patients may instruct such repositories to share only as much as each individual patient desires. The restrictions may be on the populations of readers (researchers, other patients, etc.) or on the identifiers shared with the clinical data (name, age, gender, address, etc.). Data collected and cleaned in this fashion are far richer than a strictly de-identified data set.
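Such a repository's access rule can be sketched in a few lines of Python. This is a hypothetical design, not an existing product or standard: each patient's policy maps the reader populations the patient has opened the record to (the role names are made up) onto the identifiers each population may see, while the clinical data travel with whatever view is granted.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Identifier fields a patient may choose to withhold (hypothetical names,
# taken from the examples in the text: name, age, gender, address).
IDENTIFIER_FIELDS = {"name", "age", "gender", "address"}


@dataclass
class SharingPolicy:
    """One patient's instructions to the repository. Each reader role the
    patient has opened the record to maps to the identifiers that role may
    see; roles that are absent get no access at all."""
    identifiers_by_role: dict[str, set[str]] = field(default_factory=dict)


def view_for(record: dict, policy: SharingPolicy, reader_role: str) -> Optional[dict]:
    """Return the slice of `record` this reader may see, or None."""
    if reader_role not in policy.identifiers_by_role:
        return None  # patient has not shared with this population
    allowed = policy.identifiers_by_role[reader_role]
    return {
        key: value
        for key, value in record.items()
        # clinical fields always pass; identifiers only if the patient allows
        if key not in IDENTIFIER_FIELDS or key in allowed
    }


record = {"name": "Jane Doe", "age": 44, "gender": "F",
          "address": "1 Main St", "diagnosis": "E11.9", "hba1c": 7.2}
policy = SharingPolicy({
    "clinician": {"name", "age", "gender", "address"},  # fully identified
    "researcher": {"age", "gender"},                    # partially identified
})
print(view_for(record, policy, "researcher"))
# {'age': 44, 'gender': 'F', 'diagnosis': 'E11.9', 'hba1c': 7.2}
print(view_for(record, policy, "patient"))  # None
```

Keeping the two restrictions separate (who may read, and which identifiers travel with the data) mirrors the two kinds of limits a patient can set.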

At present, in some parts of the country, copies of patient data from multiple sources can be queried and retrieved through connectivity hubs: health information exchanges (HIEs). For the most part, HIEs are used to access information reactively (e.g., after a patient arrives at emergency room B when her records are all at hospital A). Furthermore, most HIEs are underfunded (as were their predecessors, regional health information organizations, or RHIOs) and tend to pull data together as the equivalent of a collection of side-by-side documents rather than as an integrated record.

A universal patient data layer pulls together health records from multiple silos, integrates and normalizes data in real time, enables read/write communications with the source records, and empowers patients to make decisions about sharing identifiable (or de-identified) data for research purposes, as well as fully identified data for treatment and other HIPAA-approved purposes.
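The integration step is what separates such a layer from a stack of side-by-side documents. The Python sketch below covers only that step, under loud assumptions: the `Observation` type, the source labels, the exact-match de-duplication, and the single glucose unit conversion (1 mmol/L is roughly 18 mg/dL) are all illustrative, and real normalization would rely on standard terminologies and a full unit system rather than one hard-coded rule. Read/write links back to the source records are out of scope.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

# Illustrative only: for glucose, 1 mmol/L is approximately 18 mg/dL.
MGDL_PER_MMOLL_GLUCOSE = 18.0


@dataclass(frozen=True)
class Observation:
    source: str   # which silo reported it (hypothetical labels)
    code: str     # what was measured (a plain string, not a real code system)
    when: date
    value: float
    unit: str


def normalize(obs: Observation) -> Observation:
    """Express glucose in mg/dL so readings from different silos compare."""
    if obs.code == "glucose" and obs.unit == "mmol/L":
        mgdl = round(obs.value * MGDL_PER_MMOLL_GLUCOSE, 1)
        return Observation(obs.source, obs.code, obs.when, mgdl, "mg/dL")
    return obs


def integrate(*silos: list[Observation]) -> list[Observation]:
    """Merge per-silo lists into one chronological record, keeping the first
    reading of any (code, date, normalized value) that several silos share."""
    seen: set[tuple] = set()
    merged: list[Observation] = []
    normalized = (normalize(o) for silo in silos for o in silo)
    for obs in sorted(normalized, key=lambda o: (o.when, o.code)):
        key = (obs.code, obs.when, obs.value, obs.unit)
        if key not in seen:
            seen.add(key)
            merged.append(obs)
    return merged


hospital_a = [Observation("hospital A", "glucose", date(2015, 3, 1), 5.5, "mmol/L")]
er_b = [
    Observation("ER B", "glucose", date(2015, 3, 1), 99.0, "mg/dL"),  # same reading, reported again
    Observation("ER B", "glucose", date(2015, 3, 2), 110.0, "mg/dL"),
]
for obs in integrate(hospital_a, er_b):
    print(obs.when, obs.value, obs.unit, obs.source)
# 2015-03-01 99.0 mg/dL hospital A
# 2015-03-02 110.0 mg/dL ER B
```

Because values are normalized before they are compared, the copy held by ER B is recognized as the same reading hospital A reported, and the patient ends up with one timeline instead of two overlapping documents.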

Such a data layer may also link up the slightly de-identified — or not-at-all de-identified — patient data with that patient’s data exhaust, in order to present an even richer data set; health records cross-referenced with eating, travel, leisure and other habits could yield greater insights than the health data alone.

I have discussed patient donation of data before, and the first objection I heard came from a data scientist who worried that the volume of patient records collected in this manner would be too small to yield any meaningful insights. While this may be true at first, I believe that over time patients will come to prefer setting their own limits on data sharing to being stuck with the one-size-fits-none approach available under HIPAA. In addition, data made available in this fashion will be more valuable for research than de-identified data precisely because more identifiers remain attached.

In a perfect world, freely sharing personally identifiable health information would not be problematic. In the real world, of course, it is. Revealing information is revealing vulnerability, and vulnerable populations experience discrimination (in health care, in employment, in housing and in other accommodations) despite the laws prohibiting it. That is why we tend to favor privacy regulation as the counterbalance to more mobile data. These are, in fact, the two faces of the HITECH Act: promoting the proliferation of EHRs while at the same time promulgating protections and stricter controls on the sharing of the information in those EHRs. This is also why the White House’s Precision Medicine Initiative was highlighted in tandem with a renewed emphasis on enforcement of patient data privacy and security regulations.

Ultimately, however, the protections prove to be inadequate (witness the innumerable data breaches experienced in the health sector), and the secondary use of the health data in these records is impeded by the privacy rules.

Published surveys on attitudes regarding privacy show that more than 95% of patients who are active social media users would agree to share their records without having them de-identified to HIPAA standards in order to help other patients — even though more than two-thirds of those same patients anticipate that they may suffer negative consequences as a result of the more open sharing of this clinical information. This represents a paradigm shift that is a product of our connected age, and it is a mindset that we should recognize — and use — for the greater good.

Information silos have been blamed for preventable harm in the past. It is clear that silos are still causing harm, and it is equally clear that we have tools available to us that will improve health at both the individual and population levels. Let’s use them.

A version of this post first appeared on the late, great iHealthBeat.
