Reality Mining: Big Data and the Future of Risk Factor Surveillance

Health science writer Paul Webster looks at Big Data and how it is transforming public health / risk factor surveillance in Canada.


It’s known as “big data”. And it’s generating a lot of buzz in public health surveillance circles. 

Supercharged by the convergence of myriad data streams from mobile devices, biosensors, genomics, electronic health records and population health databases, digital risk factor surveillance data verges on becoming both all-encompassing and all-accessible. As the big data revolution sweeps through healthcare, big possibilities for epidemiology and public health surveillance are now opening up.

Alex Pentland, an MIT data theorist, discussed some of those possibilities at the World Innovation Summit for Health, in Doha, Qatar, late last year.  “By combining fine-grained, ubiquitous monitoring of human behavior with standard medical data and standard genomic data,” Pentland said in a report tabled at the Summit, “we are taking the first steps towards generating a new, holistic understanding of disease and disease processes.” 

Pentland argues that big data will hugely enhance chronic disease surveillance, as well as treatment, through a process of data sifting he calls “reality mining,” which reveals how people behave far more accurately than survey data does.

Eric Topol, Chair of Innovative Medicine at the Scripps Research Institute in La Jolla, California, agrees. “We have this phenomenal data infrastructure, with the bandwidth and the connectivity and cloud computing and mobile devices,” he notes. Put all this technology together with the many new data streams that Pentland describes, and big data opens spectacular possibilities. “Let’s say you define someone genomically along with their family history as very high risk for asthma. We know a lot of kids die of asthma. At the same time, you have sensors to pick up signs that their airways are starting to constrict long before they even have a wheeze,” Topol explains. Help, Topol hardly needs to add, could be on its way before an asthma attack starts.

Futuristic as this vision of data-driven healthcare seems, localized versions of it can already be seen forming in parts of Canada. Driven by computing advances, data specialists are dusting off reams of disparate and disconnected data, integrating it with new sources of so-called “big data” from genomics, and harnessing it to probe numerous themes – including risk factors for disease.   

In Hamilton, Ontario, for example, a system known as the Integrated Decision Support Business Intelligence Solution (IDS) now links patient information including demographics, clinical characterizations, emergency visits, hospital admissions and discharges. The system can also reach beyond healthcare databases to potentially connect with information from social services, and possibly even law enforcement agencies. 

Currently operated by the Hamilton Niagara Haldimand Brant Local Health Integration Network, this system is being adopted by three further local hospital networks serving about five million people across southern Ontario.  “The data we’re integrating is much more than just different types of hospital information,” explains Wendy Gerrie, who led the system’s development as director of Integrated Decision Support Services at Hamilton Health Sciences Centre. “We’re bringing in-house data, for example, because we’ve identified that as an issue relevant to healthcare.”  As Gerrie enthused at a recent data analytics conference in Toronto, “total patient data capture is emerging.”

Karen Tu, who assembled the Electronic Medical Record Administrative data Linked Database (EMRALD) at the Institute for Clinical Evaluative Sciences (ICES) in Toronto, is similarly enthusiastic. Health surveillance, she believes, is poised for “a great leap forward.” By linking data from 300,000 patient EMRs with data from the Canadian Institute for Health Information hospitalization database, the Ontario Drug Benefit database, the Ontario Health Insurance Plan physician billing database, and the Registry of Persons database, Tu can compile patient health histories that are more comprehensive than ever seen before in Ontario – revealing insights into risk factors such as the quality of care for diabetes and ischemic heart disease, and the prevalence of obesity in children.


Tu’s data reach is impressive. But many big data projects dwarf it. For example, a recent assessment of heart drugs drew from a source population of more than 100 million people and 350 million person-years of observation time. It was conducted within a distributed network of electronic healthcare databases created as part of the Mini-Sentinel program, which is funded by the US Food and Drug Administration. It yielded risk estimates that were much more precise than those from prior studies.

Walter Wodchis, leader of the Health System Performance Research Network at ICES, says big data – which started with linking up data from acute care, emergency, inpatient rehabilitation, complex continuing care, long-term care, home care, physician, pharmacy, labs and assistive devices – is now moving towards linking data on people’s social circumstances: housing, food security and income security. Meanwhile, as part of a study of 225,000 people in Ontario, the province is developing databanks of biological specimens from individuals that can be used to track long-term health outcomes by linking genetic and blood profiles with long-run health care events. Seen as a whole, the potential for risk factor surveillance seems impressive.

Not everyone believes so fervently in big data’s potential, however. “While the volume of data will grow and when various sources are merged, we will be able to generate new hypotheses,” warns Diane Finegood, President & CEO of the Michael Smith Foundation for Health Research, in Vancouver, “I am not sure this will help all that much with surveillance.”

The problem, Finegood explains, is that much of the digital data on people’s behaviour may prove unreliable. 

To illustrate her point, Finegood points to potential new sources of nutrition data from mobile devices such as smartphones or movement tracking devices. “You can probably track people entering fast food restaurants,” Finegood argues, “but you can’t determine what they actually eat. So if it depends on people reporting their own consumption it will suffer from the same inaccuracies of self-reporting we currently experience. Even if this was merged with purchase data, people often purchase for more than one person and again it does not say what people actually eat.”  

More broadly, Finegood worries that the notion that big data will help us unravel the complexities of the drivers of non-communicable diseases may be fundamentally flawed. 

In Finegood’s assessment, it’s “a highly reductionist perspective” to think that causes of diseases must be established before we will know what the solutions are. “There are solutions appropriate for complex problems,” she notes, “which don't depend on working out all the causes.” 

At the BC Centre for Disease Control, director of public health analytics Laura MacDougall cautions that much of the work being done to integrate datasets in Canada remains outside the realm of big data.

But the integration of genomic data with existing health data is a looming reality, she notes. 

And so too is the integration of consumer data from sources such as loyalty cards. 

“Data is the bread and butter of public health,” MacDougall stresses. “The big data movement highlights just how much new data is out there. We need to start positioning ourselves so that we have the ability to use it.”