Dr. Howard Hu, University of Toronto

CARRFS eNews spoke with Dr. Howard Hu, Dean, University of Toronto, about ethics in data-driven health.


From an ethical standpoint, what are the critical issues in using the Canadian Community Health Survey (CCHS) as the primary source for public health policy making? And how do we best compensate for the deficiencies embedded in its methodology and data?

Like any major source of population-wide data, it has limitations. My view, as a population epidemiologist and as the dean of a public health school, is that the advantages of having access to sources of data like the CCHS continue to outweigh the limitations and deficiencies. But every insight made possible by the data has to be accompanied by a well-informed statement of its limitations. Nuances that are often left out in this “sound bite” age must never be neglected by the public health professionals who use these data and try to communicate the findings to the public.

I think there are ways to enrich our understanding of what these databases mean by bringing in other sources of data to validate insights discovered using the original database. Speaking from this school, we have particular expertise in critical qualitative research. The questions that are posed need to be examined very closely in relation to the information that is available in a database like this. Even before looking at the results, the questions themselves have limitations that need to be interpreted. Part of that is defining which questions are necessary for fully understanding health risks but cannot be examined using the databases available to us. It is sort of like the cartoon of a researcher who searches only where the street lamp shines, because that is where the light is. But the dark area has lots of meaning in terms of mapping reality; because the light is not there, you are not going to get any information about it.

As we move towards a data-driven health care system, the accumulation of data has some profound ethical consequences. How far do you think we can go in this direction without overstepping ethical boundaries related to privacy and infringing on individual rights?

There are at least two critical elements to securing the support of the community. The first is an informed community. We need to make sure that the data being gathered, the questions being addressed and the security of the collected data are communicated in every form possible, so that the contributors of the data understand that their contributions are going towards a major effort to improve health and to better understand risk factors for, and predictors of, health.

The second is community participation in the questions posed, in how the information is used, and in its purpose. The linkage to the policy questions being posed, such as “should we put resources here or there, or should we be eliminating public exposure to x, y or z”, has to be clear from the outset. It is very important that the community understands why we are doing the research and not simply what the research is and what the questions are. Even with that information and that degree of community participation, there always has to be an element of choice so that people can decide whether or not to participate.

At one end of the spectrum is the passive generation of data that is in the public domain. With the universal health access system in Canada and the ability to passively collect all the data that relate to people’s utilization of the health system, use of prescriptions, etc., there is an element of passive data collection for which, provided the data are fully anonymous, it would arguably be counter-productive to require every citizen to go through additional steps of informed consent before we can use the data.

At the other extreme, the collection of data purely for research, such as biological samples or information on health care utilization and preventive health services gathered only for research, will obviously require informed consent. The grey zone is information generated for purposes that are neither obviously health care nor research. This requires careful consideration. Having a “body” with substantial community participation, like an ethics review committee, to consider questions in the grey zone and provide guidance would be the way to ensure that we use the data and make decisions properly.

Public Health has become increasingly complex and has a strong need to collect data – not only within but also outside the health sector. In a fiscally constrained environment, how do health professionals deal with this issue? And what solutions do you see?

The irony is that as the big data enterprise [becomes] more sophisticated, merging health data with non-health data has become cheap. But big data has made us realize how little we know about the environmental [and] nutritional determinants of many chronic diseases. We do know that for many chronic diseases, the proportion of disease causation due to non-genetic environmental and nutritional factors is huge. Much of this knowledge comes from studies that have emerged over the past 20 years involving large databases of twins. Because identical twins share all of their genes and fraternal twins share about half, following twins over time allows us to quantitatively estimate what proportion of a disease is genetic and what proportion is non-genetic. These insights have created a lot of urgency to redouble our efforts to identify the upstream non-genetic risk factors, so we can be better at prevention. This will necessarily involve bringing into these databases data on occupation, geography, air and water pollution exposures, and lifestyle factors that go beyond what the health system has typically generated. I have to say that I am an enthusiastic supporter of that form of big data epidemiology. It does mean that one needs to pay attention to privacy, community involvement, and [the] community participation discussed above.

Will this development expand the role of epidemiologists?

From my point of view, not only as an academic administrator but also as a researcher, this is the fun part. I think multidisciplinary and transdisciplinary work is the exciting part of public health science and epidemiology, because we get to dabble in many issues and work with experts in many different disciplines. We are the “circus masters”, bringing together these different forms of data and experts in an integrated, harmonized format so we can actually see patterns and help identify issues that will ultimately improve public health.

As the head of a public health school, how do you prepare the new generation of epidemiologists and health professionals for this new environment?

One of the teaching philosophies we are promoting in the school is an appreciation of how the explosion of data and knowledge has made it impossible for us to simply teach knowledge as the foundation on which people build their careers or decide which questions to pursue. We place a much heavier emphasis on [societal] problems, bringing unsolved public health problems into the classroom as generators of questions and as the basis on which methodologies and approaches can be studied. That includes integrating the different disciplines of knowledge that students have to understand and learn: the competencies, if you will. It is up to the students to know where to get the information and to bring it into the problem-solving process: how they are going to approach the problem, and whom to bring in to collaborate on solving it. I think this is the kind of approach that 21st-century public health needs, because it teaches, essentially, the process of approaching problems, which is the foundation for knowing what kinds of information you need. Tapping into that information is what the information revolution has brought us. But it is what you do first (how you approach a problem, how you model it, how you think about it) that we really teach. Big data is [a] tool somewhere in that process.

How do you teach students to navigate the ethics of generating health data?

Luckily, we have a very close relationship with a unique entity here at the University of Toronto: the Joint Centre for Bioethics. The Centre began as a multidisciplinary institution that helped health care institutions build their ethical review boards and grappled with very thorny questions about how to do research in clinical environments. Over time, it has embraced big public health research questions and global health research questions, ones that cross boundaries and involve transnational issues and cultural issues between institutions and countries. It has become the home of many of the scholars who teach our students fundamental approaches to understanding the ethical issues that arise from research and how to address them. Those are fundamental tenets for our students, and hopefully they will also become communicators of some of those methodologies in their own work.

From a measure-centric, reductionist perspective in epidemiology, we often believe that solutions will come from reversing the causes of problems. What would be the right approach for decision and policy makers to avoid falling into this logical trap?

My personal view is that it is lovely to be in the information age, but, as with any kind of research, initial findings have to be validated. An initial finding based on observational studies runs the risk of generating conclusions that are not supported by randomized trials. Those are the kinds of discrepancies and inconsistencies that really shake public faith in the headlines they read about the next new study. And rightfully so!

I think that the public is hungry for good information and good advice. The job of our senior statesmen in public health is to provide the right level of interpretation of, and caution about, new results as they come out. One thing is certain: when there is a critical mass of evidence, whether from experimental studies, observational studies or randomized trials, [with] observations validated multiple times in diverse populations, policy recommendations should carry weight and make a difference.

Just this week, one of our global health experts, Dr. Prabhat Jha, published a superb article in The New England Journal of Medicine reviewing the effect of taxation policy on tobacco-related mortality and morbidity. There is no doubt that taxation policy has a huge impact on tobacco-related deaths. It is an underutilized policy tool that should be used much more. Whether it will someday be followed by a sugar tax or something similar, we don’t quite know yet. It takes time for these things to mature into something that should be supported by population-based policy. Again, limitations also deserve to be part of the picture. Even if we end up with a sugar tax or something similar, we always have to acknowledge that there are segments of the population that don’t really need it and that will bear an unnecessary burden because their metabolisms are in fact less prone to sugar-induced obesity. These are societal costs that have to be accepted, as with any policy measure whose benefits to society outweigh the minor discomforts felt by the segments of society that don’t need it.

By Jostein Algroy



Today we see fashion items such as wristbands, smartphones and wearable computers becoming new status symbols that track human movements, including sleeping patterns. The IT industry has made it fashionable to track your vital “health” behaviour data, data that can be aggregated into vast data pools and, with smart algorithms, used to inform citizens about their health and health-related behaviour. This development is quite a step away from the survey approach of public health institutions. Dr. Hu, what do you see as the potential dangers and benefits?

I don’t have a scholarly response to this. I think the term “fashionable” aptly describes a trend that has not been examined in a systematic way. My own gut response is that there is a certain element of obsessive compulsiveness and/or narcissism to it, because we are all curious about what we can learn about ourselves. It is not clear to me whether the information is ultimately useful, and it is certainly not collected in a systematic way. As with anything else involving the IT industry, the people generating this kind of information are a leading-edge group who are probably not representative of the general population.

But at the same time, I think it is also part of the future. We will see lots of passive technologies where you don’t have to press a button or program anything; the device will collect information about you, and you will have to actively stop it if you don’t want that information collected. That is kind of the future. Whether or not it turns into tools that are useful for public health research, personalized medicine, or personalized adjustments to health behaviour remains to be seen.

I’d like to see an early scholarly approach to this so that the public has some insight into how useful it might be and what its actual utility is. I am aware of some scholars who are interested in this area. One of the early initiatives has been to look at Internet search behaviour and at how Google searches can be used to map population-wide epidemics early, before laboratories can detect them, using the geography of Internet search behaviour to map disease trends in the community. With all the new smartphone apps that go much further than what you type into a computer, it will be very interesting to see what evolves in the future.