Integration of environmental factors and causal reasoning approaches for large-scale observational health research

Vasant Honavar

Sponsoring Agency
National Science Foundation


Vast quantities of health, environmental, and behavioral data are being generated today, yet much of these data remain locked in digital silos. For example, data from health care providers, such as hospitals, provide a dynamic view of the health of individuals and populations from birth to death. At the same time, government institutions and industry have released troves of economic, environmental, and behavioral datasets into the public domain, including indicators of income and poverty, adverse exposures (e.g., air pollution), and ecological factors (e.g., climate). How are economic, environmental, and behavioral factors linked with health?

This project will integrate numerous large environmental and clinical data streams to enable the scientific community to address this question. By breaking down current data silos, the project is expected to have broad scientific impact. First, this effort will open new routes of biomedical investigation for the big data community. Second, the project will enable discoveries with behavioral, economic, environmental, and public health relevance.

The project also aims to assemble a first-of-its-kind data warehouse containing numerous health/clinical, environmental, behavioral, and economic data streams, enabling causal discovery across these data sources. Its ultimate goal is to facilitate community-led, collaborative causal discovery by disseminating integrated, open big data and analytics tools.

Research Area
Artificial Intelligence and Big Data
Health and Bioinformatics