Penn State Biomedical Big Data to Knowledge (B2D2K) Training Program

Vasant Honavar

Sponsoring Agency
National Library of Medicine


The BD2K initiative was developed by the NIH to enable biomedical researchers to capitalize on the Big Data being generated, foster new discoveries, and increase biological knowledge. The need to train a new generation of scientists skilled in computation, informatics, and statistics to surmount the challenges of big data analysis for biological and biomedical science is widely recognized. An important recommendation with respect to big data computing was to build capacity by training the workforce in the relevant quantitative sciences, such as bioinformatics, biomathematics, biostatistics, and clinical informatics. Basic science and biomedical advances rely increasingly on very large, complex datasets generated by high-throughput -omic and other biological technologies, and sound statistical reasoning and sophisticated computational techniques are needed throughout the process of analysis and discovery. This includes all stages of investigation, from experimental design, data pre-processing, de-noising, and normalization to integrating multiple datasets, testing hypotheses, and visualizing data in interactive and informative ways.

The new challenges posed by high-dimensional and complex data require that life and computer scientists working with big data acquire a substantive understanding of statistics and bioinformatics, and that statisticians working in this area, in turn, acquire a substantive understanding of biological principles, experimental technologies, and computation. These disciplines will converge into an interdisciplinary domain where existing statistical and computational tools are used and combined effectively, and novel methods are developed, to promote innovation and discovery in big data analysis for biomedical science. Interdisciplinary communication of this kind is essential for the emergence of a new cadre of researchers who can communicate effectively with their peers in the complementary disciplines required for tackling real big data problems in the life sciences. The Biomedical Big Data to Knowledge (B2D2K) Training Program at The Pennsylvania State University will bring together Data Science researchers and educators from Geisinger Health System and five colleges at Penn State (the Colleges of Science, Engineering, Health and Human Development, Information Sciences and Technology, and Medicine) to create a truly transformative multi-disciplinary predoctoral training environment.

The goal of the B2D2K program is to train a diverse cohort of next-generation biomedical data scientists with deep knowledge of Data Science who can develop novel algorithmic and statistical methods for building predictive, explanatory, and causal models through integrative analyses of disparate types of biomedical data (including Electronic Health Records and genomic, behavioral, socio-economic, and environmental data) to advance science and improve health. We believe that the investment in this generation of data scientists will be critical to realizing the full potential of "Biomedical Big Data".


Research Area
Artificial Intelligence and Big Data
Health and Bioinformatics