Big Data to Knowledge
Big Data to Knowledge (BD2K) is a project of the National Institutes of Health for knowledge extraction from big data.
BD2K was founded in 2013 in response to a report from the Working Group on Data and Informatics for the Advisory Committee to the Director of the National Institutes of Health.[1]
A significant part of BD2K's plans is to have organizations include data-sharing plans when they submit a proposal in response to a funding opportunity announcement.[2]
Philip Bourne leads the management of the project.[3]
Centers of Excellence for Big Data Computing
A Community Effort to Translate Protein Data to Knowledge: An Integrated Platform (HeartBD2K@UCLA)
The University of California Los Angeles
PIs: Peipei Ping, Merry Lindsey, Andrew Su, and Karol Watson
Grant Number: 1U54GM114833-01[4]
The NIH BD2K Center of Excellence for Big Data at UCLA[5] will embark on the project A Community Effort to Translate Protein Data to Knowledge: An Integrated Platform[6] in order to fundamentally alter biomedical research culture to enable full employment of technological modeling innovations, such as crowdsourcing, in biomedical Big Data analysis. The goal of this center is to democratize data research to include non-computational scientists and individuals, and to apply innovative, global, community-driven data integration and modeling methods to address challenges involved in the study of protein structure, function, and networks, with a focus on cardiovascular research.
Center for Causal Modeling and Discovery of Biomedical Knowledge from Big Data
The University of Pittsburgh at Pittsburgh
PIs: Gregory F. Cooper, Ivet Bahar, and Jeremy Berg
Grant Number: 1U54HG008540-01[7]
The Center for Causal Modeling and Discovery of Biomedical Knowledge from Big Data will develop user-friendly tools and resources that use Bayesian statistics to generate causal models from large and complex datasets. Initial tool and method development efforts will focus on three biomedical problems that involve large amounts of data: cell signals that drive cancer development, the molecular basis of lung disease susceptibility and severity, and functional connections in the human brain.[8]
The Center for Predictive Computational Phenotyping
The University of Wisconsin – Madison
PI: Mark W. Craven
Grant Number: 1U54AI117924-01[9]
The Center for Predictive Computational Phenotyping aims to accelerate the impact of predictive modeling on clinical practice. The Center will focus on issues related to computational phenotyping and will produce disease prediction models from machine learning and statistical methods; these models will integrate data from electronic health records, images, molecular profiles and other datasets to predict patient risks for breast cancer, heart attacks and severe blood clots.[10]
The National Center for Mobility Data Integration to Insight (The Mobilize Center)
Stanford University
PI: Scott L. Delp
Grant Number: 1U54EB020405-01[11]
The Mobilize Center is poised to provide access to mobility data for over ten million people. The center will develop and disseminate a range of novel data science tools, including modeling and analysis methods to predict and improve the outcomes of surgeries in children with cerebral palsy and gait pathology; to identify new approaches to optimize mobility in individuals with osteoarthritis, running injuries, and other movement impairments; and to discover methods that motivate overweight and obese individuals to exercise more and in ways that promote joint health.[12][13]
KnowEng, a Scalable Knowledge Engine for Large-Scale Genomic Data
The University of Illinois Urbana-Champaign
PIs: Jiawei Han, Saurabh Sinha, Jun Song, and Richard Weinshilboum
Grant Number: 1U54GM114838-01[14]
The KnowEng Center will build a computational Knowledge Engine that uses data mining and machine learning techniques to obtain and combine gene function and gene interaction information from disparate genomic data sources. This integrated genomic environment will enable scientists and medical practitioners to add their own datasets to the engine and explore models generated from the incorporation of their data within the existing knowledge base.[15]
Center for Big Data in Translational Genomics
The University of California Santa Cruz
PIs: David H. Haussler, David Patterson, and Laura Van’t Veer
Grant Number: 1U54HG007990-01[16]
The Center for Big Data in Translational Genomics is a multinational collaboration between academia and industry that will create data models and analysis tools to analyze massive datasets of genomic information. Such tools can be used for analysis of the genomes and the gene expression data from thousands of individuals to uncover the contribution of gene variants to disease, with an initial focus on cancer. This knowledge will be instrumental in the development of precision diagnostic and treatment methods.[17]
Patient-Centered Information Commons
Harvard University Medical School
PI: Isaac S. Kohane
Grant Number: 1U54HG007963-01[18]
Investigators at the Patient-Centered Information Commons will develop systems to combine genetic, environmental, imaging, behavioral, and clinical data on individual patients from multiple sources into integrated sets. Computing across thousands of such individuals will enable more accurate classification of individual disease or disease risk, and facilitate greater precision in patient disease prevention and treatment strategies.[19]
Center of Excellence for Mobile Sensor Data-to-Knowledge (MD2K)
The University of Memphis
PI: Santosh Kumar
Grant Number: 1U54EB020404-01[20]
Researchers at the Center of Excellence for Mobile Sensor Data-to-Knowledge will develop innovative tools to make it easier to gather, analyze and interpret data from mobile sensors. These tools will reduce the burden of complex chronic disorders on health and healthcare by enabling detection and prediction of person-specific disease risk factors ahead of the onset of adverse clinical events. The center will study two specific problems as test cases: reducing hospital readmissions for patients with congestive heart failure and preventing relapse in those who have quit smoking.[21]
Center for Expanded Data Annotation and Retrieval (CEDAR)
Stanford University
PI: Mark A. Musen
Grant Number: 1U54AI117925-01[22]
The ability to locate, analyze, and integrate Big Data depends on the metadata that describe the content of data sets. The Center for Expanded Data Annotation and Retrieval (CEDAR) will facilitate automated annotation of data with high-quality metadata by generating community-based metadata standards and a metadata repository for training learning algorithms to develop metadata templates. These templates will initially be evaluated, validated, and adapted with the NIAID ImmPort multi-assay data repository and other data repositories.[23]
ENIGMA Center for Worldwide Medicine, Imaging, and Genomics
The University of Southern California
PI: Paul M. Thompson
Grant Number: 1U54EB020403-01[24]
The ENIGMA Center for Worldwide Medicine, Imaging and Genomics will incorporate the scientific acumen of more than 300 scientists worldwide, and their biomedical datasets, in a global effort to combat human brain diseases. This center will develop computational methods for integration, clustering, and learning from complex biodata types. Its projects will help identify factors that either resist or promote brain disease and factors that aid diagnosis and prognosis, and will also help identify new mechanisms and drug targets for mental health care.[25]
Big Data for Discovery Science
The University of Southern California
PI: Arthur W. Toga
Grant Number: 1U54EB020406-01[26]
Researchers at the Big Data for Discovery Science Center will focus on proteomics, genomics, and images of cells and brain collected from patients and subjects across the globe. They will enable detection of patterns, trends and relationships among these data with user-focused data management, sophisticated computational methodologies, and leading-edge software tools for the efficient large-scale analysis of biomedical data. Interactive visualization tools created at this center will stimulate fresh insights and encourage the development of modern treatments and new cures for disease.[27]
References
- ^ Ohno-Machado, L. (2014). "NIH's Big Data to Knowledge initiative and the advancement of biomedical informatics". Journal of the American Medical Informatics Association. 21 (2): 193. doi:10.1136/amiajnl-2014-002666. ISSN 1067-5027.
- ^ Miller, Katharine (19 February 2013). "NIH Announcement: Big Data Gets Big Support | Biomedical Computation Review". biomedicalcomputationreview.org. Retrieved 28 July 2014.
- ^ Margolis, R.; Derr, L.; Dunn, M.; Huerta, M.; Larkin, J.; Sheehan, J.; Guyer, M.; Green, E. D. (2014). "The National Institutes of Health's Big Data to Knowledge (BD2K) initiative: capitalizing on biomedical big data". Journal of the American Medical Informatics Association. doi:10.1136/amiajnl-2014-002974. ISSN 1067-5027.
- ^ "UCLA BD2K Project Information - A Community Effort to Translate Protein Data to Knowledge: An Integrated Platform". projectreporter.nih.gov.
- ^ "UCLA BD2K Center of Excellence Homepage". HeartBD2K.org.
- ^ "UCLA BD2K Abstract" (PDF). bd2k.nih.gov.
- ^ "University of Pittsburgh at Pittsburgh BD2K Project Information - Center for Causal Modeling and Discovery of Biomedical Knowledge from Big Data". projectreporter.nih.gov.
- ^ "University of Pittsburgh at Pittsburgh BD2K Abstract" (PDF). bd2k.nih.gov.
- ^ "University of Wisconsin – Madison BD2K Project Information - Center for Predictive Computational Phenotyping". projectreporter.nih.gov.
- ^ "University of Wisconsin – Madison BD2K Abstract" (PDF). bd2k.nih.gov.
- ^ "Stanford University BD2K Project Information - National Center for Mobility Data Integration to Insight (The Mobilize Center)". projectreporter.nih.gov.
- ^ "The Mobilize BD2K Center of Excellence Homepage". mobilize.stanford.edu.
- ^ "Stanford University Mobilize Center BD2K Abstract" (PDF). bd2k.nih.gov.
- ^ "The University of Illinois Urbana-Champaign BD2K Project Information - KnowEng, a Scalable Knowledge Engine for Large-Scale Genomic Data". projectreporter.nih.gov.
- ^ "The University of Illinois Urbana-Champaign BD2K Abstract" (PDF). bd2k.nih.gov.
- ^ "The University of California Santa Cruz BD2K Project Information - Center for Big Data in Translational Genomics". projectreporter.nih.gov.
- ^ "The University of California Santa Cruz BD2K Abstract" (PDF). bd2k.nih.gov.
- ^ "Harvard University Medical School BD2K Project Information - Patient-Centered Information Commons". projectreporter.nih.gov.
- ^ "Harvard University Medical School BD2K Abstract" (PDF). bd2k.nih.gov.
- ^ "The University of Memphis BD2K Project Information - Center of Excellence for Mobile Sensor Data-to-Knowledge (MD2K)". projectreporter.nih.gov.
- ^ "The University of Memphis BD2K Abstract" (PDF). bd2k.nih.gov.
- ^ "Stanford University BD2K Project Information - Center for Expanded Data Annotation and Retrieval (CEDAR)". projectreporter.nih.gov.
- ^ "Stanford University (CEDAR) BD2K Abstract" (PDF). bd2k.nih.gov.
- ^ "The University of Southern California BD2K Project Information - ENIGMA Center for Worldwide Medicine, Imaging, and Genomics". projectreporter.nih.gov.
- ^ "The University of Southern California ENIGMA BD2K Abstract" (PDF). bd2k.nih.gov.
- ^ "The University of Southern California BD2K Project Information - Big Data for Discovery Science". projectreporter.nih.gov.
- ^ "The University of Southern California Big Data for Discovery Science BD2K Abstract" (PDF). bd2k.nih.gov.