Biomolecular Object Network Database

Unleashed Informatics

The Blueprint Initiative started as a research program in the lab of Dr. Christopher Hogue at the Samuel Lunenfeld Research Institute at Mount Sinai Hospital in Toronto. On December 14, 2005, Unleashed Informatics Limited acquired the commercial rights to The Blueprint Initiative's intellectual property. This included rights to the protein interaction database BIND, the small molecule interaction database SMID, and the data warehouse SeqHound. Unleashed Informatics is a data management service provider that oversees the management and curation of The Blueprint Initiative under the guidance of Dr. Hogue[1].

Unleashed Informatics has created a novel resource, the Biomolecular Object Network Database (BOND), which integrates the original Blueprint Initiative databases with other databases, such as GenBank, and with many of the tools required to analyze these data. Annotation links for sequences are also available, including taxon identifiers, redundant sequences, Gene Ontology descriptions, Online Mendelian Inheritance in Man identifiers, conserved domains, database cross-references, LocusLink identifiers and complete genomes. BOND facilitates cross-database queries and is the first open access resource that integrates interaction and sequence data[2].

Unleashed Informatics also hosts the Small Molecule Interaction Database (SMID). Open access versions of BOND and SMID are available at http://www.unleashedinformatics.com/index.php?pg=products. Commercial versions, BONDplus and SMIDsuiteplus, are also available and provide additional data and bioinformatics support.

User Statistics

The number of Unleashed registrants has increased tenfold since the integration of BIND. As of December 2006, registrations fell just short of 10,000. Subscribers to the commercial versions of BOND fall into six general categories: agriculture and food, biotechnology, pharmaceuticals, informatics, materials and other. The biotechnology sector is the largest of these groups, holding 28% of subscriptions. Pharmaceuticals and informatics follow with 22% and 18%, respectively. The United States holds the bulk of these subscriptions, at 69%. Other countries with access to the commercial versions of BOND include Canada, the United Kingdom, Japan, China, Korea, Germany, France, India and Australia. Each of these countries accounts for less than 6% of users[2].

1. www.blueprint.org
2. bond.unleashedinformatics.com

Biomolecular Interaction Network Database (BIND)

Introduction

The idea of a database to document all known molecular interactions was originally put forth by Tony Pawson in the 1990s and was later developed by scientists at the University of Toronto in collaboration with the University of British Columbia. The development of the Biomolecular Interaction Network Database (BIND) has been supported by grants from the Canadian Institutes of Health Research (CIHR), Genome Canada, the Canada Foundation for Innovation and the Ontario Research and Development Fund. BIND was originally designed to be a constantly growing repository for information regarding biomolecular interactions, molecular complexes and pathways. As proteomics is a rapidly advancing field, there is a need to have information from scientific journals readily available to researchers. BIND facilitates the understanding of molecular interactions and pathways involved in cellular processes and will eventually give scientists a better understanding of developmental processes and disease pathogenesis.

The major goals of the BIND project are to create a public proteomics resource that is available to all; to create a platform that enables data mining from other sources (PreBIND); and to create a platform capable of presenting visualizations of complex molecular interactions. From the beginning, BIND has been open access, and its software can be freely distributed and modified. Currently, BIND includes a data specification, a database and associated data mining and visualization tools. Eventually, it is hoped that BIND will be a collection of all the interactions occurring in each of the major model organisms.

Database Structure

BIND contains information on three types of data: interactions, molecular complexes and pathways.

Interactions are the basic component of BIND and describe how two or more objects (A and B) interact with each other. The objects can be of various kinds: DNA, RNA, proteins, ligands, genes, or photons. The interaction entry contains the most information about a molecule: its name and synonyms, where it is found (e.g. where in the cell, in what species, and when it is active), and its sequence or where its sequence can be found. The interaction entry also outlines the experimental conditions required to observe binding in vitro and the chemical dynamics of the interaction (including thermodynamics and kinetics).
The second type of BIND entry is the molecular complex. Molecular complexes are defined as aggregates of molecules that are stable and have a function when bound to each other. A molecular complex entry links data from two or more interaction records and may also contain information on the role of the complex in various interactions.
The third component of BIND is the pathway record section. A pathway consists of a network of interactions that are involved in the regulation of cellular processes. This section may also contain information on phenotypes and diseases related to the pathway.

The minimum amount of information needed to create an entry in BIND is a PubMed publication reference and an entry in another database (e.g. GenBank). Each entry within the database provides references and authors for the data. As BIND is a constantly growing database, all components of BIND track updates and changes (Bader et al. 2001).
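As a rough illustration of the structure described above, the three record types and the minimum-entry rule could be modeled as follows. This is only a sketch: the class and field names (MolecularObject, InteractionRecord, pubmed_id, external_ref and so on) are hypothetical and are not taken from the actual BIND data specification, which is considerably more detailed. Requiring a cross-reference for every participant is likewise an assumption about how the rule would be applied, and the identifiers in the example are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Object types named in the text; the real specification may define others.
OBJECT_TYPES = {"DNA", "RNA", "protein", "ligand", "gene", "photon"}


@dataclass
class MolecularObject:
    name: str
    object_type: str                     # one of OBJECT_TYPES
    external_ref: Optional[str] = None   # e.g. a GenBank accession (hypothetical field)

    def __post_init__(self) -> None:
        if self.object_type not in OBJECT_TYPES:
            raise ValueError(f"unknown object type: {self.object_type}")


@dataclass
class InteractionRecord:
    participants: List[MolecularObject]  # the interacting objects (A and B)
    pubmed_id: Optional[int] = None      # publication supporting the interaction

    def meets_minimum(self) -> bool:
        """Minimum from the text: a PubMed reference plus an entry in another database."""
        return (
            len(self.participants) >= 2
            and self.pubmed_id is not None
            and all(p.external_ref for p in self.participants)
        )


@dataclass
class ComplexRecord:
    name: str
    interactions: List[InteractionRecord]  # links two or more interaction records


@dataclass
class PathwayRecord:
    name: str
    interactions: List[InteractionRecord]  # network of interactions
    related_diseases: List[str] = field(default_factory=list)


# Usage with made-up identifiers.
a = MolecularObject("protein A", "protein", external_ref="ACC0001")
b = MolecularObject("protein B", "protein", external_ref="ACC0002")
complete = InteractionRecord([a, b], pubmed_id=1)
incomplete = InteractionRecord([a, b])      # no publication reference
print(complete.meets_minimum(), incomplete.meets_minimum())
```

Keeping complexes and pathways as lists of interaction records mirrors the description of interactions as the basic component from which the other two record types are built.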

Curation and Data Submission

BIND is curated for quality assurance. BIND curation has two tracks: high-throughput (HTP) and low-throughput (LTP). HTP records come from papers that report more than 40 interaction results from one experimental methodology. HTP curators typically have a bioinformatics background. They are responsible for the collection and storage of experimental data, and they also create scripts to update BIND based on new publications. LTP records are curated by individuals with either an MSc or PhD and laboratory experience in interaction research. LTP curators are given further training through the Canadian Bioinformatics Workshops. Information on small molecule chemistry is curated separately by chemists to ensure the curator is knowledgeable about the subject. The priority for BIND curation is to focus on LTP studies so that information is collected as it is published. Although HTP studies provide more information at once, more LTP studies are reported, and the two tracks contribute similar numbers of interactions. In 2004, BIND collected data from 110 journals (Alfarano et al. 2005).
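The rule that separates the two tracks can be expressed as a short function. The sketch below assumes a paper is summarized as a list with one experimental-method label per reported interaction; the function name, the input format and the method labels are hypothetical, and only the threshold of 40 comes from the text.

```python
from collections import Counter
from typing import Iterable

HTP_THRESHOLD = 40  # from the text: more than 40 results from one methodology


def curation_track(methods: Iterable[str]) -> str:
    """Return "HTP" or "LTP" given the method label of each reported interaction."""
    counts = Counter(methods)
    # HTP only if a single methodology accounts for more than the threshold.
    if counts and max(counts.values()) > HTP_THRESHOLD:
        return "HTP"
    return "LTP"


print(curation_track(["two-hybrid"] * 41))                 # HTP
print(curation_track(["two-hybrid"] * 40))                 # LTP
print(curation_track(["two-hybrid"] * 30 + ["pull-down"] * 30))  # LTP
```

The last call shows why the count is taken per methodology: a paper with 60 results split evenly across two methods does not exceed 40 for either one.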

Database Growth

BIND has grown significantly since its conception; the database saw a tenfold increase in entries between 2003 and 2004. By September 2004, there were over 100,000 interaction records (including 58,266 protein-protein, 4,225 genetic, 874 protein-small molecule, 25,857 protein-DNA, and 19,348 biopolymer interactions). The database also contains sequence information for 31,972 proteins, 4,560 DNA samples and 759 RNA samples. These entries have been collected from 11,649 publications; therefore, the database represents an important amalgamation of data. The organisms with entries in the database include Saccharomyces cerevisiae, Drosophila melanogaster, Homo sapiens, Mus musculus, Caenorhabditis elegans, Helicobacter pylori, Bos taurus, HIV-1, Gallus gallus and Arabidopsis thaliana, among others. In total, 901 taxa were included by September 2004 (Alfarano et al. 2005).
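As a quick consistency check on the September 2004 figures, the listed interaction categories can be tallied. The counts are those given above; the dictionary layout is simply a convenience, and the listed categories are not assumed to be exhaustive.

```python
# Interaction counts by category, as reported for September 2004.
interaction_counts = {
    "protein-protein": 58_266,
    "genetic": 4_225,
    "protein-small molecule": 874,
    "protein-DNA": 25_857,
    "biopolymer": 19_348,
}

total = sum(interaction_counts.values())
print(total)  # 108570

# Share of each listed category, largest first.
for kind, n in sorted(interaction_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{kind}: {n / total:.1%}")
```

The listed categories alone sum to 108,570, which is consistent with the figure of over 100,000 interaction records, and protein-protein interactions make up a little more than half of them.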

Not only is the information contained within the database continually updated, but the software itself has gone through several revisions. Version 1.0 of BIND was released in 1999 and, based on user feedback, was modified to include additional detail on the experimental conditions required for binding and a hierarchical description of the cellular location of the interaction. Version 2.0 was released in 2001 and included the capability to link to information available in other databases (Bader et al. 2001). Version 3.0 (2002) expanded the database from physical/biochemical interactions to also include genetic interactions (Bader et al. 2003). Version 3.5 (2004) included a refined user interface that aimed to simplify information retrieval (Alfarano et al. 2005). In 2006, BIND was incorporated into the Biomolecular Object Network Database (BOND), where it continues to be updated and improved.

Special Features

BIND offers several features that many other proteomics databases do not include. The authors of BIND have created an extension to traditional IUPAC nomenclature to help describe post-translational modifications of amino acids. These modifications include acetylation, formylation, methylation and palmitoylation, among others. The extension of the traditional IUPAC codes allows these modified amino acids to be represented in sequence form as well (Bader et al. 2001). BIND also utilizes a unique visualization tool known as OntoGlyphs. There are 83 OntoGlyph characters, which represent three types of molecular attributes: function, binding and cellular localization. The OntoGlyphs were developed based on Gene Ontology (GO) and provide a link back to the original GO information. There are 34 functional OntoGlyphs, which contain information about the role of the molecule (e.g. cell physiology, ion transport, signaling). There are 25 binding OntoGlyphs, which describe what the molecule binds (e.g. ligands, DNA, ions). The other 24 OntoGlyphs provide information about the location of the molecule within a cell (e.g. nucleus, cytoskeleton). The OntoGlyphs can be selected and manipulated to include or exclude certain characteristics from search results. The visual nature of the OntoGlyphs also facilitates pattern recognition when looking at search results (Alfarano et al. 2005).
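The include/exclude behavior of OntoGlyphs in search results can be pictured as set-based filtering. The following is a minimal sketch under stated assumptions: the SearchHit class, the filter_hits function and the way glyphs are attached to hits are all hypothetical and do not describe the actual BIND interface. Only the three category counts and the example glyph labels come from the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List

# Category sizes stated in the text: 34 + 25 + 24 = 83 OntoGlyphs.
ONTOGLYPH_COUNTS = {"function": 34, "binding": 25, "localization": 24}
assert sum(ONTOGLYPH_COUNTS.values()) == 83


@dataclass(frozen=True)
class SearchHit:
    name: str
    glyphs: FrozenSet[str]  # labels of the OntoGlyphs attached to this hit


def filter_hits(
    hits: Iterable[SearchHit],
    include: Iterable[str] = (),
    exclude: Iterable[str] = (),
) -> List[SearchHit]:
    """Keep hits carrying every included glyph and none of the excluded ones."""
    inc, exc = set(include), set(exclude)
    return [h for h in hits if inc <= h.glyphs and not (exc & h.glyphs)]


# Usage with made-up hits.
hits = [
    SearchHit("hit 1", frozenset({"ion transport", "nucleus"})),
    SearchHit("hit 2", frozenset({"signaling", "cytoskeleton"})),
    SearchHit("hit 3", frozenset({"ion transport", "cytoskeleton"})),
]
selected = filter_hits(hits, include=["ion transport"], exclude=["nucleus"])
print([h.name for h in selected])  # ['hit 3']
```

Treating each hit's glyphs as a set keeps both operations simple: inclusion is a subset test and exclusion is an empty-intersection test.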