List of biological databases

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by IgorRodchenkov (talk | contribs) at 00:07, 8 February 2014 (Meta databases: added iRefIndex). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Biological databases are stores of biological information.[1]

Primary nucleotide sequence databases

The International Nucleotide Sequence Database (INSD) consists of the following databases.

  1. DNA Data Bank of Japan (National Institute of Genetics)
  2. European Nucleotide Archive (European Bioinformatics Institute)
  3. GenBank (National Center for Biotechnology Information)

The three databases, DDBJ (Japan), GenBank (USA) and European Nucleotide Archive (Europe), are repositories for nucleotide sequence data from all organisms. All three databases accept nucleotide sequence submissions, and then exchange new and updated data on a daily basis to achieve optimal synchronisation between them. These three databases are primary databases, as they house original sequence data.
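
The exchange described above can be pictured with a small sketch. It is only an illustrative model, not the INSD's actual exchange mechanism: the `Record` type, the `exchange` function, the accession strings and the "highest version wins" rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical sequence record; the field names are illustrative only."""
    accession: str
    version: int
    sequence: str


def exchange(*archives: dict) -> None:
    """Give every archive the newest known version of every record.

    Assumption: a higher version number always supersedes a lower one.
    """
    newest = {}
    for archive in archives:
        for accession, record in archive.items():
            known = newest.get(accession)
            if known is None or record.version > known.version:
                newest[accession] = record
    # Push the merged view back to each archive.
    for archive in archives:
        archive.update(newest)


# Three toy archives holding made-up records.
ddbj = {"X0001": Record("X0001", 1, "ACGT")}
ena = {
    "X0001": Record("X0001", 2, "ACGTA"),
    "X0002": Record("X0002", 1, "GGCC"),
}
genbank = {"X0003": Record("X0003", 1, "TTAA")}

exchange(ddbj, ena, genbank)
```

After one exchange round the three toy archives hold the same set of records, each at its latest version.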

Meta databases

These databases of databases collect data from different sources and make them available in a new and more convenient form, or with an emphasis on a particular disease or organism.

  1. BioGraph (University of Antwerp, Vlaams Instituut voor Biotechnologie) A knowledge discovery service based on the integration of more than 20 heterogeneous databases
  2. Bioinformatic Harvester[1] (Karlsruhe Institute of Technology) - Integrating 26 major protein/gene resources.
  3. Neuroscience Information Framework[2] (University of California San Diego) - Integrates hundreds of neuroscience-relevant resources, many of which are listed below.
  4. ConsensusPathDB - A molecular functional interaction database, integrating information from 12 other databases.
  5. Entrez[3] (National Center for Biotechnology Information)
  6. Enzyme Portal (European Bioinformatics Institute) - Integrates enzyme information such as small-molecule chemistry, biochemical pathways and drug compounds.
  7. euGenes (Indiana University)
  8. GeneCards (Weizmann Inst.)
  9. MetaBase[4] (KOBIC) - A user contributed database of biological databases.
  10. mGen - Contains four of the world's biggest databases (GenBank, RefSeq, EMBL and DDBJ) and offers simple, program-friendly gene extraction.
  11. PathogenPortal A repository linking to the Bioinformatics Resource Centers (BRCs) sponsored by the National Institute of Allergy and Infectious Diseases (NIAID)
  12. SOURCE (Stanford University) - Encapsulates the genetics and molecular biology of genes from the genomes of Homo sapiens, Mus musculus, and Rattus norvegicus in easy-to-navigate GeneReports.
  13. iRefIndex: provides an index of protein interactions available in a number of primary interaction databases including BIND, BioGRID, CORUM, DIP, HPRD, InnateDB, IntAct, MatrixDB, MINT, MPact, MPIDB, MPPI and OPHID.
  14. Pathway Commons (Memorial Sloan-Kettering Cancer Center and University of Toronto)
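
As a rough illustration of what integrating several sources into one index involves, the sketch below merges interaction pairs from two sources and records which sources report each pair. It is a toy model under stated assumptions, not the method of any resource listed above: the source names, protein identifiers and the `build_index` function are hypothetical.

```python
from collections import defaultdict


def build_index(sources):
    """Map each unordered protein pair to the set of sources reporting it.

    `sources` maps a source name to a list of (protein_a, protein_b) pairs.
    A frozenset key treats (a, b) and (b, a) as the same interaction.
    """
    index = defaultdict(set)
    for source_name, pairs in sources.items():
        for protein_a, protein_b in pairs:
            index[frozenset((protein_a, protein_b))].add(source_name)
    return dict(index)


# Hypothetical input: two sources reporting overlapping interactions.
sources = {
    "source_A": [("P1", "P2"), ("P2", "P3")],
    "source_B": [("P2", "P1")],
}
index = build_index(sources)
```

Keeping the contributing source names alongside each pair preserves provenance, so a user can still trace an entry back to the primary databases.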

Genome databases

These databases collect organism genome sequences, annotate and analyze them, and provide public access. Some add curation of experimental literature to improve computed annotations. These databases may hold the genomes of many species, or the genome of a single model organism.

  1. Bioinformatic Harvester
  2. SNPedia
  3. CAMERA Resource for microbial genomics and metagenomics
  4. Corn, the Maize Genetics and Genomics Database
  5. EcoCyc a database that describes the genome and the biochemical machinery of the model organism E. coli K-12
  6. Ensembl provides automatic annotation databases for human, mouse, other vertebrate and eukaryote genomes.
  7. Ensembl Genomes provides genome-scale data for bacteria, protists, fungi, plants and invertebrate metazoa, through a unified set of interactive and programmatic interfaces (using the Ensembl software platform).
  8. PATRIC, the PathoSystems Resource Integration Center
  9. Flybase, genome of the model organism Drosophila melanogaster
  10. MGI Mouse Genome (Jackson Lab.)
  11. JGI Genomes of the DOE-Joint Genome Institute provides databases of many eukaryote and microbial genomes.
  12. National Microbial Pathogen Data Resource. A manually curated database of annotated genome data for the pathogens Campylobacter, Chlamydia, Chlamydophila, Haemophilus, Listeria, Mycoplasma, Neisseria, Staphylococcus, Streptococcus, Treponema, Ureaplasma, and Vibrio.
  13. RegulonDB, a model of the complex regulation of transcription initiation, or regulatory network, of the cell E. coli K-12.
  14. Saccharomyces Genome Database, genome of the yeast model organism.
  15. Viral Bioinformatics Resource Center Curated database containing annotated genome data for eleven virus families.
  16. The SEED platform for microbial genome analysis includes all complete microbial genomes, and most partial genomes. The platform is used to annotate microbial genomes using subsystems.
  17. Xenbase, genomes of the model organisms Xenopus tropicalis and Xenopus laevis
  18. Wormbase, genome of the model organism Caenorhabditis elegans
  19. Zebrafish Information Network, genome of this fish model organism.
  20. TAIR, The Arabidopsis Information Resource.
  21. UCSC Malaria Genome Browser, genomes of malaria-causing species (Plasmodium falciparum and others)
  22. RGD Rat Genome Database: Genomic and phenotype data for Rattus norvegicus
  23. INTEGRALL[5]: Database dedicated to integrons, bacterial genetic elements involved in antibiotic resistance
  24. Fourmidable ant genome database provides ant genome blast search and sequence download.
  25. VectorBase The NIAID Bioinformatics Resource Center for Invertebrate Vectors of Human Pathogens
  26. EzGenome, comprehensive information about manually curated genome projects of prokaryotes (archaea and bacteria) [2]

Protein sequence databases

  1. UniProt Universal Protein Resource (EBI, Swiss Institute of Bioinformatics, PIR)
  2. Protein Information Resource (Georgetown University Medical Center (GUMC))
  3. Swiss-Prot Protein Knowledgebase (Swiss Institute of Bioinformatics)
  4. PEDANT Protein Extraction, Description and ANalysis Tool (Forschungszentrum f. Umwelt & Gesundheit)
  5. PROSITE Database of Protein Families and Domains
  6. Database of Interacting Proteins (Univ. of California)
  7. Pfam Protein families database of alignments and HMMs (Sanger Institute)
  8. PRINTS a compendium of protein fingerprints (Manchester University)
  9. ProDom Comprehensive set of Protein Domain Families (INRA/CNRS)
  10. SignalP 3.0 Server for signal peptide prediction (including cleavage site prediction), based on artificial neural networks and HMMs
  11. SUPERFAMILY Library of HMMs representing superfamilies and database of (superfamily and family) annotations for all completely sequenced organisms
  12. Annotation Clearing House a project from the National Microbial Pathogen Data Resource
  13. InterPro Classifies proteins into families and predicts the presence of domains and sites.

Proteomics databases

  1. Proteomics Identifications Database (PRIDE) A public repository for proteomics data, containing protein and peptide identifications and their associated supporting evidence as well as details of post-translational modifications. (European Bioinformatics Institute)
  2. MitoMiner - A mitochondrial proteomics database integrating large-scale experimental datasets from mass spectrometry and GFP studies for 12 species. (MRC Mitochondrial Biology Unit)

Protein Data Bank (PDB) comprising:

Secondary databases

  1. SCOP Structural Classification of Proteins
  2. CATH Protein Structure Classification
  3. PDBsum

For more protein structure databases, see also Protein structure database

Protein model databases

  1. Swiss-model[6] Server and Repository for Protein Structure Models
  2. ModBase[7] Database of Comparative Protein Structure Models (Sali Lab, UCSF)
  3. Protein Model Portal[8] (PMP) Meta database that combines several databases of protein structure models (Biozentrum, Basel, Switzerland)

RNA databases

  1. Rfam [9], a database of RNA families
  2. miRBase [10], the microRNA database
  3. snoRNAdb, a database of snoRNAs
  4. lncRNAdb, a database of lncRNAs
  5. MONOCLdb The MOuse NOnCode Lung database: Annotations and expression profiles of mouse long non-coding RNAs (lncRNAs) involved in Influenza and SARS-CoV infections.
  6. piRNAbank, a database of piRNAs
  7. GtRNAdb, a database of genomic tRNAs
  8. SILVA, a database of ribosomal RNAs
  9. RDP, the Ribosomal Database Project
  10. tmRDB, a database of tmRNAs
  11. SRPDB, a database of signal recognition particle RNAs
  12. yeast snoRNA database
  13. Sno/scaRNAbase, a database of snoRNA and scaRNAs
  14. snoRNA-LBME-db, a snoRNA database

Carbohydrate structure databases

  1. EuroCarbDB[11], A repository for both carbohydrate sequences/structures and experimental data.

Protein-protein and other molecular interaction databases

  1. BIND Biomolecular Interaction Network Database
  2. BioGRID [12] A General Repository for Interaction Datasets (Samuel Lunenfeld Research Institute)
  3. CCSB Interactome
  4. DIP Database of Interacting Proteins
  5. IntAct molecular interaction database: a central, standards-compliant repository of molecular interactions, including protein–protein, protein–small molecule and protein–nucleic acid interactions.
  6. NetPro
  7. STRING: a database of known and predicted protein-protein interactions. (EMBL)
  8. The Cell Collective
  9. MINT: Molecular INTeraction database
  10. iRefIndex: provides an index of protein interactions available in a number of primary interaction databases including BIND, BioGRID, CORUM, DIP, HPRD, InnateDB, IntAct, MatrixDB, MINT, MPact, MPIDB, MPPI and OPHID.

Signal transduction pathway databases

  1. Cancer Cell Map
  2. Netpath - A curated resource of signal transduction pathways in humans
  3. NCI-Nature Pathway Interaction Database
  4. Reactome - Navigable map of human biological pathways, ranging from metabolic processes to hormonal signalling.
  5. SignaLink Database
  6. WikiPathways
  7. The Cell Collective

Metabolic pathway databases

  1. Small Molecule Pathway Database (SMPDB)
  2. BioCyc Database Collection including EcoCyc and MetaCyc
  3. KEGG PATHWAY Database[13] (Univ. of Kyoto)
  4. MANET database [14] (University of Illinois)
  5. Metabolights Metabolomics experiments and derived information: metabolite structures, reference spectra, biological roles, locations and concentrations. (European Bioinformatics Institute)
  6. Reactome[15] Navigable map of human biological pathways, ranging from metabolic processes to hormonal signalling. (Cold Spring Harbor Laboratory, European Bioinformatics Institute, Gene Ontology Consortium)

Microarray databases

  1. ArrayExpress (European Bioinformatics Institute)
  2. Gene Expression Omnibus (National Center for Biotechnology Information)
  3. GPX (Scottish Centre for Genomic Technology and Informatics)
  4. maxd (Univ. of Manchester)
  5. Stanford Microarray Database (SMD) (Stanford University)
  6. Genevestigator - Expression Search Engine (Nebion AG)

Exosomal databases

Mathematical model databases

  1. Biomodels Database: published mathematical models describing biological processes.
  2. CellML
  3. The Cell Collective: build and simulate large-scale models in real-time and in a highly collaborative fashion

PCR primer databases

  1. PathoOligoDB: A free QPCR oligo database for pathogens
  2. RTPrimerDB - a public primers and probes database for real-time PCR reactions

Specialized databases

Taxonomic databases

  1. Catalogue of Life source databases
  2. Encyclopedia of Life
  3. Integrated Taxonomic Information System
  4. EzTaxon-e, database for the identification of prokaryotes based on 16S ribosomal RNA gene sequences

Wiki-style databases

  1. CHDwiki
  2. EcoliWiki
  3. Gene Wiki
  4. GyDB
  5. NeuroLex
  6. OpenWetWare
  7. PDBWiki
  8. Proteopedia
  9. RiceWiki
  10. Topsan
  11. WikiGenes
  12. WikiPathways
  13. WikiProfessional
  14. YTPdb

Unsorted

References

  1. ^ Wren JD, Bateman A (2008). "Databases, data tombs and dust in the wind". Bioinformatics. 24 (19): 2127–8. doi:10.1093/bioinformatics/btn464. PMID 18819940.
  2. ^ http://ezgenome.ezbiocloud.net/