Microarray databases
The term microarray database usually refers to a repository of microarray gene expression data. The key functions of a microarray database are to store the measurement data, maintain a searchable index, and make the data available to other applications for analysis and interpretation, either directly or via user downloads.
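The three functions named above (storing measurements, keeping a searchable index, and handing data to analysis tools) can be illustrated with a minimal sketch. It uses Python's standard `sqlite3` module and an in-memory database; the table layout, column names, and the helper functions `build_store`, `add_sample`, `search_samples`, and `export_matrix` are hypothetical and chosen only for illustration. They do not reflect the schema of GEO, ArrayExpress, or any other real repository.

```python
import sqlite3


def build_store():
    """Create an in-memory store with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE samples (
            sample_id  TEXT PRIMARY KEY,
            experiment TEXT NOT NULL,
            organism   TEXT NOT NULL
        );
        CREATE TABLE measurements (
            sample_id TEXT NOT NULL REFERENCES samples(sample_id),
            probe     TEXT NOT NULL,
            intensity REAL NOT NULL,
            PRIMARY KEY (sample_id, probe)
        );
        -- Searchable index on an annotation field (assumed to be organism).
        CREATE INDEX idx_samples_organism ON samples(organism);
        """
    )
    return conn


def add_sample(conn, sample_id, experiment, organism, intensities):
    """Store one sample's annotation and its probe-level measurements."""
    conn.execute(
        "INSERT INTO samples VALUES (?, ?, ?)",
        (sample_id, experiment, organism),
    )
    conn.executemany(
        "INSERT INTO measurements VALUES (?, ?, ?)",
        [(sample_id, probe, value) for probe, value in intensities.items()],
    )
    conn.commit()


def search_samples(conn, organism):
    """Return the IDs of all samples annotated with the given organism."""
    rows = conn.execute(
        "SELECT sample_id FROM samples WHERE organism = ? ORDER BY sample_id",
        (organism,),
    )
    return [row[0] for row in rows]


def export_matrix(conn, experiment):
    """Export one experiment as {probe: {sample_id: intensity}} for analysis."""
    rows = conn.execute(
        """
        SELECT m.probe, m.sample_id, m.intensity
        FROM measurements AS m
        JOIN samples AS s ON s.sample_id = m.sample_id
        WHERE s.experiment = ?
        ORDER BY m.probe, m.sample_id
        """,
        (experiment,),
    )
    matrix = {}
    for probe, sample_id, intensity in rows:
        matrix.setdefault(probe, {})[sample_id] = intensity
    return matrix


# Example usage with made-up sample IDs and intensities.
conn = build_store()
add_sample(conn, "S1", "EXP1", "Homo sapiens", {"probe_a": 7.5, "probe_b": 3.25})
add_sample(conn, "S2", "EXP1", "Homo sapiens", {"probe_a": 8.0, "probe_b": 2.5})
add_sample(conn, "S3", "EXP2", "Mus musculus", {"probe_a": 6.0})
print(search_samples(conn, "Homo sapiens"))
print(export_matrix(conn, "EXP1"))
```

A real repository would add much more, such as standardized experiment annotation and access control, but the split between annotation (searched) and measurements (exported) is the part this sketch is meant to show.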
Microarray databases can fall into two distinct classes:
- A peer-reviewed, public repository that adheres to academic or industry standards and is designed to be used by many analysis applications and groups. Good examples are the Gene Expression Omnibus (GEO) from NCBI and ArrayExpress from EBI.
- A specialized repository associated primarily with the brand of a particular entity (lab, company, university, consortium, group), an application suite, a topic, or an analysis method, whether commercial, non-profit, or academic. These databases may have one or more of the following characteristics:
  - A subscription or license may be needed to gain full access.
  - The content may come primarily from a specific group (e.g. SMD or UPSC-BASE).
  - There may be limits on who can use the data, and for what purpose.
  - Special permission may be required to submit new data, or there may be no obvious submission process at all.
  - Only certain applications may be equipped to use the data, often ones associated with the same entity (for example, caArray at NCI is specialized for caBIG).
  - Further processing or reformatting of the data may be required before standard applications or analyses can use it.
  - The database may claim to address the "urgent need" for a standard, centralized repository for microarray data (see, for example, YMD, last updated in 2003).
  - The database may claim an incremental improvement over one of the public repositories.
  - The database may be a meta-analysis application that incorporates studies from one or more public databases (e.g. Gemma primarily uses GEO studies; NextBio uses various sources).
Some of the best-known public, curated microarray databases are:
Database | Scope | Microarray experiment sets | Sample profiles | As of date |
Gene Expression Omnibus - NCBI | any curated MIAME compliant molecular abundance study | 8094 | 205148 | March 11, 2008 |
Stanford Microarray database | private and published microarray and molecule abundance database | 441 | 38925 | March 27, 2009 |
Genevestigator database | Manually curated microarray data for expression meta-analysis | >500 | 20500 | July, 2008 |
ArrayExpress at EBI | Any curated MIAME or MINSEQE compliant transcriptomics data | 8037 | 239402 | January, 2009 |
UPenn RAD database | MIAME compliant public and private studies, associated with ArrayExpress | ~100 | ~2500 | Sept. 1, 2007 |
UNC Microarray database | ?? | ~31 | 2093 | April 1, 2007 |
UNC modENCODE Microarray database | Nimblegen customer 2.1 million array | ~6 | 180 | July 17, 2009 |
MUSC database | ?? | ~45 | 555 | April 1, 2007 |
caArray at NCI | Cancer data, prepared for analysis on caBIG | 41 | 1741 | November 15, 2006 |
UPSC-BASE | data generated by microarray analysis within Umeå Plant Science Centre (UPSC). | ~100 | ? | November 15, 2007 |