
Critical Assessment of Function Annotation

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Kapagel (talk | contribs) at 18:54, 23 September 2013 (Motivation/History/Organization sections added). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

The Critical Assessment of Functional Annotation (CAFA) is an experiment designed to provide a large-scale assessment of computational methods dedicated to predicting protein function.[1] Different algorithms are evaluated by their ability to predict the Gene Ontology (GO) terms in the categories of Molecular Function, Biological Process, and Cellular Component.

The experiment consists of two tracks: (i) the eukaryotic track and (ii) the prokaryotic track. In each track, the organizers provide a set of target proteins. Participants are expected to submit their predictions by the submission deadline, after which the predictions are assessed according to a set of predefined metrics.

Motivation

The genome of an organism may consist of hundreds to tens of thousands of genes, which in turn may encode up to hundreds of thousands of distinct protein sequences. Owing to the relatively low cost of genome sequencing, gene and protein sequences can be determined quickly and inexpensively, and thousands of species have been sequenced so far.[2] Determining what a protein does in a cell, on the other hand, is time-consuming and expensive. Even when functional assays are performed, they are unlikely to provide complete insight into protein function. It has therefore become important to use computational tools to functionally annotate proteins. In short, given a set of proteins with known functions, computational function predictors must infer the functions of all the remaining proteins.

The CAFA experiment is designed to provide an unbiased assessment of computational methods, to stimulate research in computational function prediction, and to provide insight into the overall state of the art in function prediction.

Organization

The experiment consists of three phases:

  1. Prediction phase: ~4 months

    Organizers provide the community with protein sequences whose functions are unknown or incompletely known, and set the deadline for the submission of predictions.

  2. Target accumulation phase: 6-12 months

    After all predictions are stored, the experiment enters a waiting period in which experimentally determined protein functions are expected to accumulate in public databases.

  3. Analysis phase: 1 month

    Predictions are evaluated against the newly accumulated annotations, and predictors are ranked according to their performance.


The results are publicly shared in scientific meetings and published after peer review.
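The article does not name the evaluation metrics. As an illustration only, the sketch below ranks predictors by a protein-centric maximum F-measure (Fmax), a precision/recall summary that has been used in CAFA evaluations. For each score threshold, precision is averaged over proteins with at least one prediction above the threshold, recall is averaged over all benchmark proteins, and Fmax is the best harmonic mean across thresholds. The data layout (dictionaries of GO term scores), the threshold grid, the toy data, and the function names `fmax` and `rank_predictors` are hypothetical choices for this example, not the organizers' actual assessment code.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data layout (not the official CAFA submission format):
# predictions: protein ID -> {GO term ID: confidence score in [0, 1]}
# truth:       protein ID -> non-empty set of experimentally confirmed GO terms
Predictions = Dict[str, Dict[str, float]]
Truth = Dict[str, Set[str]]


def fmax(predictions: Predictions, truth: Truth, steps: int = 100) -> float:
    """Protein-centric maximum F-measure over thresholds 1/steps, ..., 1.0."""
    best = 0.0
    for k in range(1, steps + 1):
        tau = k / steps
        precisions: List[float] = []  # only proteins with >= 1 prediction at tau
        recall_sum = 0.0              # summed over all benchmark proteins
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, score in scores.items() if score >= tau}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue  # no predictions survive this threshold
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(truth)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


def rank_predictors(
    submissions: Dict[str, Predictions], truth: Truth
) -> List[Tuple[str, float]]:
    """Return (predictor name, Fmax) pairs sorted from best to worst."""
    scored = [(name, fmax(preds, truth)) for name, preds in submissions.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy example with made-up proteins and GO terms.
truth = {"P1": {"GO:A", "GO:B"}, "P2": {"GO:C"}}
submissions = {
    "method_y": {"P1": {"GO:A": 0.7, "GO:D": 0.9}, "P2": {"GO:E": 0.5}},
    "method_x": {"P1": {"GO:A": 0.9, "GO:B": 0.6}, "P2": {"GO:C": 0.8}},
}
# method_x ranks first (Fmax 1.0), method_y second (Fmax about 0.33)
print(rank_predictors(submissions, truth))
```

A real assessment would also have to handle the structure of the Gene Ontology (for example, propagating predicted terms to their ancestors) and separate the ontology categories; the sketch deliberately treats GO terms as flat labels.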

History

The CAFA experiment is conducted by the Automated Function Prediction Special Interest Group (AFP/SIG). AFP/SIG meetings were held alongside the Intelligent Systems for Molecular Biology (ISMB) conference in 2005, 2006, 2008, 2011, and 2012. The first CAFA experiment was organized between fall 2010 and spring 2012. The organizers provided 48,000 protein sequences to the community, with the task of predicting Gene Ontology annotations for each of them. Of those 48,000 proteins, 866 were experimentally annotated during the target accumulation phase. The results showed that current function prediction algorithms perform significantly better than simple domain assignment or straightforward use of the BLAST package. However, they also revealed that accurate prediction of a protein's biological function remains an open and challenging problem.

References

  1. ^ Radivojac, Predrag; et al. (2013). Nature Methods. 10: 221–227.
  2. ^ Bernal, Axel; et al. (2001). Nucleic Acids Research. 29 (1): 126–127.
