Bioinformatics discovery of non-coding RNAs

Non-coding RNAs have been discovered using both experimental and bioinformatic approaches. Bioinformatic approaches can be divided into three main categories. The first is homology search, which by definition cannot find new classes of ncRNAs. The second comprises algorithms designed to discover specific types of ncRNAs that share characteristic properties. The third comprises methods based on very general properties of RNA, which can therefore discover entirely new kinds of ncRNAs.

Discovery by homology search

Homology search refers to the process of searching a sequence database for RNAs that are similar to already known RNA sequences. Any algorithm that is designed for homology search of nucleic acid sequences can be used, e.g., BLAST^[1]. However, such algorithms typically are not as sensitive or accurate as algorithms specifically designed for RNA.

Of particular importance for RNA is the conservation of secondary structure, which can be modeled to make searches more accurate. For example, covariance models^[2] can be viewed as an extension of a profile hidden Markov model that also reflects conserved secondary structure. Covariance models are implemented in the Infernal software package.^[3]
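
The benefit of modeling structure can be illustrated with a minimal Python sketch. It is a toy comparison, not the covariance-model algorithm implemented in Infernal: the sequences, the dot-bracket structure, and the function names (parse_pairs, sequence_score, structure_score) are all hypothetical and chosen only for illustration. One candidate carries a compensatory change that keeps a base pair intact, while the other carries a single change that breaks it.

```python
# Toy illustration only; this is NOT the covariance-model algorithm used by Infernal.
# All sequences and names below are hypothetical.

# Watson-Crick pairs plus the G-U wobble pair.
CANONICAL_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def parse_pairs(dot_bracket):
    """Return the (i, j) index pairs of matching brackets in a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs


def sequence_score(query, candidate):
    """Count positions at which the two equal-length sequences are identical."""
    return sum(1 for q, c in zip(query, candidate) if q == c)


def structure_score(candidate, pairs):
    """Count structure positions at which the candidate can still form a base pair."""
    return sum(1 for i, j in pairs if (candidate[i], candidate[j]) in CANONICAL_PAIRS)


query = "GGGAAAUCCC"
structure = "(((....)))"
pairs = parse_pairs(structure)

compensated = "GCGAAAUCGC"  # G-C pair replaced by C-G: two changes, pair preserved
disrupted = "GCGAAAUCCC"    # one change only, but the pair becomes C-C and is lost

for name, cand in [("compensated", compensated), ("disrupted", disrupted)]:
    print(name, sequence_score(query, cand), structure_score(cand, pairs))
```

Under these assumptions, sequence identity alone ranks the disrupted candidate higher, whereas the structure count favors the compensated one. This is the kind of signal that a structure-aware model can use and a purely sequence-based search cannot.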

Discovery of specific types of ncRNAs

Some types of RNAs have shared properties that algorithms can exploit. For example, tRNAscan-SE^[4] is specialized for finding tRNAs. The heart of this program is a tRNA homology search based on covariance models, but other tRNA-specific search programs are used to accelerate searches.

The properties of snoRNAs have enabled the development of programs to detect new examples of snoRNAs, including those that might be only distantly related to previously known examples. Computer programs implementing such approaches include snoscan^[5] and snoReport.^[6]

Similarly, several algorithms have been developed to detect microRNAs. Examples include miRNAFold^[7] and miRNAminer.^[8]
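
As a rough illustration of how a class-specific property can be turned into a search, the Python sketch below scans a sequence for windows that could fold into a simple hairpin, the stem-loop shape characteristic of microRNA precursors. It is a deliberately naive toy and does not reproduce the method of miRNAFold, miRNAminer, or any other published tool. The function names, the window size, the thresholds, and the example sequence are all assumptions made for this example.

```python
# Naive hairpin scan; a toy, not the algorithm of any published miRNA tool.
# All names, thresholds, and the example sequence are hypothetical.

# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def stem_pairs(window, min_loop=3):
    """Count base pairs formed when the window is folded back on itself.

    Position i from the 5' end is paired with position i from the 3' end,
    leaving at least `min_loop` unpaired bases in the middle.
    """
    n = len(window)
    max_stem = (n - min_loop) // 2
    return sum(
        1 for i in range(max_stem)
        if (window[i], window[n - 1 - i]) in PAIRS
    )


def hairpin_candidates(seq, window=16, min_pairs=6):
    """Return (start, pair_count) for each window with enough stem pairs."""
    hits = []
    for start in range(len(seq) - window + 1):
        count = stem_pairs(seq[start:start + window])
        if count >= min_pairs:
            hits.append((start, count))
    return hits


# A 6-bp stem with a 4-base loop, flanked by unstructured poly-A sequence.
hairpin = "GGCUCG" + "AAAA" + "CGAGCC"
sequence = "AAAAA" + hairpin + "AAAAA"

print(hairpin_candidates(sequence))
```

Real detectors add many further criteria, such as folding energy, bulges and internal loops, and conservation across species, none of which this sketch models.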

Discovery by general properties

Some properties are shared by multiple unrelated classes of ncRNA, and these properties can be targeted to discover new classes. Chief among them is the conservation of an RNA secondary structure. To measure conservation of secondary structure, it is first necessary to find homologous sequences that might exhibit a common structure. Strategies for doing so include using BLAST between two sequences^[9] or multiple sequences,^[10] exploiting synteny via orthologous genes,^[11]^[12] and using locality-sensitive hashing in combination with sequence and structural features.^[13]

Mutations that change the nucleotide sequence but preserve the secondary structure, such as a pair of compensatory changes that keeps two positions base-paired, are called covariation and can provide evidence of conservation. Other statistics
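
One widely used way to quantify covariation between two alignment columns is mutual information. The Python sketch below is a minimal illustration under stated assumptions: the four-sequence alignment is invented, the function names are hypothetical, and nothing here is meant to represent the statistic used by any of the tools cited above. Two columns that change together while preserving base pairing score high, while invariant columns score zero.

```python
from collections import Counter
from math import log2

# Hypothetical gap-free alignment of four homologous sequences sharing the
# structure (((....))). Columns 1 and 8 covary: G-C, C-G, A-U, U-A.
alignment = [
    "GGGAAAUCCC",
    "GCGAAAUCGC",
    "GAGAAAUCUC",
    "GUGAAAUCAC",
]


def column(aln, i):
    """Return the residues found in column i of the alignment."""
    return [seq[i] for seq in aln]


def mutual_information(aln, i, j):
    """Mutual information, in bits, between alignment columns i and j."""
    n = len(aln)
    col_i, col_j = column(aln, i), column(aln, j)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


print(mutual_information(alignment, 1, 8))  # covarying, base-paired columns
print(mutual_information(alignment, 0, 9))  # invariant, base-paired columns
```

Mutual information rewards any correlated change, whether or not the new residues can pair, and it gives no credit to perfectly conserved pairs. For that reason it is usually treated as one line of evidence among several rather than as proof of a conserved structure.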

References

^ Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (September 1997). "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs". Nucleic Acids Res. 25 (17): 3389–402. PMC 146917. PMID 9254694.
^ Eddy SR, Durbin R (June 1994). "RNA sequence analysis using covariance models". Nucleic Acids Res. 22 (11): 2079–88. PMC 308124. PMID 8029015.
^ Nawrocki EP, Eddy SR (November 2013). "Infernal 1.1: 100-fold faster RNA homology searches". Bioinformatics. 29 (22): 2933–5. doi:10.1093/bioinformatics/btt509. PMC 3810854. PMID 24008419.
^ Lowe TM, Eddy SR (March 1997). "tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence". Nucleic Acids Res. 25 (5): 955–64. PMC 146525. PMID 9023104.
^ Lowe TM, Eddy SR (February 1999). "A computational screen for methylation guide snoRNAs in yeast". Science. 283 (5405): 1168–71. PMID 10024243.
^ Hertel J, Hofacker IL, Stadler PF (January 2008). "SnoReport: computational identification of snoRNAs with unknown targets". Bioinformatics. 24 (2): 158–64. doi:10.1093/bioinformatics/btm464. PMID 17895272.
^ Tempel S, Tahi F (2012). "A fast ab-initio method for predicting miRNA precursors in genomes". Nucleic Acids Res. 40 (11): e80. doi:10.1093/nar/gks146. PMC 3367186. PMID 22362754.
^ Artzi S, Kiezun A, Shomron N (2008). "miRNAminer: a tool for homologous microRNA gene search". BMC Bioinformatics. 9 (1): 39. doi:10.1186/1471-2105-9-39. PMC 2258288. PMID 18215311.
^ Rivas E, Eddy SR (2001). "Noncoding RNA gene detection using comparative sequence analysis". BMC Bioinformatics. 2: 8. PMC 64605. PMID 11801179.
^ PMID 19340921.
^ Weinberg Z, Barrick JE, Yao Z, Roth A, Kim JN, Gore J, Wang JX, Lee ER, Block KF, Sudarsan N, Neph S, Tompa M, Ruzzo WL, Breaker RR (2007). "Identification of 22 candidate structured RNAs in bacteria using the CMfinder comparative genomics pipeline". Nucleic Acids Res. 35 (14): 4809–19. doi:10.1093/nar/gkm487. PMC 1950547. PMID 17621584.
^ Hammond MC, Wachter A, Breaker RR (May 2009). "A plant 5S ribosomal RNA mimic regulates alternative splicing of transcription factor IIIA pre-mRNAs". Nat. Struct. Mol. Biol. 16 (5): 541–9. doi:10.1038/nsmb.1588. PMC 2680232. PMID 19377483.
^ Heyne S, Costa F, Rose D, Backofen R (June 2012). "GraphClust: alignment-free structural clustering of local RNA secondary structures". Bioinformatics. 28 (12): i224–32. doi:10.1093/bioinformatics/bts224. PMC 3371856. PMID 22689765.
