Shapiro–Senapathy algorithm
Gene regulation is the main genetic program through which an organism controls its normal functions, so errors introduced into this program by mutations can disrupt the normal state and lead to disease. RNA splicing is increasingly recognized as central to gene regulation in eukaryotic organisms, including all animals and plants. In this context, Dr. Periannan Senapathy has pioneered research on the biology of RNA splicing, addressing why genes are split, what splice junction sequences are, and why exons are very short and introns very long [1][2][3][4][5][6][7][8]. Building on these findings, he and M. B. Shapiro developed an algorithm, known as the Shapiro–Senapathy algorithm (S&S), for predicting splice sites, exons and genes in animals and plants [9][10]. Research institutions around the world use the algorithm to identify disease-causing mutations at splice junctions in both cancers and non-cancer diseases. The S&S algorithm has been cited in nearly 3,000 publications reporting splicing mutations in thousands of diseases, including many forms of cancer; ten example citations each are given for cancers (Table 1), non-cancer diseases (Table 2) and plants (Table 3).
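As the S&S method is commonly described, a candidate splice site is scored against a table of nucleotide percentages observed at each position of known sites: the percentages of the observed bases are summed (t) and rescaled between the lowest and highest possible totals, giving 100 × (t - min)/(max - min). The minimal Python sketch below applies that scheme to a 9-nt donor window and compares a wild-type site with a mutant, which is how splice-site mutations are usually assessed with S&S. The percentage table, the names `ss_score` and `score_change`, the window choice and the example sequences are hypothetical illustrations, not the published S&S matrices; acceptor-site scoring, which the original method handles with its own window, is not covered.

```python
from typing import Dict, List

# HYPOTHETICAL nucleotide percentages for a 9-nt donor window
# (positions -3..-1 in the exon, +1..+6 in the intron). The values are
# shaped like the familiar MAG|GTRAGT consensus for illustration only;
# they are NOT the published Shapiro-Senapathy tables.
DONOR_PCT: List[Dict[str, float]] = [
    {"A": 35, "C": 35, "G": 18, "T": 12},   # -3
    {"A": 60, "C": 13, "G": 14, "T": 13},   # -2
    {"A": 9, "C": 5, "G": 79, "T": 7},      # -1
    {"A": 0, "C": 0, "G": 100, "T": 0},     # +1 (G of the GT dinucleotide)
    {"A": 0, "C": 0, "G": 0, "T": 100},     # +2 (T of the GT dinucleotide)
    {"A": 60, "C": 2, "G": 35, "T": 3},     # +3
    {"A": 70, "C": 8, "G": 12, "T": 10},    # +4
    {"A": 7, "C": 5, "G": 80, "T": 8},      # +5
    {"A": 15, "C": 15, "G": 20, "T": 50},   # +6
]


def ss_score(site: str, pct: List[Dict[str, float]] = DONOR_PCT) -> float:
    """Score a candidate site as 100 * (t - lo) / (hi - lo).

    t  = sum of the percentages of the observed base at each position,
    lo = lowest possible total, hi = highest possible total.
    The most frequent base at every position scores 100; the least frequent scores 0.
    """
    site = site.upper()
    if len(site) != len(pct):
        raise ValueError(f"expected a {len(pct)}-nt site, got {len(site)} nt")
    t = sum(column[base] for column, base in zip(pct, site))  # KeyError on non-ACGT
    lo = sum(min(column.values()) for column in pct)
    hi = sum(max(column.values()) for column in pct)
    return 100.0 * (t - lo) / (hi - lo)


def score_change(wild_type: str, mutant: str) -> float:
    """Mutant score minus wild-type score; a negative value means a weaker site."""
    return ss_score(mutant) - ss_score(wild_type)


if __name__ == "__main__":
    wt, mut = "CAGGTAAGT", "CAGATAAGT"  # hypothetical +1 G>A substitution
    print(f"wild type {ss_score(wt):.1f}, mutant {ss_score(mut):.1f}, "
          f"change {score_change(wt, mut):+.1f}")
    # prints: wild type 100.0, mutant 82.6, change -17.4
```

How large a drop in score counts as damaging varies between studies and is not fixed by this sketch.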
It is increasingly recognized that mutations in splicing regions underlie the pathology in many patients across a wide range of diseases. Applying the S&S technology platform in modern clinical genomics research may therefore advance the diagnosis and treatment of human diseases. In addition to its use in thousands of studies of a variety of diseases, the algorithm has been used to find mutations in drug-metabolizing genes that cause adverse drug reactions (Citations). The S&S algorithm has also been used in many studies of agricultural plants and animals.
Using his split-gene theory and the S&S algorithm, Senapathy has developed analytical platforms for several genomes (REFs – ExDom etc).
Senapathy’s group has also developed several database resources that use S&S to analyze split genes, splice junctions and mutations [11][12][13][14].
Because the mechanism of splicing is inherently complex, identifying the splicing mutations that cause disease is also difficult. Eukaryotic split genes are structurally far more complex than the simple, uninterrupted genes of bacteria. The reason for this difference is a major question in eukaryotic biology, since it bears on how the highly complex eukaryotic genes could have evolved from the simple genes of prokaryotes. Senapathy has formulated a theory, the Random-sequence Origin of Split Genes (ROSG) model, to explain why eukaryotic genes are split into short exons and long introns [15][16][17][18][19][20].
His research has shown that split genes can readily arise within random DNA sequence, whereas contiguous genes like those of bacteria are extremely unlikely to occur there. These findings indicate that eukaryotic genes could have originated from prebiotic genetic sequences, which possibly gave rise to eukaryotic genomes. Senapathy has also argued that the splice signal sequences that enable the spliceosome to recognize splice junctions originated from the stop-codon ends of open reading frames (ORFs) in random sequence.
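The random-sequence argument rests on simple arithmetic. Assuming equal base frequencies, 3 of the 64 codons (TAA, TAG, TGA) are stops, so a reading frame runs n codons without a stop with probability (61/64)^n: open stretches of a few dozen codons, comparable to short exons, are common, while an uninterrupted frame of several hundred codons, as in a typical bacterial gene, is very unlikely. The Python sketch below computes that probability and checks it against a simulated random sequence. The function names, the equal-frequency assumption and the restriction to the three forward frames are illustrative simplifications, not Senapathy's published analysis.

```python
import random
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}
# With equal base frequencies every codon has probability 1/64,
# so a random codon is a stop with probability 3/64.
P_STOP = len(STOP_CODONS) / 64


def prob_open_run(n_codons: int) -> float:
    """Probability that n consecutive random codons contain no stop codon."""
    return (1.0 - P_STOP) ** n_codons


def orf_lengths(seq: str) -> List[int]:
    """Stop-to-stop open stretches (in codons, stop excluded) in the three
    forward reading frames. A trailing stretch with no closing stop is dropped."""
    lengths: List[int] = []
    for frame in range(3):
        run = 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                lengths.append(run)
                run = 0
            else:
                run += 1
    return lengths


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    seq = "".join(rng.choice("ACGT") for _ in range(300_000))
    lengths = orf_lengths(seq)
    print(f"simulated mean open stretch: {sum(lengths) / len(lengths):.1f} codons "
          f"(geometric expectation {(1 - P_STOP) / P_STOP:.1f})")
    for n in (20, 100, 400):
        print(f"P(no stop in {n} codons) = {prob_open_run(n):.2e}")
```

Under these assumptions the expected open stretch is 61/3, about 20 codons, while the chance of 400 stop-free codons is on the order of 10^-9, which illustrates the contrast between short exon-sized ORFs and long contiguous genes.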
Studies of the evolution of eukaryotic genes and genomes must account for the origin of exons, introns and splice junctions, since most eukaryotic genes are split into many exons separated by introns, whereas prokaryotic genes are not. Exons are very short and introns very long in large genomes such as the human genome (~3.2 billion bases). Many invertebrates also have very large genomes, such as that of the sea urchin (~one billion bases), with many introns in their genes. However, some animals and plants have relatively small genomes, such as the sea squirt (Ciona intestinalis, X bases) and Arabidopsis thaliana (~120 million bases). The genes of these organisms are also split into exons and introns, albeit with short introns, and show essentially the same splice junction sequences. Senapathy’s findings answer many questions about the structure and evolution of these genomes, and the S&S algorithm has been applied in research on numerous animal and plant genomes.
Table 1
Cancers Using Shapiro-Senapathy Algorithm
Table 2
Non-Cancer Diseases Using Shapiro-Senapathy Algorithm
No. | Publication Title | Citation |
---|---|---|
1 | Mutations of the Microsomal Triglyceride-Transfer-Protein Gene in Abetalipoproteinemia | Am. J. Hum. Genet. 57:1298–1310, 1995 |
2 | Mutations in the Mu Heavy-Chain Gene in Patients with Agammaglobulinemia | N. Engl. J. Med. 335:1486–1493, 1996 |
3 | | Hum. Mol. Genet. 7(11):1661–1669, 1998 |
4 | Splicing Defects in the Ataxia-Telangiectasia Gene, ATM: Underlying Mutations and Consequences | Am. J. Hum. Genet. 64:1617–1631, 1999 |
5 | | J. Clin. Invest. 103:649–652, 1999 |
6 | | J. Clin. Invest. 100:2204–2210, 1997 |
7 | | J. Med. Genet. 36:518–523, 1999 |
8 | | Proc. Natl. Acad. Sci. USA 86:1041–1045, 1989 |
9 | Molecular Analysis of the SGLT2 Gene in Patients with Renal Glucosuria | J. Am. Soc. Nephrol. 14:2873–2882, 2003 |
10 | | Am. J. Hum. Genet. 64:1524–1540, 1999 |
Table 3
Agricultural Plants and Animals Using Shapiro-Senapathy Algorithm
No. | Publication Title | Citation |
---|---|---|
1 | | Mol. Gen. Genet. 220:245–250, 1990 |
2 | | Plant J. 3(4):527–536, 1993 |
3 | Molecular characterization of aromatic peroxygenase from Agrocybe aegerita | Appl. Microbiol. Biotechnol., April 2009 |
4 | LRP7, a Gene Expressed in Lateral and Adventitious Root Primordia of Arabidopsis | Plant Cell 7:735–745, June 1995 |
5 | | Nucleic Acids Res. 26(20):4748–4757, 1998 |
6 | | Mol. Gen. Genet. 259:511–515, 1998 |
7 | | Plant Physiol. 143(3):1110–1118, March 2007 |
8 | Isolation of an Efficient Actin Promoter for Use in Rice Transformation | Plant Cell 2:163–171, February 1990 |
9 | Isolation, characterization and expression of the maize Cat2 catalase gene | Plant Mol. Biol. 30:913–924, 1996 |
10 | Splicing signals in Drosophila: intron size, information content, and consensus sequences | Nucleic Acids Res. 20(16):4255–4262 |
Notes
- ^ P. Senapathy, Origin of eukaryotic introns: a hypothesis, based on codon distribution statistics in genes, and its implications. Proceedings of the National Academy of Sciences of the United States of America 83, 2133–2137 (1986).
- ^ P. Senapathy, Possible evolution of splice-junction signals in eukaryotic genes from stop codons. Proceedings of the National Academy of Sciences of the United States of America 85, 1129–1133 (1988).
- ^ P. Senapathy, Introns and the origin of protein-coding genes. Science 268, 1366–1367 (1995).
- ^ P. Senapathy, Independent Birth of Organisms (Genome Press, 1994).
- ^ R. Regulapati, A. Bhasi, C. K. Singh, P. Senapathy, Origination of the split structure of spliceosomal genes from random genetic sequences. PLoS ONE 3, e3456 (2008). doi:10.1371/journal.pone.0003456
- ^ M. B. Shapiro, P. Senapathy, RNA splice junctions of different classes of eukaryotes: sequence statistics and functional implications in gene expression. Nucleic Acids Research 15, 7155–7174 (1987).
- ^ P. Senapathy, M. B. Shapiro, N. L. Harris, Splice junctions, branch point sites, and exons: sequence statistics, identification, and applications to genome project. Methods in Enzymology 183, 252–278 (1990).
- ^ N. L. Harris, P. Senapathy, Distribution and consensus of branch point signals in eukaryotic genes: a computerized statistical analysis. Nucleic Acids Research 18, 3015–3019 (1990).
- ^ M. B. Shapiro, P. Senapathy, RNA splice junctions of different classes of eukaryotes: sequence statistics and functional implications in gene expression. Nucleic Acids Research 15, 7155–7174 (1987).
- ^ P. Senapathy, M. B. Shapiro, N. L. Harris, Splice junctions, branch point sites, and exons: sequence statistics, identification, and applications to genome project. Methods in Enzymology 183, 252–278 (1990).
- ^ A. Bhasi, R. V. Pandey, S. P. Utharasamy, P. Senapathy, EuSplice: a unified resource for the analysis of splice signals and alternative splicing in eukaryotic genes. Bioinformatics 23, 1815–1823 (2007). doi:10.1093/bioinformatics/btm084
- ^ A. Bhasi, P. Philip, V. Manikandan, P. Senapathy, ExDom: an integrated database for comparative analysis of the exon-intron structures of protein domains in eukaryotes. Nucleic Acids Research 37, D703–D711 (2009). doi:10.1093/nar/gkn746
- ^ A. Bhasi, P. Philip, V. T. Sreedharan, P. Senapathy, AspAlt: A tool for inter-database, inter-genomic and user-specific comparative analysis of alternative transcription and alternative splicing in 46 eukaryotes. Genomics 94, 48–54 (2009). doi:10.1016/j.ygeno.2009.02.006
- ^ A. Bhasi et al., RoBuST: an integrated genomics resource for the root and bulb crop families Apiaceae and Alliaceae. BMC Plant Biology 10, 161 (2010). doi:10.1186/1471-2229-10-161
- ^ P. Senapathy, Origin of eukaryotic introns: a hypothesis, based on codon distribution statistics in genes, and its implications. Proceedings of the National Academy of Sciences of the United States of America 83, 2133–2137 (1986).
- ^ P. Senapathy, Possible evolution of splice-junction signals in eukaryotic genes from stop codons. Proceedings of the National Academy of Sciences of the United States of America 85, 1129–1133 (1988).
- ^ P. Senapathy, Introns and the origin of protein-coding genes. Science 268, 1366–1367 (1995).
- ^ R. Regulapati, A. Bhasi, C. K. Singh, P. Senapathy, Origination of the split structure of spliceosomal genes from random genetic sequences. PLoS ONE 3, e3456 (2008). doi:10.1371/journal.pone.0003456
- ^ N. L. Harris, P. Senapathy, Distribution and consensus of branch point signals in eukaryotic genes: a computerized statistical analysis. Nucleic Acids Research 18, 3015–3019 (1990).
- ^ P. Senapathy, Distribution and repetition of sequence elements in eukaryotic DNA: New insights by computer-aided statistical analyses. Molecular Genetics (Life Sci. Adv.) 7, 53–65 (1988).