Shapiro–Senapathy algorithm


Gene regulation is the main genetic program through which an organism controls its normal functions, so any error in this program caused by mutations can alter the normal state and lead to disease. RNA splicing is increasingly recognized as central to gene regulation in eukaryotic organisms, including all animals and plants. In this context, Dr. Periannan Senapathy has pioneered research in the biology of RNA splicing, including why genes are split, what splice junction sequences are, and why exons are very short and introns very long ^[1]^[2]^[3]^[4]^[5]^[6]^[7]^[8]. Based on these findings, he provided an algorithm, known as the Shapiro–Senapathy algorithm (S&S), for predicting splice sites, exons and genes in animals and plants ^[9]^[10]. The algorithm can identify disease-causing mutations in splice junctions in cancerous and non-cancerous diseases, and it is used in major research institutions around the world. It has been cited in nearly 3,000 publications on finding splicing mutations in thousands of diseases, including many different forms of cancer; ten example citations each are given for cancers (Table 1), non-cancer diseases (Table 2), and agricultural plants and animals (Table 3).

It is increasingly recognized that the pathology in a majority of patients in any disease is caused by mutations in the splicing regions. Thus, applying the S&S technology platform in modern clinical genomics research will advance the diagnosis and treatment of human diseases. In addition to its use in thousands of studies involving a variety of diseases, the algorithm has been used to find mutations in drug-metabolizing genes that cause adverse reactions (Citations). The S&S algorithm has also been used in many studies of agricultural plants and animals.
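The text above does not spell out how splice sites are scored. One common way to turn splice-junction sequence statistics into a predictor is a position frequency matrix: count how often each base occurs at each position of known junctions, then score a candidate window against those counts. The Python sketch below illustrates that general idea only. The training sequences are made-up toy data, the names `TOY_DONOR_SITES`, `build_frequency_matrix` and `score_site` are hypothetical, and the 0 to 100 normalization is an assumption chosen for readability, not the published S&S tables or software.

```python
from collections import Counter

# Hypothetical 9-base donor-site windows (3 exonic + 6 intronic bases).
# These are made-up toy sequences, NOT data from the S&S publications.
TOY_DONOR_SITES = [
    "CAGGTAAGT",
    "AAGGTGAGT",
    "CAGGTAAGA",
    "GAGGTAAGC",
    "AAGGTATGT",
    "CTGGTGAGT",
]


def build_frequency_matrix(sites):
    """Return one dict per position mapping each base to its percentage frequency."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all training sites must have the same length")
    matrix = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        matrix.append({base: 100.0 * counts[base] / len(sites) for base in "ACGT"})
    return matrix


def score_site(candidate, matrix):
    """Score a candidate window on a 0-100 scale between the worst and best possible windows."""
    if len(candidate) != len(matrix):
        raise ValueError("candidate length must match the matrix length")
    total = sum(column[base] for base, column in zip(candidate, matrix))
    best = sum(max(column.values()) for column in matrix)
    worst = sum(min(column.values()) for column in matrix)
    return 100.0 * (total - worst) / (best - worst)


if __name__ == "__main__":
    matrix = build_frequency_matrix(TOY_DONOR_SITES)
    # Compare a consensus-like window with a variant whose conserved G is changed to A.
    for window in ("CAGGTAAGT", "CAGATAAGT"):
        print(window, round(score_site(window, matrix), 1))
```

In this kind of scheme, a mutation is flagged as a candidate splicing mutation when it lowers the score of a known junction, or raises the score of a nearby cryptic site, compared with the reference sequence.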

Using his split gene theory and the S&S algorithm, Senapathy has developed analytical platforms for several genomes (REFs – ExDom etc).

Senapathy’s group has also developed several database resources that use S&S and are dedicated to the analysis of split genes, splice junctions and mutations ^[11] ^[12] ^[13] ^[14].

Because the mechanism of splicing is inherently complex, identifying the splicing mutations that cause disease is also difficult. The structure of eukaryotic split genes is highly complex compared to the simple structure of bacterial genes. The reason for this difference is a major question in eukaryotic biology, as it involves how the extremely complex eukaryotic genes could have evolved from the simple genes of prokaryotes. Senapathy has formulated a theory, based on his Random-sequence Origin of Split Genes (ROSG) model, to explain why the genes of eukaryotes are split into short exon and long intron sequences ^[15] ^[16] ^[17] ^[18] ^[19] ^[20].

His research has shown that split genes can easily occur within random DNA sequence, whereas the contiguous genes of bacteria are extremely unlikely to occur in such sequence. These findings suggest that eukaryotic genes could have originated from prebiotic genetic sequences and possibly gave rise to eukaryotic genomes. Senapathy has also shown that the splice signal sequences that enable the spliceosome to recognize splice junctions originated from the stop-codon ends of Open Reading Frames (ORFs) in random sequence.
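The statistical premise behind this argument can be checked with a short calculation. Assuming, for illustration only, a random sequence with equal and independent base frequencies, 3 of the 64 codons are stop codons, so the chance that n consecutive codons in one reading frame contain no stop is (61/64)^n. The Python sketch below computes that probability and simulates stop-free run lengths. The function names are hypothetical, and the sketch illustrates only why long uninterrupted ORFs are rare in random sequence, not the full ROSG model.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def prob_orf_at_least(n_codons):
    """Probability that n_codons consecutive random codons contain no stop codon."""
    return (61 / 64) ** n_codons


def simulate_orf_lengths(n_codons_total, seed=0):
    """Read a random sequence in one frame and return the lengths (in codons)
    of the stop-free runs that end at each stop codon."""
    rng = random.Random(seed)
    lengths = []
    run = 0
    for _ in range(n_codons_total):
        codon = "".join(rng.choice("ACGT") for _ in range(3))
        if codon in STOP_CODONS:
            lengths.append(run)
            run = 0
        else:
            run += 1
    return lengths


if __name__ == "__main__":
    lengths = simulate_orf_lengths(200_000)
    print("mean stop-free run (codons):", sum(lengths) / len(lengths))
    print("P(no stop in 200 codons):", prob_orf_at_least(200))
```

Under these assumptions the expected stop-free run is 61/3, or roughly 20 codons, so short ORFs are common while an uninterrupted reading frame of several hundred codons is very unlikely to arise by chance.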

Studies of the evolution of eukaryotic genes and genomes involve the origin of exons, introns and splice junctions, as all eukaryotic genes are split into many exons separated by introns, whereas prokaryotic genes are not. The exons are very short and the introns very long in large genomes such as the human genome (~3.2 billion bases). The genomes of many invertebrates are also very large, such as that of the sea urchin (~one billion bases), and contain many introns in their genes. However, the genomes of some animals and plants are relatively small, such as those of the sea squirt (Ciona intestinalis – X bases) and Arabidopsis thaliana (~120 million bases). The genes in these genomes are also split into exons and introns, albeit with short introns, and exhibit basically the same splice junction sequences. Senapathy’s findings answer many questions relating to the structure and evolution of these genomes, and the S&S algorithm is applied in research on numerous animal and plant genomes.

Table 1
Cancers Using Shapiro-Senapathy Algorithm

Sl No	Publication Title	Citation
1	Interaction of H-2Eb with an IAP retrotransposon in the A20/2J B cell lymphoma	Mol Gen Genet (1990) 220:245-250
2	UV Fingerprints Predominate in the PTCH Mutation Spectra of Basal Cell Carcinomas Independent of Clinical Phenotype	Journal of Investigative Dermatology (2007), Volume 127; 2872-2881
3	Genetic variants and haplotype analyses of the ZBRK1/ZNF350 gene in high-risk non BRCA1/2 French Canadian breast and ovarian cancer families	Int. J. Cancer: 122, 108–116 (2008)
4	Aberrant RNA splicing in the hMSH2 gene: Molecular identification of three aberrant RNA in Scottish patients with colorectal cancer in the West of Scotland	American Journal of Medical Genetics 95:49–52 (2000)
5	Mutational analysis of TSC1 and TSC2 genes in gangliogliomas	Neurobiology (2001), 27, 105–114
6	Characterization of TRIM31, upregulated in gastric adenocarcinoma, as a novel RBCC protein	Characterization of TRIM31, upregulated in gastric adenocarcinoma, as a novel RBCC protein
7	Analysis of TGF-β type I receptor for mutations and polymorphisms in head and neck cancers	Mutation Research 479 (2001) 131–139
8	Ten novel MSH2 and MLH1 germline mutations in families with HNPCC	Hum Mutat. 2004 Oct; 24(4):351-2.
9	DNA rearrangement of a homeobox gene in myeloid leukaemic cells.	The EMBO Journal vol.7 no.13 pp.4283-4290, 1988
10	A deep intronic mutation in CDKN2A is associated with disease in a subset of melanoma pedigrees	Human Molecular Genetics, 2001, Vol. 10, No. 23 2679–2686

Table 2
Non-Cancer Diseases Using Shapiro-Senapathy Algorithm

Sl No	Publication Title	Citation
1	Mutations of the Microsomal Triglyceride-Transfer–Protein Gene in Abetalipoproteinemia	Am. J. Hum. Genet. 57:1298-1310, 1995
2	Mutations in the Mu Heavy-Chain Gene in Patients with Agammaglobulinemia	The New England Journal of Medicine 335:1486-1493; 1996
3	The Anhidrotic Ectodermal Dysplasia Gene (EDA) Undergoes Alternative Splicing and Encodes Ectodysplasin-A with Deletion Mutations in Collagenous Repeats	Human molecular genetics, 1998, vol 7, No 11, 1661 - 1669
4	Splicing Defects in the Ataxia-Telangiectasia Gene, ATM: Underlying Mutations and Consequences	Am. J. Hum. Genet. 64:1617–1631, 1999
5	Splice acceptor site mutation of the transporter associated with antigen processing-1 gene in human bare lymphocyte syndrome	J. Clin. Invest. 103:649–652 (1999)
6	Disruption of the splicing enhancer sequence within exon 27 of the dystrophin gene by a nonsense mutation induces partial skipping of the exon and is responsible for Becker muscular dystrophy.	J. Clin. Invest 100: 2204-2210 (1997)
7	Analysis of germline CDKN1C (p57KIP2) mutations in familial and sporadic Beckwith-Wiedemann syndrome (BWS) provides a novel genotype-phenotype correlation	J Med Genet 1999;36:518–523
8	A 5' splice-region G→C mutation in exon 1 of the human beta-globin gene inhibits pre-mRNA splicing: a mechanism for beta+-thalassemia	Proc. Natl. Acad. Sci. USA 86:1041-1045 (1989)
9	Molecular Analysis of the SGLT2 Gene in Patients with Renal Glucosuria	J Am Soc Nephrol 14: 2873–2882, 2003
10	Calpainopathy—A Survey of Mutations and Polymorphisms	Am. J. Hum. Genet. 64:1524–1540, 1999

Table 3
Agricultural Plants and Animals Using Shapiro-Senapathy Algorithm

Sl No	Publication Title	Citation
1	Construction of an intron-containing marker gene: Splicing of the intron in transgenic plants and its use in monitoring early events in Agrobacterium-mediated plant transformation	Mol Gen Genet (1990) 220:245-250
2	Characterization of the catalase anti-oxidant defence gene Cat1 of maize, and its developmentally regulated expression in transgenic tobacco	The Plant Journal (1993) 3(4):527-536
3	Molecular characterization of aromatic peroxygenase from Agrocybe aegerita	Appl Microbiol Biotechnol, April 2009
4	LRP7, a Gene Expressed in Lateral and Adventitious Root Primordia of Arabidopsis	The Plant Cell, Vol. 7, 735-745, June 1995
5	Prediction of locally optimal splice sites in plant pre-mRNA with applications to gene identification in Arabidopsis thaliana genomic DNA	Nucleic Acids Research, 1998, Vol. 26, No. 20, 4748–4757
6	A tobacco gene encoding a novel basic class II chitinase: A putative ancestor of basic class I and acidic class II chitinase genes	Mol Gen Genet (1998) 259:511-515
7	Phytochelatin Synthases of the Model Legume Lotus japonicus. A Small Multigene Family with Differential Response to Cadmium and Alternatively Spliced Variants	Plant Physiology March 2007 vol. 143 no. 3 1110-1118
8	Isolation of an Efficient Actin Promoter for Use in Rice Transformation	The Plant Cell, Vol. 2, 163-171, February 1990
9	Isolation, characterization and expression of the maize Cat2 catalase gene	Plant Molecular Biology 30:913-924(1996)
10	Splicing signals in Drosophila: intron size, information content, and consensus sequences	Nucleic Acids Research, Vol. 20, No. 16(4255-4262)

Notes

[1] P. Senapathy, Origin of eukaryotic introns: a hypothesis, based on codon distribution statistics in genes, and its implications. Proceedings of the National Academy of Sciences of the United States of America 83, 2133-2137 (1986).
[2] P. Senapathy, Possible evolution of splice-junction signals in eukaryotic genes from stop codons. Proceedings of the National Academy of Sciences of the United States of America 85, 1129-1133 (1988).
[3] P. Senapathy, Introns and the origin of protein-coding genes. Science 268, 1366-1367 (1995).
[4] P. Senapathy, Independent Birth of Organisms. (Genome Press, 1994).
[5] R. Regulapati, A. Bhasi, C. K. Singh, P. Senapathy, Origination of the split structure of spliceosomal genes from random genetic sequences. PLoS ONE 3, e3456 (2008) (10.1371/journal.pone.0003456).
[6] M. B. Shapiro, P. Senapathy, RNA splice junctions of different classes of eukaryotes: sequence statistics and functional implications in gene expression. Nucleic Acids Research 15, 7155-7174 (1987).
[7] P. Senapathy, M. B. Shapiro, N. L. Harris, Splice junctions, branch point sites, and exons: sequence statistics, identification, and applications to genome project. Methods in Enzymology 183, 252-278 (1990).
[8] N. L. Harris, P. Senapathy, Distribution and consensus of branch point signals in eukaryotic genes: a computerized statistical analysis. Nucleic Acids Research 18, 3015-3019 (1990).
[9] M. B. Shapiro, P. Senapathy, RNA splice junctions of different classes of eukaryotes: sequence statistics and functional implications in gene expression. Nucleic Acids Research 15, 7155-7174 (1987).
[10] P. Senapathy, M. B. Shapiro, N. L. Harris, Splice junctions, branch point sites, and exons: sequence statistics, identification, and applications to genome project. Methods in Enzymology 183, 252-278 (1990).
[11] A. Bhasi, R. V. Pandey, S. P. Utharasamy, P. Senapathy, EuSplice: a unified resource for the analysis of splice signals and alternative splicing in eukaryotic genes. Bioinformatics 23, 1815-1823 (2007) (10.1093/bioinformatics/btm084).
[12] A. Bhasi, P. Philip, V. Manikandan, P. Senapathy, ExDom: an integrated database for comparative analysis of the exon-intron structures of protein domains in eukaryotes. Nucleic Acids Research 37, D703-711 (2009) (10.1093/nar/gkn746).
[13] A. Bhasi, P. Philip, V. T. Sreedharan, P. Senapathy, AspAlt: A tool for inter-database, inter-genomic and user-specific comparative analysis of alternative transcription and alternative splicing in 46 eukaryotes. Genomics 94, 48-54 (2009) (10.1016/j.ygeno.2009.02.006).
[14] A. Bhasi et al., RoBuST: an integrated genomics resource for the root and bulb crop families Apiaceae and Alliaceae. BMC Plant Biology 10, 161 (2010) (10.1186/1471-2229-10-161).
[15] P. Senapathy, Origin of eukaryotic introns: a hypothesis, based on codon distribution statistics in genes, and its implications. Proceedings of the National Academy of Sciences of the United States of America 83, 2133-2137 (1986).
[16] P. Senapathy, Possible evolution of splice-junction signals in eukaryotic genes from stop codons. Proceedings of the National Academy of Sciences of the United States of America 85, 1129-1133 (1988).
[17] P. Senapathy, Introns and the origin of protein-coding genes. Science 268, 1366-1367 (1995).
[18] R. Regulapati, A. Bhasi, C. K. Singh, P. Senapathy, Origination of the split structure of spliceosomal genes from random genetic sequences. PLoS ONE 3, e3456 (2008) (10.1371/journal.pone.0003456).
[19] N. L. Harris, P. Senapathy, Distribution and consensus of branch point signals in eukaryotic genes: a computerized statistical analysis. Nucleic Acids Research 18, 3015-3019 (1990).
[20] P. Senapathy, Distribution and repetition of sequence elements in eukaryotic DNA: New insights by computer-aided statistical analyses. Molecular Genetics (Life Sci. Adv.) 7, 53-65 (1988).
