Dot plot (bioinformatics)

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by 112.10.182.66 (talk) at 05:51, 3 May 2018 (Interpretation). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
A DNA dot plot of a human zinc finger transcription factor (GenBank ID NM_002383), showing regional self-similarity. The main diagonal represents the sequence's alignment with itself; lines off the main diagonal represent similar or repetitive patterns within the sequence.

In bioinformatics, a dot plot is a graphical method for comparing two biological sequences and identifying regions of close similarity between them. It is a type of recurrence plot.

History

One way to visualize the similarity between two protein or nucleic acid sequences is to use a similarity matrix, known as a dot plot. Dot plots were introduced by Gibbs and McIntyre in 1970[1]. They are two-dimensional matrices with the two sequences being compared along the vertical and horizontal axes. For a simple visual representation of the similarity between two sequences, an individual cell in the matrix can be shaded black if the residues at the corresponding positions are identical, so that matching sequence segments appear as diagonal lines across the matrix.
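
The construction described above can be illustrated with a minimal Python sketch. It is not taken from Gibbs and McIntyre's paper or from any dot plot program. The function names dot_matrix and render, and the short example sequence, are assumptions made for illustration. The sketch marks a cell only when the two residues are identical and applies no windowing or scoring.

```python
def dot_matrix(seq_a, seq_b):
    """Build the identity matrix for two sequences.

    Cell [i][j] is True when residue i of seq_a (vertical axis)
    is identical to residue j of seq_b (horizontal axis).
    """
    return [[a == b for b in seq_b] for a in seq_a]


def render(seq_a, seq_b):
    """Draw the matrix as text: '#' for a shaded cell, '.' for an empty one."""
    lines = ["  " + seq_b]  # horizontal axis labels
    for residue, row in zip(seq_a, dot_matrix(seq_a, seq_b)):
        cells = "".join("#" if hit else "." for hit in row)
        lines.append(residue + " " + cells)  # vertical axis label, then the row
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical toy sequence compared with itself.
    print(render("GATTACA", "GATTACA"))
```

When a sequence is compared with itself, every cell on the main diagonal is shaded, and any shaded cells off the diagonal mark residues that recur elsewhere in the sequence.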

>CY003854.1 Influenza A virus (A/mallard/Alberta/77/1977(H2N3)) segment 1, complete sequence AGCGAAAGCAGGTCAAATATATTCAATATGGAGAGAATAAAAGAACTAAGAGATCTAATGTCACAGTCCC GCACCCGCGAGATACTCACCAAAACCACTGTGGACCACATGGCCATAATCAAAAAATACACATCAGGAAG GCAAGAGAAGAACCCCGCACTCAGGATGAAGTGGATGATGGCAATGAAATATCCAATTACTGCAGATAAG AGAATAATGGAAATGATTCCTGAAAGGAATGAACAAGGACAAACCCTCTGGAGCAAAACAAACGATGCCG GCTCAGACCGAGTGATGGTATCACCTCTGGCCGTGACATGGTGGAATAGGAATGGACCAACAACAAGTAC AGTTCACTACCCAAAGGTATATAAAACTTATTTCGAAAAAGTCGAAAGGTTGAAACACGGGACCTTTGGC CCCGTCCACTTCAGAAATCAAGTTAAGATAAGACGGAGGGTTGACATAAACCCTGGCCACGCAGACCTCA GTGCCAAAGAGGCACAGGATGTAATCATGGAAGTTGTTTTCCCAAATGAAGTGGGAGCTAGAATACTAAC ATCGGAGTCACAACTGACAATAACAAAAGAGAAAAAGGAAGAACTCCAGGACTGTAAAATTGCCCCCTTG ATGGTAGCATACATGCTAGAAAGAGAGTTGGTCCGCAAAACGAGGTTCCTCCCAGTGGCTGGTGGAACAA GCAGTGTCTATATTGAGGTGTTGCATTTAACCCAGGGGACATGCTGGGAGCAGATGTACACTCCAGGAGG GGAAGTGAGAAATGATGATGTTGACCAAAGCTTGATTATCGCTGCCAGGAACATAGTAAGAAGAGCAACG GTATCAGCAGACCCACTAGCATCTCTATTGGAGATGTGCCACAGCACACAGATTGGGGGAATAAGGATGG TAGACATCCTTCGGCAAAATCCAACAGAGGAACAAGCCGTGGACATATGCAAGGCAGCAATGGGCTTGAG GATTAGCTCATCTTTCAGCTTTGGTGGATTCACTTTCAAAAGAACAAGCGGGTCGTCAGTTAAGAGAGAA GAAGAAGTGCTTACGGGCAACCTTCAAACATTGAAAATAAGAGTACATGAGGGGTATGAAGAGTTCACAA TGGTTGGGAGAAGAGCAACAGCTATTCTAAGAAAGGCAACCAGGAGATTGATCCAGCTAATAGTAAGTGG GAGAGACGAGCAGTCAATTGCTGAAGCAATAATTGTGGCCATGGTATTTTCACAAGAGGATTGCATGATC AAGGCAGTTCGGGGTGATCTGAACTTTGTCAATAGGGCAAATCAGCGACTGAACCCCATGCATCAACTCT TGAGACACTTCCAAAAGGATGCAAAAGTGCTTTTCCAAAACTGGGGAATTGAACCCATTGACAATGTGAT GGGAATGATCGGAATATTGCCCGACATGACCCCAAGTACTGAGATGTCGCTGAGGGGGATAAGAGTCAGC AAAATGGGAGTAGATGAATACTCCAGCACAGAAAGGGTGGTGGTGAGCATTGACCGATTTTTAAGGGTTC GGGATCAACGGGGAAACGTACTATTGTCACCCGAAGAAGTTAGCGAGACACAAGGAACGGAGAAACTGAC AATAACTTATTCGTCATCAATGATGTGGGAGATCAATGGTCCTGAGTCGGTGTTGGTCAATACTTATCAA TGGATCATCAGGAACTGGGAGACTGTGAAAATTCAATGGTCACAGGATCCCACAATGTTATATAATAAGA TGGAATTCGAGCCATTTCAGTCTCTGGTCCCTAAGGCAGCCAGAGGTCAATACAGCGGATTCGTGAGGAC 
ACTGTTCCAGCAGATGCGGGATGTGCTTGGAACATTTGACACTGTTCAGATAATAAAACTTCTTCCCTTT GCTGCTGCTCCACCAGAACAGAGTAGGATGCAGTTCTCCTCCCTGACTGTGAATGTGAGAGGATCAGGAA TGAGGATACTGGTAAGAGGCAATTCTCCAGTGTTCAATTACAACAAGGCCACCAAGAGGCTTACAGTCCT TGGAAAAGATGCAGGTGCATTGACCGAAGATCCAGATGAAGGCACAGCTGGAGTGGAGTCTGCTGTTCTA AGAGGATTCCTCATTTTGGGCAAAGAAGACAAGAGATATGGCCCAGCATTAAGCATCAATGAGCTGAGCA ATCTTGCAAAAGGAGAGAAGGCTAATGTGCTAATTGGGCAAGGAGACGTGGTGTTGGTAATGAAACGGAA ACGGGACTCTAGCATACTTACTGACAGCCAGACAGCGACCAAAAGAATTCGGATGGCCATCAATTAGTGT CGAATTGTTTAAAAACGACCTTGTTTCTACT


>CY003886.1 Influenza A virus (A/mallard duck/ALB/376/1985(H2N3)) segment 1, complete sequence AGCGAAAGCAGGTCAAATATATTCAATATGGAGAGAATAAAAGAACTAAGAGATCTAATGTCACAGTCCC GCACTCGCGAGATACTCACCAAAACCACTGTGGACCATATGGCCATAATCAAAAAATACACATCAGGAAG GCAAGAGAAGAATCCCGCACTCAGGATGAAATGGATGATGGCAATGAAATATCCAATTACAGCGGATAAG AGGATAATGGAGATGATTCCCGAGAGGAATGAACAAGGGCAAACCCTCTGGAGCAAAACAAATGATGCCG GCTCAGACCGAGTGATGGTATCACCTCTGGCTGTGACATGGTGGAATAGGAATGGACCAACAACAAGTAC AATTCACTACCCAAAGGTATATAAAACCTATTTCGAAAAGGTCGAAAGGTTAAAACATGGGACCTTTGGC CCCGTTCACTTCAGGAATCAAGTTAAGATAAGACGGAGAGTTGACATAAACCCTGGACATGCAGACCTCA GTGCCAAAGAGGCACAGGATGTAATCATGGAAGTTGTTTTCCCAAATGAAGTGGGGGCCAGGATATTAAC ATCGGAGTCACAGCTGACAATAACAAAAGAGAAAAAGGAAGAACTCCAAGATTGTAAAATTGCCCCCTTG ATGGTAGCATACATGCTAGAAAGAGAGTTAGTCCGCAAAACGAGGTTCCTCCCAGTGGCTGGTGGAACAA GCAGTGTTTATATTGAGGTGTTGCATTTGACCCAGGGAACATGCTGGGAACAAATGTACACTCCAGGAGG GGAAGTGAGAAATGATGATGTTGACCAAAGCTTAATTATCGCTGCCAGGAATATAGTAAGAAGAGCAACG GTATCAGCAGACCCACTAGCGTCTCTATTGGAGATGTGCCACAGCACACAGATTGGTGGAATAAGGATGG TAGACATCCTTAGGCAGAATCCAACAGAGGAACAAGCCGTGGATATATGCAAGGCGGCAATGGGCTTGAG GATTAGCTCATCTTTCAGCTTCGGTGGATTCACTTTTAAAAGAACAAGTGGGTCGTCAGTCAAAAGAGAA GAAGAAGTGCTTACGGGCAACCTTCAAACACTGAAAATAAGAGTGCATGAGGGGTATGAAGAATTCACAA TGGTTGGGAGAAGAGCAACAGCTATTCTCAGGAAGGCAACCAGGAGATTGATTCAGCTAATAGTCAGTGG GAGAGATGAACAGTCAATTGCTGAAGCAATAATTGTAGCTATGGTATTTTCACAAGAGGATTGCATGATC AAGGCAGTTCGGGGTGATCTGAACTTTGTCAATAGAGCAAACCAGCGACTGAACCCCATGCATCAACTCT TGAGACATTTCCAAAAGGATGCAAAAGTGCTTTTCCAAAATTGGGGAATTGAACCCATTGACAATGTGAT GGGAATGATCGGAATACTACCCGACATGACCCCAAGTACTGAGACGTCATTGAGAGGGATAAGAGTCAGC AAAATGGGAGTGGATGAATACTCCAGCACAGAGAGAGTGGTGGTGAGCATTGACCGTTTTTTAAGGGTTC GGGATCAACGGGGAAACGTACTATTGTCACCTGAAGAAGTCAGCGAGACGCAAGGGACGGAAAAGTTGAC AATAACTTACTCATCATCAATGATGTGGGAGATCAATGGTCCTGAATCAGTGTTGGTCAATACTTACCAG TGGATCATCAGAAACTGGGAGACTGTGAAAATTCAATGGTCACAGGATCCCACAATGTTGTACAATAAGA TGGAATTCGAGCCATTTCAGTCTCTGGTCCCTAAGGCAGCTAGAGGTCAATACAGCGGATTCGTGAGGAC 
GCTGTTCCAACAAATGCGGGATGTGCTTGGAACATTTGACACTGTTCAGATAATAAAACTTCTCCCCTTT GCTGCTGCCCCACCAGAACAGAGTAGGATGCAGTTCTCCTCCTTGACTGTGAATGTAAGAGGATCAGGAA TGAGGATACTGGTAAGAGGCAACTCTCCAGTGTTCAATTACAACAAGGCCACCAAGAGGCTTACAGTCCT CGGGAAGGATGCAGGTGCATTAACTGAAGACCCAGATGAAGGCACAGCTGGAGTGGAATCTGCTGTTCTG AGAGGATTCCTCATTTTGGGCAAAGAAGACAAGAGATATGGCCCAGCATTGAGCATCAATGAGCTGAGCA ATCTTGCAAAAGGAGAGAAGGCTAATGTGCTAATTGGGCAAGGAGACGTGGTGTTGGTAATGAAACGGAA ACGGGACTCTAGCATACTTACTGACAGCCAGACAGCGACCAAAAGGATTCGGATGGCCATCAATTAGTGT CGAATTGTTTAAAAACGACCTTGTTTCTACT

References

  1. Gibbs, Adrian J.; McIntyre, George A. (1970). "The Diagram, a Method for Comparing Sequences. Its Use with Amino Acid and Nucleotide Sequences". Eur. J. Biochem. 16: 1–11. doi:10.1111/j.1432-1033.1970.tb01046.x.

Software to create plots

  1. Krumsiek, J.; Arnold, R.; Rattei, T. (2007-04-15). "Gepard: a rapid and sensitive tool for creating dotplots on genome scale". Bioinformatics. 23 (8): 1026–1028. doi:10.1093/bioinformatics/btm039. ISSN 1367-4803.
  2. Sonnhammer, E. L.; Durbin, R. (1995-12-29). "A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis". Gene. 167 (1–2): GC1–10. ISSN 0378-1119. PMID 8566757.
  3. Brodie, R.; Roper, R. L.; Upton, C. (2004-01-22). "JDotter: a Java interface to multiple dotplots generated by dotter". Bioinformatics. 20 (2): 279–281. doi:10.1093/bioinformatics/btg406. ISSN 1367-4803.