Dot plot (bioinformatics)

In bioinformatics, a dot plot is a graphical method for comparing two biological sequences and identifying regions of close similarity between them. It is a type of recurrence plot.
History
One way to visualize the similarity between two protein or nucleic acid sequences is to use a similarity matrix, known as a dot plot. Dot plots were introduced by Gibbs and McIntyre in 1970[1]. They are two-dimensional matrices with the two sequences being compared laid out along the vertical and horizontal axes. For a simple visual representation of the similarity between the sequences, an individual cell in the matrix is shaded black if the residues at its row and column positions are identical, so that matching sequence segments appear as diagonal lines across the matrix.
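The matrix described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the original Gibbs and McIntyre procedure: the function names `dot_matrix`, `render`, and `longest_diagonal_run` and the two short test sequences are assumptions made for the example, and a cell is marked only on exact residue identity, with no window or threshold filtering.

```python
def dot_matrix(seq_a, seq_b):
    """Return a list of rows; cell [i][j] is True when seq_a[i] == seq_b[j]."""
    return [[a == b for b in seq_b] for a in seq_a]


def render(matrix, seq_a, seq_b):
    """Draw the matrix as text, seq_b across the top and seq_a down the side."""
    lines = ["  " + " ".join(seq_b)]
    for residue, row in zip(seq_a, matrix):
        lines.append(residue + " " + " ".join("#" if hit else "." for hit in row))
    return "\n".join(lines)


def longest_diagonal_run(matrix):
    """Length of the longest run of consecutive matches along any diagonal."""
    best = 0
    cols = len(matrix[0]) if matrix else 0
    # prev[j + 1] holds the run length ending at (previous row, column j).
    prev = [0] * (cols + 1)
    for row in matrix:
        cur = [0] * (cols + 1)
        for j, hit in enumerate(row):
            if hit:
                # Extend the run that ended one step up and to the left.
                cur[j + 1] = prev[j] + 1
                best = max(best, cur[j + 1])
        prev = cur
    return best


if __name__ == "__main__":
    # Hypothetical toy sequences, chosen only to keep the printout small.
    a = "GATTACA"
    b = "GATGACA"
    m = dot_matrix(a, b)
    print(render(m, a, b))
    print("longest diagonal run:", longest_diagonal_run(m))
```

The full matrix needs memory proportional to the product of the two sequence lengths, so a sketch like this is only practical for short sequences.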
>CY003854.1 Influenza A virus (A/mallard/Alberta/77/1977(H2N3)) segment 1, complete sequence
AGCGAAAGCAGGTCAAATATATTCAATATGGAGAGAATAAAAGAACTAAGAGATCTAATGTCACAGTCCC
GCACCCGCGAGATACTCACCAAAACCACTGTGGACCACATGGCCATAATCAAAAAATACACATCAGGAAG
GCAAGAGAAGAACCCCGCACTCAGGATGAAGTGGATGATGGCAATGAAATATCCAATTACTGCAGATAAG
AGAATAATGGAAATGATTCCTGAAAGGAATGAACAAGGACAAACCCTCTGGAGCAAAACAAACGATGCCG
GCTCAGACCGAGTGATGGTATCACCTCTGGCCGTGACATGGTGGAATAGGAATGGACCAACAACAAGTAC
AGTTCACTACCCAAAGGTATATAAAACTTATTTCGAAAAAGTCGAAAGGTTGAAACACGGGACCTTTGGC
CCCGTCCACTTCAGAAATCAAGTTAAGATAAGACGGAGGGTTGACATAAACCCTGGCCACGCAGACCTCA
GTGCCAAAGAGGCACAGGATGTAATCATGGAAGTTGTTTTCCCAAATGAAGTGGGAGCTAGAATACTAAC
ATCGGAGTCACAACTGACAATAACAAAAGAGAAAAAGGAAGAACTCCAGGACTGTAAAATTGCCCCCTTG
ATGGTAGCATACATGCTAGAAAGAGAGTTGGTCCGCAAAACGAGGTTCCTCCCAGTGGCTGGTGGAACAA
GCAGTGTCTATATTGAGGTGTTGCATTTAACCCAGGGGACATGCTGGGAGCAGATGTACACTCCAGGAGG
GGAAGTGAGAAATGATGATGTTGACCAAAGCTTGATTATCGCTGCCAGGAACATAGTAAGAAGAGCAACG
GTATCAGCAGACCCACTAGCATCTCTATTGGAGATGTGCCACAGCACACAGATTGGGGGAATAAGGATGG
TAGACATCCTTCGGCAAAATCCAACAGAGGAACAAGCCGTGGACATATGCAAGGCAGCAATGGGCTTGAG
GATTAGCTCATCTTTCAGCTTTGGTGGATTCACTTTCAAAAGAACAAGCGGGTCGTCAGTTAAGAGAGAA
GAAGAAGTGCTTACGGGCAACCTTCAAACATTGAAAATAAGAGTACATGAGGGGTATGAAGAGTTCACAA
TGGTTGGGAGAAGAGCAACAGCTATTCTAAGAAAGGCAACCAGGAGATTGATCCAGCTAATAGTAAGTGG
GAGAGACGAGCAGTCAATTGCTGAAGCAATAATTGTGGCCATGGTATTTTCACAAGAGGATTGCATGATC
AAGGCAGTTCGGGGTGATCTGAACTTTGTCAATAGGGCAAATCAGCGACTGAACCCCATGCATCAACTCT
TGAGACACTTCCAAAAGGATGCAAAAGTGCTTTTCCAAAACTGGGGAATTGAACCCATTGACAATGTGAT
GGGAATGATCGGAATATTGCCCGACATGACCCCAAGTACTGAGATGTCGCTGAGGGGGATAAGAGTCAGC
AAAATGGGAGTAGATGAATACTCCAGCACAGAAAGGGTGGTGGTGAGCATTGACCGATTTTTAAGGGTTC
GGGATCAACGGGGAAACGTACTATTGTCACCCGAAGAAGTTAGCGAGACACAAGGAACGGAGAAACTGAC
AATAACTTATTCGTCATCAATGATGTGGGAGATCAATGGTCCTGAGTCGGTGTTGGTCAATACTTATCAA
TGGATCATCAGGAACTGGGAGACTGTGAAAATTCAATGGTCACAGGATCCCACAATGTTATATAATAAGA
TGGAATTCGAGCCATTTCAGTCTCTGGTCCCTAAGGCAGCCAGAGGTCAATACAGCGGATTCGTGAGGAC
ACTGTTCCAGCAGATGCGGGATGTGCTTGGAACATTTGACACTGTTCAGATAATAAAACTTCTTCCCTTT
GCTGCTGCTCCACCAGAACAGAGTAGGATGCAGTTCTCCTCCCTGACTGTGAATGTGAGAGGATCAGGAA
TGAGGATACTGGTAAGAGGCAATTCTCCAGTGTTCAATTACAACAAGGCCACCAAGAGGCTTACAGTCCT
TGGAAAAGATGCAGGTGCATTGACCGAAGATCCAGATGAAGGCACAGCTGGAGTGGAGTCTGCTGTTCTA
AGAGGATTCCTCATTTTGGGCAAAGAAGACAAGAGATATGGCCCAGCATTAAGCATCAATGAGCTGAGCA
ATCTTGCAAAAGGAGAGAAGGCTAATGTGCTAATTGGGCAAGGAGACGTGGTGTTGGTAATGAAACGGAA
ACGGGACTCTAGCATACTTACTGACAGCCAGACAGCGACCAAAAGAATTCGGATGGCCATCAATTAGTGT
CGAATTGTTTAAAAACGACCTTGTTTCTACT
>CY003886.1 Influenza A virus (A/mallard duck/ALB/376/1985(H2N3)) segment 1, complete sequence
AGCGAAAGCAGGTCAAATATATTCAATATGGAGAGAATAAAAGAACTAAGAGATCTAATGTCACAGTCCC
GCACTCGCGAGATACTCACCAAAACCACTGTGGACCATATGGCCATAATCAAAAAATACACATCAGGAAG
GCAAGAGAAGAATCCCGCACTCAGGATGAAATGGATGATGGCAATGAAATATCCAATTACAGCGGATAAG
AGGATAATGGAGATGATTCCCGAGAGGAATGAACAAGGGCAAACCCTCTGGAGCAAAACAAATGATGCCG
GCTCAGACCGAGTGATGGTATCACCTCTGGCTGTGACATGGTGGAATAGGAATGGACCAACAACAAGTAC
AATTCACTACCCAAAGGTATATAAAACCTATTTCGAAAAGGTCGAAAGGTTAAAACATGGGACCTTTGGC
CCCGTTCACTTCAGGAATCAAGTTAAGATAAGACGGAGAGTTGACATAAACCCTGGACATGCAGACCTCA
GTGCCAAAGAGGCACAGGATGTAATCATGGAAGTTGTTTTCCCAAATGAAGTGGGGGCCAGGATATTAAC
ATCGGAGTCACAGCTGACAATAACAAAAGAGAAAAAGGAAGAACTCCAAGATTGTAAAATTGCCCCCTTG
ATGGTAGCATACATGCTAGAAAGAGAGTTAGTCCGCAAAACGAGGTTCCTCCCAGTGGCTGGTGGAACAA
GCAGTGTTTATATTGAGGTGTTGCATTTGACCCAGGGAACATGCTGGGAACAAATGTACACTCCAGGAGG
GGAAGTGAGAAATGATGATGTTGACCAAAGCTTAATTATCGCTGCCAGGAATATAGTAAGAAGAGCAACG
GTATCAGCAGACCCACTAGCGTCTCTATTGGAGATGTGCCACAGCACACAGATTGGTGGAATAAGGATGG
TAGACATCCTTAGGCAGAATCCAACAGAGGAACAAGCCGTGGATATATGCAAGGCGGCAATGGGCTTGAG
GATTAGCTCATCTTTCAGCTTCGGTGGATTCACTTTTAAAAGAACAAGTGGGTCGTCAGTCAAAAGAGAA
GAAGAAGTGCTTACGGGCAACCTTCAAACACTGAAAATAAGAGTGCATGAGGGGTATGAAGAATTCACAA
TGGTTGGGAGAAGAGCAACAGCTATTCTCAGGAAGGCAACCAGGAGATTGATTCAGCTAATAGTCAGTGG
GAGAGATGAACAGTCAATTGCTGAAGCAATAATTGTAGCTATGGTATTTTCACAAGAGGATTGCATGATC
AAGGCAGTTCGGGGTGATCTGAACTTTGTCAATAGAGCAAACCAGCGACTGAACCCCATGCATCAACTCT
TGAGACATTTCCAAAAGGATGCAAAAGTGCTTTTCCAAAATTGGGGAATTGAACCCATTGACAATGTGAT
GGGAATGATCGGAATACTACCCGACATGACCCCAAGTACTGAGACGTCATTGAGAGGGATAAGAGTCAGC
AAAATGGGAGTGGATGAATACTCCAGCACAGAGAGAGTGGTGGTGAGCATTGACCGTTTTTTAAGGGTTC
GGGATCAACGGGGAAACGTACTATTGTCACCTGAAGAAGTCAGCGAGACGCAAGGGACGGAAAAGTTGAC
AATAACTTACTCATCATCAATGATGTGGGAGATCAATGGTCCTGAATCAGTGTTGGTCAATACTTACCAG
TGGATCATCAGAAACTGGGAGACTGTGAAAATTCAATGGTCACAGGATCCCACAATGTTGTACAATAAGA
TGGAATTCGAGCCATTTCAGTCTCTGGTCCCTAAGGCAGCTAGAGGTCAATACAGCGGATTCGTGAGGAC
GCTGTTCCAACAAATGCGGGATGTGCTTGGAACATTTGACACTGTTCAGATAATAAAACTTCTCCCCTTT
GCTGCTGCCCCACCAGAACAGAGTAGGATGCAGTTCTCCTCCTTGACTGTGAATGTAAGAGGATCAGGAA
TGAGGATACTGGTAAGAGGCAACTCTCCAGTGTTCAATTACAACAAGGCCACCAAGAGGCTTACAGTCCT
CGGGAAGGATGCAGGTGCATTAACTGAAGACCCAGATGAAGGCACAGCTGGAGTGGAATCTGCTGTTCTG
AGAGGATTCCTCATTTTGGGCAAAGAAGACAAGAGATATGGCCCAGCATTGAGCATCAATGAGCTGAGCA
ATCTTGCAAAAGGAGAGAAGGCTAATGTGCTAATTGGGCAAGGAGACGTGGTGTTGGTAATGAAACGGAA
ACGGGACTCTAGCATACTTACTGACAGCCAGACAGCGACCAAAAGGATTCGGATGGCCATCAATTAGTGT
CGAATTGTTTAAAAACGACCTTGTTTCTACT
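The two records above are given in FASTA format: a header line starting with `>` followed by the sequence. As a rough sketch of how such records might be read before plotting, the Python below parses FASTA text and reports the fraction of identical positions along the main diagonal of the dot plot. The names `parse_fasta` and `diagonal_identity` are illustrative assumptions, the demonstration uses only the first 140 nucleotides of each record, and positions are compared one-to-one without gaps, which is a simplification rather than an alignment.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping header (without '>') to sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines are concatenated; line breaks carry no meaning.
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def diagonal_identity(seq_a, seq_b):
    """Fraction of positions i with seq_a[i] == seq_b[i] (the main diagonal)."""
    n = min(len(seq_a), len(seq_b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    return matches / n


if __name__ == "__main__":
    # Excerpt only: the first two sequence lines of each record shown above.
    excerpt = """\
>CY003854.1 (excerpt)
AGCGAAAGCAGGTCAAATATATTCAATATGGAGAGAATAAAAGAACTAAGAGATCTAATGTCACAGTCCC
GCACCCGCGAGATACTCACCAAAACCACTGTGGACCACATGGCCATAATCAAAAAATACACATCAGGAAG
>CY003886.1 (excerpt)
AGCGAAAGCAGGTCAAATATATTCAATATGGAGAGAATAAAAGAACTAAGAGATCTAATGTCACAGTCCC
GCACTCGCGAGATACTCACCAAAACCACTGTGGACCATATGGCCATAATCAAAAAATACACATCAGGAAG
"""
    seqs = list(parse_fasta(excerpt).values())
    print("positions compared:", min(len(s) for s in seqs))
    print("main-diagonal identity:", diagonal_identity(seqs[0], seqs[1]))
```

A high main-diagonal identity corresponds to a dot plot dominated by a single unbroken diagonal line, which is what would be expected for two closely related sequences of similar length.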
References
1. Gibbs, Adrian J.; McIntyre, George A. (1970). "The Diagram, a Method for Comparing Sequences. Its Use with Amino Acid and Nucleotide Sequences". Eur. J. Biochem. 16: 1–11. doi:10.1111/j.1432-1033.1970.tb01046.x.
Software to create plots
- SynMap - An easy-to-use, web-based tool for generating dot plots for many species, with access to an extensive genome database. Offered by the comparative genomics platform CoGe.
- Genomdiff - An open-source Java dot plot program for viruses.
- Gepard[1] - Dot plot tool suitable even for genome-scale comparisons.
- ANACON - Contact analysis of dot plots.
- General introduction to dot plots with example algorithms and a software tool to create small and medium-sized dot plots.
- Dotlet - A program for constructing a dot plot from your own sequences.
- UGENE Dot Plot viewer - Open-source dot plot visualizer.
- seqinr - R package to generate dot plots.
- dotplot - R package to rapidly generate dot plots as either traditional or ggplot graphics.
- dotmatcher - Web tool to generate dot plots.
- Dotter[2] - Standalone program to generate dot plots.
- JDotter[3] - Java version of Dotter.
- Dotplot - An easy (educational) HTML5 tool to generate dot plots from RNA sequences.
- lastz and laj - Programs to prepare and visualize genomic alignments.
- Flexidot - Customizable and ambiguity-aware dot plot suite for visual sequence analyses, implemented in Python.
1. Krumsiek, J.; Arnold, R.; Rattei, T. (2007-04-15). "Gepard: a rapid and sensitive tool for creating dotplots on genome scale". Bioinformatics. 23 (8): 1026–1028. doi:10.1093/bioinformatics/btm039. ISSN 1367-4803.
2. Sonnhammer, E. L.; Durbin, R. (1995-12-29). "A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis". Gene. 167 (1–2): GC1–10. ISSN 0378-1119. PMID 8566757.
3. Brodie, R.; Roper, R. L.; Upton, C. (2004-01-22). "JDotter: a Java interface to multiple dotplots generated by dotter". Bioinformatics. 20 (2): 279–281. doi:10.1093/bioinformatics/btg406. ISSN 1367-4803.