Sequence graph
In comparative genomics, a sequence graph, also called an alignment graph, breakpoint graph, or adjacency graph, is a bidirected graph in which the vertices represent segments of DNA and the edges represent adjacency between segments in a genome. [1] The segments are labelled by the DNA string they represent, and each edge connects the tail end of one segment with the head end of another segment. Each adjacency edge is labelled by a (possibly empty) string of DNA. Traversing a connected component of segments and adjacency edges (called a thread) yields a sequence, which typically represents a genome or a section of a genome. The segments can be thought of as synteny blocks: the edges dictate how to arrange these blocks in a particular genome, and the labels on the adjacency edges represent bases that are not contained in any synteny block.
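The definition above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than a standard API: the class name SequenceGraph, the ("segment name", "head" or "tail") encoding of segment ends, and the choice to read a segment (or an edge label) as its reverse complement when it is traversed backwards.

```python
from dataclasses import dataclass, field

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return dna.translate(COMPLEMENT)[::-1]


@dataclass
class SequenceGraph:
    """Toy sequence graph (hypothetical structure, for illustration only)."""
    segments: dict = field(default_factory=dict)     # name -> DNA label
    adjacencies: dict = field(default_factory=dict)  # end -> (other end, label)

    def add_segment(self, name: str, dna: str) -> None:
        self.segments[name] = dna

    def add_adjacency(self, end_a, end_b, label: str = "") -> None:
        # Store the edge in both directions; the label is reverse-complemented
        # when the edge is read from end_b back to end_a.
        self.adjacencies[end_a] = (end_b, label)
        self.adjacencies[end_b] = (end_a, reverse_complement(label))

    def traverse(self, start_end) -> str:
        """Walk a thread, entering the first segment at start_end."""
        pieces = []
        visited = set()
        end = start_end
        while end not in visited:
            visited.add(end)
            name, side = end
            dna = self.segments[name]
            # Entering at the head reads the label forwards; entering at the
            # tail reads its reverse complement.
            pieces.append(dna if side == "head" else reverse_complement(dna))
            exit_end = (name, "tail" if side == "head" else "head")
            nxt = self.adjacencies.get(exit_end)
            if nxt is None:  # linear thread: no further adjacency
                break
            end, label = nxt
            pieces.append(label)  # bases lying between the two segments
        return "".join(pieces)


graph = SequenceGraph()
graph.add_segment("A", "ACG")
graph.add_segment("B", "TT")
graph.add_segment("C", "GGA")
graph.add_adjacency(("A", "tail"), ("B", "head"))             # empty label
graph.add_adjacency(("B", "tail"), ("C", "head"), label="C")  # one extra base

forward = graph.traverse(("A", "head"))
backward = graph.traverse(("C", "tail"))
```

In this sketch, walking the thread from the head of A spells out the three segment labels with the single base of the second adjacency inserted between B and C, and walking it from the tail of C yields the reverse complement of the same sequence.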
Applications
Multiple sequence alignment
Sequence graphs can be used to represent multiple sequence alignments with the addition of a new kind of edge representing homology between segments. [2] For a set of genomes, one can create an acyclic breakpoint graph with a thread for each genome. For two segments (a, b) and (c, d), where a, b, c, and d represent the endpoints of the two segments, homology edges can be created either from a to c and b to d, or from a to d and b to c, representing the two possible orientations of the homology. The advantage of representing a multiple sequence alignment this way is that it can include inversions and other structural rearrangements that a matrix representation would not allow.
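The two orientations of a homology can be sketched in Python as follows. The endpoint names a, b and c, d, the function names, and the inverted flag are all chosen here for illustration. The orientation helper is a deliberately naive assumption: it treats two segments as homologous only when their labels are identical or exact reverse complements, whereas real alignments tolerate mismatches.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return dna.translate(COMPLEMENT)[::-1]


def orientation(label1, label2):
    """Toy orientation test based on exact label identity (an assumption)."""
    if label1 == label2:
        return "forward"
    if label1 == reverse_complement(label2):
        return "inverted"
    return None  # not homologous under this toy criterion


def homology_edges(seg1, seg2, inverted=False):
    """Pair up the endpoints of two segments, each given as an endpoint pair."""
    a, b = seg1
    c, d = seg2
    if inverted:
        # Opposite orientation: each end is paired with the far end.
        return [(a, d), (b, c)]
    # Same orientation: first end with first end, second end with second end.
    return [(a, c), (b, d)]


same = homology_edges(("a", "b"), ("c", "d"))
flipped = homology_edges(("a", "b"), ("c", "d"), inverted=True)
```

The inverted pairing is what lets the graph record that a block appears reversed in one genome relative to another, which a column-by-column alignment matrix cannot express directly.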
References
- Alekseyev MA, Pevzner PA. Breakpoint graphs and ancestral genome reconstructions. Genome Research. 2009;19(5):943–957.
- Paten B, Zerbino DR, Hickey G, Haussler D. A unifying model of genome evolution under parsimony. BMC Bioinformatics. 2014;15:206. doi:10.1186/1471-2105-15-206.