List of sequence alignment software
This list of sequence alignment software is a compilation of software tools and web portals used in pairwise sequence alignment and multiple sequence alignment. See structural alignment software for structural alignment of proteins.
Database search only
Name | Description | Sequence Type* | Link | Author | Year |
---|---|---|---|---|---|
BLAST | k-tuple local search (Basic Local Alignment Search Tool) | Both | NCBI EBI DDBJ DDBJ (psi-blast) GenomeNet PIR (protein only) | Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ, NCBI | 1990 |
Combinatorial Extension | Structural alignment search | Protein | server | | |
FASTA | k-tuple local search | Both | EBI DDBJ GenomeNet PIR (protein only) | Lipman DJ, Pearson WR | 1985 |
GGSEARCH / GLSEARCH | Global:Global (GG), Global:Local (GL) alignment with statistics | Protein | FASTA server | ||
HMMER | Hidden Markov profile search | Protein/DNA | download DDBJ (HMMPFAM) | Eddy SR, Krogh A, Mitchison G | 1998 |
IDF | Inverse Document Frequency | Both | download | ||
Infernal | profile SCFG search | RNA | download | Eddy S | |
SAM | Hidden Markov profile search | Protein/DNA | SAM | Karplus K, Krogh A | |
SSEARCH | Smith-Waterman search (more sensitive than FASTA) | Both | EBI DDBJ server | ||
*Sequence Type: Protein or nucleotide
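BLAST and FASTA are both listed above as k-tuple local searches. The sketch below shows only the seeding step and assumes exact word matches. The real programs add neighbourhood words, substitution scores, extension and statistics, none of which appear here. It indexes every k-letter word of a subject sequence, looks up each word of the query, and groups the hits by diagonal (subject position minus query position). Several hits on the same diagonal suggest a candidate local alignment worth extending. The function names are hypothetical.

```python
from collections import defaultdict


def kmer_index(seq: str, k: int) -> dict:
    """Map every k-letter word of seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return dict(index)


def seed_hits(query: str, subject: str, k: int) -> list:
    """Return (query_pos, subject_pos) pairs where an identical k-word occurs."""
    index = kmer_index(subject, k)
    hits = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            hits.append((q, s))
    return hits


def hits_by_diagonal(hits: list) -> dict:
    """Group seed hits by diagonal (subject_pos - query_pos).

    Hits that share a diagonal are consistent with one ungapped alignment.
    """
    diagonals = defaultdict(list)
    for q, s in hits:
        diagonals[s - q].append((q, s))
    return dict(diagonals)


hits = seed_hits("ACGTAC", "TTACGTACGG", k=3)
print(hits)                       # [(0, 2), (0, 6), (1, 3), (2, 4), (3, 1), (3, 5)]
print(hits_by_diagonal(hits)[2])  # [(0, 2), (1, 3), (2, 4), (3, 5)]
```

In this example, four word hits fall on diagonal 2. That diagonal corresponds to the query `ACGTAC` occurring at offset 2 of the subject.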
Pairwise alignment
Name | Description | Sequence Type* | Alignment Type** | Link | Author | Year |
---|---|---|---|---|---|---|
Bioconductor Biostrings::pairwiseAlignment | Dynamic programming | Both | Both + Ends-free | site | P. Aboyoun | 2008 |
BioPerl dpAlign | Dynamic programming | Both | Both + Ends-free | site | Y. M. Chan | 2003 |
BLASTZ | Seeded pattern-matching | Nucleotide | Local | download | Schwartz et al. | 2003 |
DNADot | Web-based dot-plot tool | Nucleotide | Global | server | R. Bowen | 1998 |
DOTLET | Java-based dot-plot tool | Both | Global | applet | M. Pagni and T. Junier | 1998 |
GGSEARCH, GLSEARCH | Global:Global (GG), Global:Local (GL) alignment with statistics | Protein | Global in query | FASTA server | W. Pearson | 2007 |
JAligner | Open source Java implementation of Smith-Waterman | Both | Local | JWS | A. Moustafa | 2005 |
LALIGN | Multiple, non-overlapping, local similarity (same algorithm as SIM) | Both | Local non-overlapping | server FASTA server | W. Pearson | 1991 (algorithm) |
matcher | Waterman-Eggert local alignment; memory-optimized but slow dynamic programming (based on LALIGN) | Both | Local | server | I. Longden (modified from W. Pearson) | 1999 |
MCALIGN2 | explicit models of indel evolution | DNA | Global | server | J. Wang et al. | 2006 |
MUMmer | Suffix-Tree based | Nucleotide | Global | download | S. Kurtz et al. | 2004 |
needle | Needleman-Wunsch dynamic programming | Both | Global | EBI server | A. Bleasby | 1999 |
Ngila | logarithmic and affine gap costs and explicit models of indel evolution | Both | Global | download | R. Cartwright | 2007 |
PatternHunter | Seeded pattern-matching | Nucleotide | Local | download | B. Ma et al. | 2002-2004 |
ProbA (also propA) | Stochastic partition function sampling via dynamic programming | Both | Global | download | U. Mückstein | 2002 |
PyMOL | "align" command aligns sequence & applies it to structure | Protein | Global (by selection) | site | W. L. DeLano | 2007 |
REPuter | Suffix-Tree based | Nucleotide | Local | download | S. Kurtz et al. | 2001 |
SEQALN | Various dynamic programming | Both | Local or Global | server | M.S. Waterman and P. Hardy | 1996 |
SIM, GAP, NAP, LAP | Local similarity with varying gap treatments | Both | Local or global | server | X. Huang and W. Miller | 1990-1996 |
SIM | Local similarity | Both | Local | servers | X. Huang and W. Miller | 1991 |
SLIM Search | Ultra-fast blocked alignment | Both | Both | site | L. Bloksberg | 2004 |
SSEARCH | Local (Smith-Waterman) alignment with statistics | Protein | Local | EBI FASTA server | W. Pearson | 1981 (Algorithm) |
SWIFT suit | Fast Local Alignment Searching | DNA | Local | site | K. Rasmussen, W. Gerlach | 2005,2008 |
stretcher | Memory-optimized but slow dynamic programming | Both | Global | server | I. Longden (modified from G. Myers and W. Miller) | 1999 |
tranalign | Aligns nucleic acid sequences given a protein alignment | Nucleotide | NA | server | G. Williams (modified from B. Pearson) | 2002 |
water | Smith-Waterman dynamic programming | Both | Local | EBI Pasteur server | A. Bleasby | 1999 |
wordmatch | k-tuple pairwise match | Both | NA | server | I. Longden | 1998 |
YASS | Seeded pattern-matching | Nucleotide | Local | server download | L. Noe and G. Kucherov | 2003-2007 |
*Sequence Type: Protein or nucleotide. **Alignment Type: Local or global
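The table lists needle as Needleman-Wunsch (global) and water as Smith-Waterman (local) dynamic programming. The two recurrences differ in only two places. Local alignment floors every cell at zero, and it reports the best cell anywhere in the matrix instead of the bottom-right corner. The sketch below computes only the optimal score, with no traceback. It uses a simple match/mismatch score and a linear gap penalty, both illustrative assumptions. The real programs use substitution matrices and separate gap-open and gap-extend penalties. `alignment_score` is a hypothetical name.

```python
def alignment_score(a: str, b: str, local: bool = False,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal pairwise alignment score by dynamic programming.

    local=False: Needleman-Wunsch (global) recurrence.
    local=True:  Smith-Waterman (local) recurrence; cells are floored at 0.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of an alignment of a[:i] and b[:j] ending at (i, j)
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps in either sequence.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                score = max(0, score)      # start a fresh local alignment
                best = max(best, score)    # best cell anywhere in the matrix
            H[i][j] = score
    return best if local else H[rows - 1][cols - 1]


print(alignment_score("ACGT", "AGT"))              # 1: three matches, one gap
print(alignment_score("ACGT", "AGT", local=True))  # 2: just the shared "GT"
```

Both calls fill the same matrix. The global score must account for the whole of both sequences, while the local score can discard the unmatched prefix.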
Multiple sequence alignment
Name | Description | Sequence Type* | Alignment Type** | Link | Author | Year |
---|---|---|---|---|---|---|
ABA | A-Bruijn alignment | Protein | Global | download | B. Raphael et al. | 2004 |
ALE | Manual alignment; some software assistance | Nucleotides | Local | download | J. Blandy and K. Fogel | 1994 (latest version 2007) |
AMAP | Sequence annealing | Both | Global | server | A. Schwartz and L. Pachter | 2006 |
BAli-Phy | Tree+Multi alignment; Probabilistic/Bayesian; Joint Estimation | Both | Global | WWW+download | BD Redelings and MA Suchard | 2005 (latest version 2007) |
CHAOS/DIALIGN | Iterative alignment | Both | Local (preferred) | server | M. Brudno and B. Morgenstern | 2003 |
ClustalW | Progressive alignment | Both | Local or Global | download EBI DDBJ PBIL EMBNet GenomeNet | Thompson et al. | 1994 |
CodonCode Aligner | Multi alignment; ClustalW & Phrap support | Nucleotides | Local or Global | download | P. Richterich et al. | 2003 (latest version 2008) |
DIALIGN-TX and DIALIGN-T | Segment-based method | Both | Local (preferred) or Global | download and server | A. R. Subramanian | 2005 (latest version 2008) |
DNA Alignment | Segment-based method for intraspecific alignments | Both | Local (preferred) or Global | server | A. Roehl | 2005 (latest version 2008) |
Ed'Nimbus | Seeded filtration | Nucleotides | Local | server | P. Peterlongo et al. | 2006 |
FSA | Sequence annealing | Both | Global | download and server | R. K. Bradley et al. | 2008 |
Geneious | Progressive/Iterative alignment; ClustalW plugin | Both | Local or Global | download | A.J. Drummond et al. | 2005 (latest version 2008) |
Kalign | Progressive alignment | Both | Global | server EBI MPI toolkit | T. Lassmann | 2005 |
MSA | Dynamic programming | Both | Local or Global | download | D.J. Lipman et al. | 1989 (modified 1995) |
PRRN/PRRP | Iterative alignment (especially refinement) | Protein | Local or Global | PRRP PRRN | Y. Totoki (based on O. Gotoh) | 1991 and later |
POA | Partial order/hidden Markov model | Protein | Local or Global | download | C. Lee | 2002 |
SAM | Hidden Markov model | Protein | Local or Global | server | A. Krogh et al. | 1994 (most recent version 2002) |
MAFFT | Progressive/iterative alignment | Both | Local or Global | GenomeNet MAFFT | K. Katoh et al. | 2005 |
MAVID | Progressive alignment | Both | Global | server | N. Bray and L. Pachter | 2004 |
MULTALIN | Dynamic programming/clustering | Both | Local or Global | server | F. Corpet | 1988 |
Multi-LAGAN | Progressive dynamic programming alignment | Both | Global | server | M. Brudno et al. | 2003 |
MUSCLE | Progressive/iterative alignment | Both | Local or Global | server | R. Edgar | 2004 |
Pecan | Probabilistic/consistency | DNA | Global | download | B. Paten et al. | 2008 |
ProbCons | Probabilistic/consistency | Protein | Local or Global | server | C. Do et al. | 2005 |
PSAlign | Alignment preserving non-heuristic | Both | Local or Global | download | S.H. Sze, Y. Lu, Q. Yang | 2006 |
SAGA | Sequence alignment by genetic algorithm | Protein | Local or Global | download | C. Notredame et al. | 1996 (new version 1998) |
T-Coffee | More sensitive progressive alignment | Both | Local or Global | server download | C. Notredame et al. | 2000 (newest version 2008) |
RevTrans | Combines DNA and protein alignment by back-translating the protein alignment to DNA | DNA/Protein (special) | Local or Global | server | Wernersson and Pedersen | 2003 (newest version 2005) |
*Sequence Type: Protein or nucleotide. **Alignment Type: Local or global
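A common way to score a finished multiple alignment such as those produced by the tools above is the sum-of-pairs (SP) score. Every pair of rows is scored column by column, and the results are summed. The sketch below assumes simple match/mismatch/gap values, scores gap-gap pairs as 0 (a widespread convention) and ignores affine gap costs. It illustrates the objective only and is not how any listed program computes or optimizes it. `sum_of_pairs` is a hypothetical name.

```python
from itertools import combinations


def sum_of_pairs(alignment: list, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> int:
    """Sum-of-pairs score of a gapped multiple alignment ('-' marks a gap)."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    total = 0
    # Score every pair of rows, column by column.
    for x, y in combinations(alignment, 2):
        for c1, c2 in zip(x, y):
            if c1 == "-" and c2 == "-":
                continue              # gap-gap pairs contribute nothing
            elif c1 == "-" or c2 == "-":
                total += gap
            elif c1 == c2:
                total += match
            else:
                total += mismatch
    return total


print(sum_of_pairs(["AC-T", "ACGT", "A--T"]))  # -1
```

The pairwise contributions in the example are 1, 0 and -2. Progressive and consistency-based aligners try, by different heuristics, to find alignments that score well under objectives of this kind.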
Genomics analysis
Name | Description | Sequence Type* | Link |
---|---|---|---|
SLAM | Gene finding, alignment, annotation (human-mouse homology identification) | Nucleotide | server |
Mauve | Multiple alignment of rearranged genomes | Nucleotide | download |
MGA | Multiple Genome Aligner | Nucleotide | download |
Mulan | Local multiple alignments of genome-length sequences | Nucleotide | server |
Sequerome | Profiling sequence alignment data with major servers/services | Nucleotide/peptide | server |
AVID | Pairwise global alignment of whole genomes | Nucleotide | server |
SIBsim4 / Sim4 | A program designed to align an expressed DNA sequence with a genomic sequence, allowing for introns | Nucleotide | download |
Shuffle-LAGAN | Pairwise glocal alignment of completed genome regions | Nucleotide | server |
ACT (Artemis Comparison Tool) | Synteny and comparative genomics | Nucleotide | server |
*Sequence Type: Protein or nucleotide
Motif finding
Name | Description | Sequence Type* | Link |
---|---|---|---|
MEME/MAST | Motif discovery and search | Both | server |
BLOCKS | Ungapped motif identification from BLOCKS database | Both | server |
eMOTIF | Extraction and identification of shorter motifs | Both | servers |
Gibbs motif sampler | Stochastic motif extraction by statistical likelihood | Both | server (one of many implementations) |
TEIRESIAS | Motif extraction and database search | Both | server |
PRATT | Pattern generation for use with ScanProsite | Protein | server |
ScanProsite | Motif database search tool | Protein | server |
PHI-Blast | Motif search and alignment tool | Both | server |
I-sites | Local structure motif library | Protein | server |
*Sequence Type: Protein or nucleotide
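Ungapped motifs, like those in the BLOCKS database or those reported by MEME, are often represented as a position-specific scoring matrix (PSSM). In a PSSM, each motif column holds a log-odds score for each residue. The sketch below builds a base-2 log-odds PSSM from aligned DNA motif instances and scans a sequence with it. It assumes a uniform background and a pseudocount of 1. This is a generic illustration, not the model, priors or significance statistics of MEME/MAST or any other listed tool. `build_pssm` and `scan` are hypothetical names.

```python
import math

ALPHABET = "ACGT"  # assumption: DNA motifs only


def build_pssm(sites: list, pseudocount: float = 1.0) -> list:
    """Per-column log2 odds of each base versus a uniform background."""
    width = len(sites[0])
    background = 1.0 / len(ALPHABET)
    pssm = []
    for col in range(width):
        counts = {b: pseudocount for b in ALPHABET}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2(counts[b] / total / background) for b in ALPHABET})
    return pssm


def scan(seq: str, pssm: list) -> list:
    """Score every window of seq; return (start, score) pairs, best first."""
    width = len(pssm)
    scores = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        scores.append((start, sum(col[base] for col, base in zip(pssm, window))))
    return sorted(scores, key=lambda t: t[1], reverse=True)


pssm = build_pssm(["TATA", "TATA", "TATT"])
print(scan("GGTATAGG", pssm)[0][0])  # 2: the TATA window scores highest
```

Pseudocounts keep unseen bases from getting a score of negative infinity. The score of a window is simply the sum of its per-column log-odds values.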
Benchmarking
Name | Link | Authors |
---|---|---|
BAliBASE | download | Thompson, Plewniak, Poch |
HOMSTRAD | download | Stebbings, Mizuguchi |
Oxbench | download | Raghava, Searle, Audley, Barber, Barton |
PFAM | download | |
PREFAB | download | Edgar |
SABmark | download | Van Walle, Lasters, Wyns |
SMART | download | Letunic, Copley, Schmidt, Ciccarelli, Doerks, Schultz, Ponting, Bork |
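Benchmarks such as BAliBASE and PREFAB compare a program's output against a curated reference alignment. A widely used measure is the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. Its exact definition and name (SP score, Q score) vary between suites. The sketch below is a generic version of that measure, not the scoring code shipped with any benchmark listed above. It assumes both alignments contain the same sequences in the same row order, with '-' as the gap character. The function names are hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment: list) -> set:
    """Set of (row_i, res_i, row_j, res_j) tuples: residue res_i of sequence
    row_i shares a column with residue res_j of sequence row_j."""
    if not alignment:
        return set()
    pairs = set()
    counters = [0] * len(alignment)  # next ungapped residue index per row
    for col in range(len(alignment[0])):
        residues = []
        for row, seq in enumerate(alignment):
            if seq[col] != "-":
                residues.append((row, counters[row]))
                counters[row] += 1
        # Every pair of residues in the same column counts as aligned.
        for (r1, i1), (r2, i2) in combinations(residues, 2):
            pairs.add((r1, i1, r2, i2))
    return pairs


def pair_recovery(test: list, reference: list) -> float:
    """Fraction of reference residue pairs that the test alignment reproduces."""
    ref = aligned_pairs(reference)
    if not ref:
        return 0.0  # nothing to recover
    return len(ref & aligned_pairs(test)) / len(ref)


reference = ["ACGT", "AC-T"]
test = ["ACGT", "A-CT"]
print(pair_recovery(test, reference))  # 0.666..., 2 of 3 reference pairs
```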
Alignment Viewers/Editors
Name | Integrated with Struct. Prediction Tools | Can Align Sequences | Can Calculate Phylogenetic Trees | Other Features | Formats Supported | License | Link |
---|---|---|---|---|---|---|---|
DnaSP | | | | Can compute several population genetics statistics; reconstructs haplotypes with PHASE | FASTA, NEXUS, Mega, PHYLIP | Freeware | link |
Strap | Jnet, NNPREDICT, Coiled coil, 16 different TM-helix prediction methods | Fifteen different methods | Neighbor-joining | Dot-plot, structure neighbors, 3D superposition, BLAST search, mutation/SNP analysis, sequence features, BioJava interface | MSF, Stockholm, ClustalW, Nexus, FASTA, PDB, EMBL, GenBank, HSSP, Pfam | GPL | link |
Seaview | No | Local MUSCLE/ClustalW | No | Dot-plot, vim-like editing keys | NEXUS, MSF, Clustal, FASTA, PHYLIP, MASE | | link |
Jalview 2 | Secondary structure prediction via JNet | Clustal, MUSCLE via web services | UPGMA, NJ | Features from arbitrary DAS servers | FASTA, PFAM, MSF, Clustal, BLC, PIR | GPL | link |
CLC viewer (Free version) | Only in commercial version | Clustal, MUSCLE, T-Coffee, MAFFT, Kalign, various | UPGMA, NJ | Workflows, BLAST/GenBank search | Many | Freeware; more options available in commercial versions | link, table of features |
UGENE | No | MUSCLE | No | Many | FASTA, GenBank, EMBL, ABIF, SCF, CLUSTALW, Stockholm, Newick, PDB | GPL | link |
Mega | No | Native ClustalW | UPGMA, NJ, ME, MP, with bootstrap and confidence tests | Extended support for phylogenetic analysis | FASTA, Clustal, Nexus, Mega, etc. | Freeware, registration requested | link, table of features |
Genedoc | No, but can read/show annotations | Pairwise | No, but can read/show annotations | Gel simulation, statistics, multiple views, simple | Many | Free | link, table of features |
SeqPop | No | | | | | Free | link |
BioEdit | No | ClustalW | Rudimentary; can read PHYLIP | Plasmid drawing, ABI chromatograms | GenBank, FASTA, PHYLIP 3.2, PHYLIP 4, NBRF/PIR | Free | link |
Ale (Emacs plugin) | No | Yes | No | No | GenBank, EMBL, FASTA, and PHYLIP | GPL | link |
Ralee (Emacs plugin for RNA alignment editing) | RNA structure | | | | Stockholm | GPL | link |
Emacs biomode | | | | | | | link |
Short-Read Sequence Alignment
Name | Description | Multi-threaded | License | Link |
---|---|---|---|---|
BFAST | Explicit time and accuracy tradeoff with prior accuracy estimation, supported by indexing the reference sequences. Optimally compresses indexes. Can handle billions of short reads. Can handle insertions, deletions, SNPs, and color errors (can map ABI SOLiD color-space reads). Performs a full Smith-Waterman alignment. | Yes (POSIX) | Free for academic and non-commercial use | link |
MAQ | Ungapped alignment that takes into account quality scores for each base. | | GPL | link |
Novoalign | Gapped alignment of single-end and paired-end Illumina GA I & II reads. Uses base qualities. Reports can be converted to MAQ map files for analysis with MAQ utilities. Performance comparable to MAQ but with gapped alignment. | Multi-threading available with paid license | Single-threaded version free for academic and non-commercial use | link |
ELAND | Implemented by Illumina. Includes ungapped alignment with a finite read length. | | | |
SSAHA and SSAHA2 | Fast for a small number of variants. | | Free for academic and non-commercial use | link |
SOAP | Robust with a small number (1-3) of gaps and mismatches. Speed improvement over BLAT; uses a 12-letter hash table. | | GPL | link |
BLAT | Made by Jim Kent. A de facto standard for nucleotide sequence alignment; performs ungapped alignment and can handle one mismatch in the initial alignment step. | Yes (client/server) | Free for academic and non-commercial use | link |
BLASTN | BLAST's nucleotide alignment program. Slow and not accurate for short reads; searches a sequence database (EST, Sanger sequence) rather than a reference genome. | | | link |
ZOOM | Reads can vary between 25 bp and 64 bp in length. Fast but not accurate for more than 2 mismatches. Ignores insertions and deletions. | | Commercial | link |
RMAP | Read lengths can range from 20 bp to at most 64 bp. Uses the "exclusion principle" to allow for mismatches and to look up reads in an index. | | | link |
SHRiMP | Indexes the reads instead of the reference genome. Uses masks to generate possible keys. Can map ABI SOLiD color-space reads. | | BSD derivative | link |
Bowtie | Uses the Burrows-Wheeler transform to create a permanent, reusable index of the genome; 1.3 GB memory footprint for the human genome. Aligns more than 25 million Illumina reads in 1 CPU hour. Supports Maq-like and SOAP-like alignment policies. | Yes (POSIX) | Artistic License | link |
QPalma | Uses known alignments to align targeted spliced reads. Useful for transcriptome resequencing and gene exploration. | | | link |
SOCS | For ABI SOLiD technologies. Mapping time increases significantly for reads with mismatches (or color errors). Uses an iterative version of the Rabin-Karp string search algorithm. | Yes | | link |
MOSAIK | Fast and incorporates assembly. Aligns reads using a hashing scheme. Must split the reads many times to be robust against an increasing number of mismatches. | Yes | | link |
SLIDER | An application for Illumina Sequence Analyzer output that uses the "probability" files instead of the sequence files as input for alignment to a reference sequence or a set of reference sequences. | | | link |
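Bowtie is described above as building a reusable Burrows-Wheeler index of the genome. The sketch below shows the core idea on a toy string: the Burrows-Wheeler transform plus FM-index backward search, which counts exact occurrences of a read. It makes deliberate simplifications: the transform is built naively by sorting rotations, the occurrence table is kept in full (no sampling) and is rebuilt on every call, and mismatches are not handled at all. Real short-read aligners add suffix-array sampling and backtracking for mismatches. These function names are hypothetical and are not Bowtie's API.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations; '$' is the end sentinel."""
    text += "$"  # '$' sorts before A, C, G, T, so it is the smallest symbol
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def fm_index(bw: str):
    """C[ch] = number of symbols smaller than ch; occ[ch][i] = count of ch in bw[:i]."""
    alphabet = sorted(set(bw))
    C, total = {}, 0
    for ch in alphabet:
        C[ch] = total
        total += bw.count(ch)
    occ = {ch: [0] * (len(bw) + 1) for ch in alphabet}
    for i, c in enumerate(bw):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if c == ch else 0)
    return C, occ


def count_occurrences(pattern: str, bw: str) -> int:
    """Backward search: narrow the range of sorted rotations prefixed by pattern."""
    C, occ = fm_index(bw)
    lo, hi = 0, len(bw)  # half-open range of rows in the sorted rotation matrix
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


bw = bwt("ACGTACGA")
print(count_occurrences("ACG", bw))  # 2
print(count_occurrences("TT", bw))   # 0
```

Each pattern character costs a constant number of table lookups, independent of the reference length. This is why a precomputed index lets these tools map many reads quickly.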
External links
- Pollard et al. (2004) (PubMed Central free full text): The authors discuss LAGAN, CHAOS, and Dialign as the most effective tools tested for certain uses.