List of sequence alignment software

This list of sequence alignment software is a compilation of software tools and web portals used in pairwise sequence alignment and multiple sequence alignment. See structural alignment software for structural alignment of proteins.

Database search only

Name	Description	Sequence Type*	Link	Author	Year
BLAST	k-tuple local search (Basic Local Alignment Search Tool)	Both	NCBI EBI DDBJ DDBJ (psi-blast) GenomeNet PIR (protein only)	Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ, NCBI	1990
Combinatorial Extension	Structural alignment search	Protein	server
FASTA	k-tuple local search	Both	EBI DDBJ GenomeNet PIR (protein only)	Lipman DJ, Pearson WR	1985
GGSEARCH / GLSEARCH	Global:Global (GG), Global:Local (GL) alignment with statistics	Protein	FASTA server
HMMER	Hidden Markov profile search	Protein/DNA	download DDBJ (HMMPFAM)	Eddy SR, Krogh A, Mitchison G	1998
IDF	Inverse Document Frequency	Both	download
Infernal	profile SCFG search	RNA	download	Eddy S
SAM	Hidden Markov profile search	Protein/DNA	SAM	Karplus K, Krogh A
SSEARCH	Smith-Waterman search (more sensitive than FASTA)	Both	EBI DDBJ server
*Sequence Type: Protein or nucleotide
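
BLAST and FASTA are listed above as k-tuple local searches. Both tools first look for short words shared by the query and a database sequence, and then extend or rescore only those regions. The Python sketch below illustrates only that seeding step. It assumes exact word matches and a simple in-memory dictionary index. Real BLAST also uses scored neighbourhood words for proteins, and both tools follow seeding with extension and statistics. The function names are hypothetical.

```python
from collections import defaultdict


def kmer_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map every length-k word in seq to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return dict(index)


def seed_hits(query: str, target: str, k: int) -> list[tuple[int, int]]:
    """Return (query_pos, target_pos) pairs sharing an exact k-word.

    Assumption: exact word matches only (no neighbourhood words).
    """
    index = kmer_index(target, k)
    hits = []
    for q in range(len(query) - k + 1):
        for t in index.get(query[q:q + k], []):
            hits.append((q, t))
    return hits


def hits_by_diagonal(hits: list[tuple[int, int]]) -> dict[int, int]:
    """Count hits per diagonal (target_pos - query_pos).

    Diagonals with many hits mark regions worth extending or rescoring.
    """
    counts: dict[int, int] = defaultdict(int)
    for q, t in hits:
        counts[t - q] += 1
    return dict(counts)
```

Consider the query ACGTAC against the target TTACGTACGG with k = 3. Four of the six word hits fall on diagonal 2, which is exactly where the query occurs in the target.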

Pairwise alignment

Name	Description	Sequence Type*	Alignment Type**	Link	Author	Year
Bioconductor Biostrings::pairwiseAlignment	Dynamic programming	Both	Both + Ends-free	site	P. Aboyoun	2008
BioPerl dpAlign	Dynamic programming	Both	Both + Ends-free	site	Y. M. Chan	2003
BLASTZ	Seeded pattern-matching	Nucleotide	Local	download	Schwartz et al.	2003
DNADot	Web-based dot-plot tool	Nucleotide	Global	server	R. Bowen	1998
DOTLET	Java-based dot-plot tool	Both	Global	applet	M. Pagni and T. Junier	1998
GGSEARCH, GLSEARCH	Global:Global (GG), Global:Local (GL) alignment with statistics	Protein	Global in query	FASTA server	W. Pearson	2007
JAligner	Open source Java implementation of Smith-Waterman	Both	Local	JWS	A. Moustafa	2005
LALIGN	Multiple, non-overlapping, local similarity (same algorithm as SIM)	Both	Local non-overlapping	server FASTA server	W. Pearson	1991 (algorithm)
matcher	Memory-optimized but slow local (Waterman-Eggert) dynamic programming (based on LALIGN)	Both	Local	server	I. Longden (modified from W. Pearson)	1999
MCALIGN2	explicit models of indel evolution	DNA	Global	server	J. Wang et al.	2006
MUMmer	Suffix-Tree based	Nucleotide	Global	download	S. Kurtz et al.	2004
needle	Needleman-Wunsch dynamic programming	Both	Global	EBI server	A. Bleasby	1999
Ngila	logarithmic and affine gap costs and explicit models of indel evolution	Both	Global	download	R. Cartwright	2007
PatternHunter	Seeded pattern-matching	Nucleotide	Local	download	B. Ma et al.	2002-2004
ProbA (also propA)	Stochastic partition function sampling via dynamic programming	Both	Global	download	U. Mückstein	2002
PyMOL	"align" command aligns sequence & applies it to structure	Protein	Global (by selection)	site	W. L. DeLano	2007
REPuter	Suffix-Tree based	Nucleotide	Local	download	S. Kurtz et al.	2001
SEQALN	Various dynamic programming	Both	Local or Global	server	M.S. Waterman and P. Hardy	1996
SIM, GAP, NAP, LAP	Local similarity with varying gap treatments	Both	Local or Global	server	X. Huang and W. Miller	1990-1996
SIM	Local similarity	Both	Local	servers	X. Huang and W. Miller	1991
SLIM Search	Ultra-fast blocked alignment	Both	Both	site	L. Bloksberg	2004
SSEARCH	Local (Smith-Waterman) alignment with statistics	Protein	Local	EBI FASTA server	W. Pearson	1981 (algorithm)
SWIFT suite	Fast local alignment searching	DNA	Local	site	K. Rasmussen, W. Gerlach	2005, 2008
stretcher	Memory-optimized but slow dynamic programming	Both	Global	server	I. Longden (modified from G. Myers and W. Miller)	1999
tranalign	Aligns nucleic acid sequences given a protein alignment	Nucleotide	NA	server	G. Williams (modified from B. Pearson)	2002
water	Smith-Waterman dynamic programming	Both	Local	EBI Pasteur server	A. Bleasby	1999
wordmatch	k-tuple pairwise match	Both	NA	server	I. Longden	1998
YASS	Seeded pattern-matching	Nucleotide	Local	server download	L. Noé and G. Kucherov	2003-2007
*Sequence Type: Protein or nucleotide. **Alignment Type: Local or global
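
Several entries above rely on Needleman-Wunsch global or Smith-Waterman local dynamic programming (needle, water, SSEARCH, JAligner). The two recurrences differ in only two ways:

- The local version floors every cell at zero.
- The global score is read from the bottom-right cell, while the local score is the maximum over the whole matrix.

The Python sketch below computes the score only. It assumes a simple match/mismatch scheme and a linear gap penalty with illustrative values. needle and water themselves use substitution matrices and affine (open/extend) gap penalties.

```python
def align_score(a: str, b: str, local: bool = False,
                match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal alignment score of a and b (score only, no traceback).

    local=False: Needleman-Wunsch (global) recurrence.
    local=True:  Smith-Waterman (local) recurrence, cells floored at 0.
    Assumption: linear gap penalty and match/mismatch scores, not a
    substitution matrix with affine gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of an alignment ending at a[:i] and b[:j]
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalised.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            cell = max(diag, up, left)
            if local:
                cell = max(cell, 0)         # a local alignment may start anywhere
                best = max(best, cell)
            score[i][j] = cell
    # Global: bottom-right cell. Local: maximum over the whole matrix.
    return best if local else score[-1][-1]
```

With the default scores, the examples behave as follows:

- ACGT against AGT scores 1 globally: three matches and one gap of -2.
- TTACGTAA against GGACGTCC scores 4 locally, from the shared ACGT.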

Multiple sequence alignment

Name	Description	Sequence Type*	Alignment Type**	Link	Author	Year
ABA	A-Bruijn alignment	Protein	Global	download	B. Raphael et al.	2004
ALE	manual alignment; some software assistance	Nucleotides	Local	download	J. Blandy and K. Fogel	1994 (latest version 2007)
AMAP	Sequence annealing	Both	Global	server	A. Schwartz and L. Pachter	2006
BAli-Phy	Tree+multi alignment; probabilistic/Bayesian; joint estimation	Both	Global	WWW+download	BD Redelings and MA Suchard	2005 (latest version 2007)
CHAOS/DIALIGN	Iterative alignment	Both	Local (preferred)	server	M. Brudno and B. Morgenstern	2003
ClustalW	Progressive alignment	Both	Local or Global	download EBI DDBJ PBIL EMBNet GenomeNet	Thompson et al.	1994
CodonCode Aligner	Multi alignment; ClustalW & Phrap support	Nucleotides	Local or Global	download	P. Richterich et al.	2003 (latest version 2008)
DIALIGN-TX and DIALIGN-T	Segment-based method	Both	Local (preferred) or Global	download and server	A. R. Subramanian	2005 (latest version 2008)
DNA Alignment	Segment-based method for intraspecific alignments	Both	Local (preferred) or Global	server	A. Roehl	2005 (latest version 2008)
Ed'Nimbus	Seeded filtration	Nucleotides	Local	server	P. Peterlongo et al.	2006
FSA	Sequence annealing	Both	Global	download and server	R. K. Bradley et al.	2008
Geneious	Progressive/Iterative alignment; ClustalW plugin	Both	Local or Global	download	A.J. Drummond et al.	2005 (latest version 2008)
Kalign	Progressive alignment	Both	Global	server EBI MPItoolkit	T. Lassmann	2005
MSA	Dynamic programming	Both	Local or Global	download	D.J. Lipman et al.	1989 (modified 1995)
PRRN/PRRP	Iterative alignment (especially refinement)	Protein	Local or Global	PRRP PRRN	Y. Totoki (based on O. Gotoh)	1991 and later
POA	Partial order/hidden Markov model	Protein	Local or Global	download	C. Lee	2002
SAM	Hidden Markov model	Protein	Local or Global	server	A. Krogh et al.	1994 (most recent version 2002)
MAFFT	Progressive/iterative alignment	Both	Local or Global	GenomeNet MAFFT	K. Katoh et al.	2005
MAVID	Progressive alignment	Both	Global	server	N. Bray and L. Pachter	2004
MULTALIN	Dynamic programming/clustering	Both	Local or Global	server	F. Corpet	1988
Multi-LAGAN	Progressive dynamic programming alignment	Both	Global	server	M. Brudno et al.	2003
MUSCLE	Progressive/iterative alignment	Both	Local or Global	server	R. Edgar	2004
Pecan	Probabilistic/consistency	DNA	Global	download	B. Paten et al.	2008
ProbCons	Probabilistic/consistency	Protein	Local or Global	server	C. Do et al.	2005
PSAlign	Alignment preserving non-heuristic	Both	Local or Global	download	S.H. Sze, Y. Lu, Q. Yang	2006
SAGA	Sequence alignment by genetic algorithm	Protein	Local or Global	download	C. Notredame et al.	1996 (new version 1998)
T-Coffee	More sensitive progressive alignment	Both	Local or Global	server download	C. Notredame et al.	2000 (newest version 2008)
RevTrans	Combines DNA and protein alignment by back-translating the protein alignment to DNA	DNA/Protein (special)	Local or Global	server	Wernersson and Pedersen	2003 (newest version 2005)
*Sequence Type: Protein or nucleotide. **Alignment Type: Local or global
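
RevTrans (and tranalign in the pairwise table) build a codon-level DNA alignment by back-translating a protein alignment. Each aligned amino acid is replaced by its codon from the original DNA, and each gap becomes three gap characters. The Python sketch below shows that threading step. It assumes each DNA sequence is a complete coding sequence with exactly three nucleotides per residue, with no stop codon, frameshift, or translation check; the real tools handle those cases. The function names are hypothetical.

```python
def back_translate(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread a coding DNA sequence onto its aligned protein sequence.

    Each residue column consumes the next codon of cds; each gap column
    becomes three gap characters, so the result stays in codon frame.
    Assumption: cds encodes the residues exactly (3 nt each, no stop codon).
    """
    residues = sum(1 for column in aligned_protein if column != gap)
    if len(cds) != 3 * residues:
        raise ValueError("cds length must be exactly 3 x number of residues")
    out = []
    codon_start = 0
    for column in aligned_protein:
        if column == gap:
            out.append(gap * 3)
        else:
            out.append(cds[codon_start:codon_start + 3])
            codon_start += 3
    return "".join(out)


def back_translate_alignment(protein_aln: list[str], cds_seqs: list[str]) -> list[str]:
    """Back-translate every row; rows and coding sequences must be in the same order."""
    if len(protein_aln) != len(cds_seqs):
        raise ValueError("need one coding sequence per alignment row")
    return [back_translate(p, d) for p, d in zip(protein_aln, cds_seqs)]
```

For example, back_translate("MK-L", "ATGAAACTG") returns "ATGAAA---CTG". That row stays column-aligned with "ATGAAGGCTCTT", the back-translation of MKAL.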

Genomics analysis

Name	Description	Sequence Type*	Link
SLAM	Gene finding, alignment, annotation (human-mouse homology identification)	Nucleotide	server
Mauve	Multiple alignment of rearranged genomes	Nucleotide	download
MGA	Multiple Genome Aligner	Nucleotide	download
Mulan	Local multiple alignments of genome-length sequences	Nucleotide	server
Sequerome	Profiling sequence alignment data with major servers/services	Nucleotide/peptide	server
AVID	Pairwise global alignment with whole genomes	Nucleotide	server
SIBsim4 / Sim4	A program designed to align an expressed DNA sequence with a genomic sequence, allowing for introns	Nucleotide	download
Shuffle-LAGAN	Pairwise glocal alignment of completed genome regions	Nucleotide	server
ACT (Artemis Comparison Tool)	Synteny and comparative genomics	Nucleotide	server
*Sequence Type: Protein or nucleotide

Motif finding

Name	Description	Sequence Type*	Link
MEME/MAST	Motif discovery and search	Both	server
BLOCKS	Ungapped motif identification from BLOCKS database	Both	server
eMOTIF	Extraction and identification of shorter motifs	Both	servers
Gibbs motif sampler	Stochastic motif extraction by statistical likelihood	Both	server (one of many implementations)
TEIRESIAS	Motif extraction and database search	Both	server
PRATT	Pattern generation for use with ScanProsite	Protein	server
ScanProsite	Motif database search tool	Protein	server
PHI-Blast	Motif search and alignment tool	Both	server
I-sites	Local structure motif library	Protein	server
*Sequence Type: Protein or nucleotide
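
Several of the motif tools above search sequences with a position-specific model of a motif. The Python sketch below illustrates the general idea: it builds a log-odds position weight matrix from a few aligned example sites, then scores every window of a DNA sequence against it. The following choices are all assumptions:

- a pseudocount of 1;
- a uniform 0.25 background;
- the example sites themselves.

This is not the specific scoring scheme of MEME/MAST, BLOCKS, or any other listed tool, and the function names are hypothetical.

```python
import math

ALPHABET = "ACGT"


def build_pwm(sites: list[str], pseudocount: float = 1.0,
              background: float = 0.25) -> list[dict[str, float]]:
    """Log2-odds position weight matrix from equal-length aligned sites.

    Assumptions: uniform background frequency and an additive pseudocount.
    """
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        total = len(column) + pseudocount * len(ALPHABET)
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in ALPHABET
        })
    return pwm


def scan(seq: str, pwm: list[dict[str, float]]) -> list[tuple[int, float]]:
    """Score every window of seq against the PWM; returns (start, score) pairs."""
    width = len(pwm)
    return [
        (i, sum(pwm[j][seq[i + j]] for j in range(width)))
        for i in range(len(seq) - width + 1)
    ]


def best_hit(seq: str, pwm: list[dict[str, float]]) -> tuple[int, float]:
    """Highest-scoring window (the first one wins on ties)."""
    return max(scan(seq, pwm), key=lambda hit: hit[1])
```

With the sites TATA, TATA, and TATT, the best window in GGCTATAGG starts at 0-based position 3.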

Benchmarking

Name	Link	Authors
BAliBASE	download	Thompson, Plewniak, Poch
HOMSTRAD	download	Stebbings, Mizuguchi
Oxbench	download	Raghava, Searle, Audley, Barber, Barton
PFAM	download
PREFAB	download	Edgar
SABmark	download	Van Walle, Lasters, Wyns
SMART	download	Letunic, Copley, Schmidt, Ciccarelli, Doerks, Schultz, Ponting, Bork

Alignment Viewers/Editors

Name	Integrated with Struct. Prediction Tools	Can Align Sequences	Can Calculate Phylogenetic Trees	Other Features	Formats Supported	License	Link
DnaSP				can compute several population genetics statistics, reconstruct haplotypes with PHASE	FASTA, NEXUS, Mega, PHYLIP	Freeware	link
Strap	Jnet, NNPREDICT, Coiled coil, 16 different TM-helix	Fifteen different Methods	Neighbor-joining	Dot-plot, Structure-neighbors, 3D-superposition, Blast-search, Mutation/SNP analysis, Sequence features, Biojava-interface	MSF, Stockholm, Clustalw, Nexus, FASTA, PDB, Embl, GenBank, hssp, Pfam	GPL	link
Seaview	No	local Muscle/Clustalw	No	Dot-plot, vim-like editing keys	NEXUS, MSF, Clustal, FASTA, PHYLIP, MASE		link
Jalview 2	Secondary structure prediction via JNET	Clustal, Muscle via web services	UPGMA, NJ	features from arbitrary DAS servers	FASTA, PFAM, MSF, Clustal, BLC, PIR	GPL	link
CLC viewer (Free version)	only in commercial version	Clustal, Muscle, T-Coffee, MAFFT, kalign, various	UPGMA, NJ	workflows, blast/genbank search	many	Freeware. More options available in commercial versions.	link, table of features
UGENE	No	MUSCLE	No	many	FASTA, GenBank, EMBL, ABIF, SCF, CLUSTALW, Stockholm, Newick, PDB	GPL	link
Mega	No	Native ClustalW	UPGMA, NJ, ME, MP, with bootstrap and confidence test	extended support for phylogenetic analysis	FASTA, Clustal, Nexus, Mega, etc.	Freeware, registration requested	link, table of features
Genedoc	No, but can read/show annotations	Pairwise	No, but can read/show annotations	gel simulation, stats, multiple views, simple	many	Free	link, table of features
SeqPop	No					Free	link
BioEdit	No	ClustalW	rudimentary, can read phylip	plasmid drawing, ABI chromatograms	Genbank, Fasta, Phylip 3.2, Phylip 4, NBRF/PIR	Free	link
Ale (emacs plugin)	No	Yes	No	No	GenBank, EMBL, FASTA, and Phylip	GPL	link
Ralee (emacs plugin for RNA alignment editing)		RNA structure			Stockholm	GPL	link
emacs - biomode							link

Further discussion of sequence alignment editors and viewers can be found on the EMBOSS mailing list:

http://lists.open-bio.org/pipermail/emboss/2008-July/003324.html

Short-Read Sequence Alignment

Name	Description	Multi-threaded	License	Link
BFAST	Explicit time and accuracy tradeoff with a priori accuracy estimation, supported by indexing the reference sequences. Optimally compresses indexes. Can handle billions of short reads. Can handle insertions, deletions, SNPs, and color errors (can map ABI SOLiD color space reads). Performs a full Smith-Waterman alignment.	Yes (POSIX)	Free for academic and non-commercial use.	link
MAQ	Ungapped alignment that takes into account quality scores for each base.		GPL	link
Novoalign	Gapped alignment of single end and paired end Illumina GA I & II reads. Uses base qualities. Reports can be converted to MAQ map files for analysis with MAQ utilities. Performance comparable to MAQ but with gapped alignment.	Multi-threading available with paid license.	Single threaded version free for academic and non-commercial use.	link
ELAND	Implemented by Illumina. Performs ungapped alignment of reads up to a limited length.
SSAHA and SSAHA2	Fast for a small number of variants.		Free for academic and non-commercial use.	link
SOAP	Robust with a small number (1-3) of gaps and mismatches. Faster than BLAT; uses a 12-letter hash table.		GPL	link
BLAT	Developed by Jim Kent. Widely used for nucleotide sequence alignment; uses ungapped alignment and can handle one mismatch in the initial alignment step.	Yes (client/server).	Free for academic and non-commercial use.	link
BLASTN	BLAST's nucleotide alignment program. Slow and not accurate for short reads; it searches a sequence database (EST, Sanger sequence) rather than a reference genome.			link
ZOOM	Reads can vary between 25bp and 64bp in length. Fast but not accurate for >2 mismatches. Ignores insertions and deletions.		Commercial	link
RMAP	Read lengths can range from 20bp to at most 64bp. Uses the "exclusion principle" to allow for mismatches and look up reads in an index.			link
SHRiMP	Indexes the reads instead of the reference genome. Uses masks to generate possible keys. Can map ABI SOLiD color space reads.		BSD derivative	link
Bowtie	Uses the Burrows-Wheeler transform to create a permanent, reusable index of the genome; 1.3 GB memory footprint for human genome. Aligns more than 25 million Illumina reads in 1 CPU hour. Supports Maq-like and SOAP-like alignment policies.	Yes (POSIX)	Artistic License	link
QPalma	Uses known alignments to align targeted spliced reads. Useful for transcriptome resequencing and gene exploration.			link
SOCS	For ABI SOLiD technologies. Significant increase in time to map reads with mismatches (or color errors). Uses an iterative version of the Rabin-Karp string search algorithm.	Yes		link
MOSAIK	Fast and incorporates assembly. Aligns reads using a hashing scheme. Must split the reads many times to be robust against increasing number of mismatches.	Yes		link
SLIDER	Slider is an application for the Illumina Sequence Analyzer output that uses the "probability" files instead of the sequence files as an input for alignment to a reference sequence or a set of reference sequences.			link
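
Bowtie is described above as indexing the genome with a Burrows-Wheeler transform. The core operation such an index supports is backward search: it counts exact occurrences of a read by walking the read's characters from right to left over the transformed text. The Python sketch below makes two simplifying assumptions:

- The transform is built naively by sorting all rotations, which is only practical for toy strings.
- Occurrence counts are stored uncompressed.

Bowtie's actual FM index uses compressed, sampled structures and adds backtracking to tolerate mismatches. The class and function names are hypothetical.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations ('$' terminates text).

    Assumption: naive O(n^2 log n) construction, suitable for toy inputs only.
    """
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


class FMIndex:
    """Toy FM index supporting exact-match counting by backward search."""

    def __init__(self, text: str):
        self.last = bwt(text)  # last column L of the sorted rotation matrix
        counts: dict[str, int] = {}
        for ch in self.last:
            counts[ch] = counts.get(ch, 0) + 1
        # first[c]: number of characters smaller than c, i.e. where the
        # block of rows starting with c begins in the first column F.
        self.first: dict[str, int] = {}
        total = 0
        for ch in sorted(counts):
            self.first[ch] = total
            total += counts[ch]
        # occ[c][i]: occurrences of c in last[:i] (uncompressed, by assumption)
        self.occ: dict[str, list[int]] = {}
        for ch in counts:
            running = [0]
            for x in self.last:
                running.append(running[-1] + (x == ch))
            self.occ[ch] = running

    def count(self, pattern: str) -> int:
        """Number of exact occurrences of pattern in the indexed text."""
        lo, hi = 0, len(self.last)  # current row interval [lo, hi)
        for ch in reversed(pattern):
            if ch not in self.first:
                return 0
            # LF-mapping: rows prefixed by ch + (current match)
            lo = self.first[ch] + self.occ[ch][lo]
            hi = self.first[ch] + self.occ[ch][hi]
            if lo >= hi:
                return 0
        return hi - lo
```

For example, bwt("banana") is "annb$aa", and FMIndex("banana").count("ana") returns 2 because the two occurrences overlap. Each pattern character costs only two table lookups, however long the indexed text is. This is why a prebuilt, reusable index makes short-read lookup fast.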

External links

Pollard et al. (2004) (PubMed Central free fulltext): The authors discuss LAGAN, CHAOS, and Dialign as the most effective tools tested for certain uses.