
List of sequence alignment software

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by ProtectionTaggingBot (talk | contribs) at 11:36, 20 January 2009 (Remove expired protection tag.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

This list of sequence alignment software is a compilation of software tools and web portals used in pairwise sequence alignment and multiple sequence alignment. See structural alignment software for structural alignment of proteins.

Database search only

Name | Description | Sequence Type* | Link | Author | Year
BLAST | k-tuple local search (Basic Local Alignment Search Tool) | Both | NCBI EBI DDBJ DDBJ (psi-blast) GenomeNet PIR (protein only) | Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ, NCBI | 1990
Combinatorial Extension | Structural alignment search | Protein | server | |
FASTA | k-tuple local search | Both | EBI DDBJ GenomeNet PIR (protein only) | Lipman DJ, Pearson WR | 1985
GGSEARCH / GLSEARCH | Global:Global (GG), Global:Local (GL) alignment with statistics | Protein | FASTA server | |
HMMER | Hidden Markov profile search | Protein/DNA | download DDBJ (HMMPFAM) | Eddy SR, Krogh A, Mitchison G | 1998
IDF | Inverse Document Frequency | Both | download | |
Infernal | Profile SCFG search | RNA | download | Eddy S |
SAM | Hidden Markov profile search | Protein/DNA | SAM | Karplus K, Krogh A |
SSEARCH | Smith-Waterman search (more sensitive than FASTA) | Both | EBI DDBJ server | |
*Sequence Type: Protein or nucleotide
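
Several of the tools above are described as "k-tuple" searches: they first look for short words of length k shared by the query and a database sequence, and only then attempt a fuller alignment around those hits. The following is a minimal sketch of that seeding idea only, not the implementation used by BLAST or FASTA; the function names `kmer_index` and `shared_words` and the default `k=3` are illustrative assumptions.

```python
from collections import defaultdict


def kmer_index(seq, k):
    """Map each k-letter word in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def shared_words(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where both sequences share a k-letter word."""
    index = kmer_index(subject, k)
    hits = []
    for q in range(len(query) - k + 1):
        # .get avoids adding empty entries to the defaultdict for absent words
        for s in index.get(query[q:q + k], []):
            hits.append((q, s))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequences, chosen only to show one shared word ("ACG").
    print(shared_words("ACGT", "TTACGA", k=3))
```

Real search tools add scoring matrices, hit extension and statistics on top of this lookup step; the sketch shows only why indexing words makes the initial scan fast.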

Pairwise alignment

Name | Description | Sequence Type* | Alignment Type** | Link | Author | Year
Bioconductor Biostrings::pairwiseAlignment | Dynamic programming | Both | Both + Ends-free | site | P. Aboyoun | 2008
BioPerl dpAlign | Dynamic programming | Both | Both + Ends-free | site | Y. M. Chan | 2003
BLASTZ | Seeded pattern-matching | Nucleotide | Local | download | Schwartz et al. | 2003
DNADot | Web-based dot-plot tool | Nucleotide | Global | server | R. Bowen | 1998
DOTLET | Java-based dot-plot tool | Both | Global | applet | M. Pagni and T. Junier | 1998
GGSEARCH, GLSEARCH | Global:Global (GG), Global:Local (GL) alignment with statistics | Protein | Global in query | FASTA server | W. Pearson | 2007
JAligner | Open source Java implementation of Smith-Waterman | Both | Local | JWS | A. Moustafa | 2005
LALIGN | Multiple, non-overlapping, local similarity (same algorithm as SIM) | Both | Local non-overlapping | server FASTA server | W. Pearson | 1991 (algorithm)
matcher | Memory-optimized but slow dynamic programming (based on LALIGN) | Both | Local | server | I. Longden (modified from W. Pearson) | 1999
MCALIGN2 | Explicit models of indel evolution | DNA | Global | server | J. Wang et al. | 2006
MUMmer | Suffix-tree based | Nucleotide | Global | download | S. Kurtz et al. | 2004
needle | Needleman-Wunsch dynamic programming | Both | Global | EBI server | A. Bleasby | 1999
Ngila | Logarithmic and affine gap costs and explicit models of indel evolution | Both | Global | download | R. Cartwright | 2007
PatternHunter | Seeded pattern-matching | Nucleotide | Local | download | B. Ma et al. | 2002-2004
ProbA (also propA) | Stochastic partition function sampling via dynamic programming | Both | Global | download | U. Mückstein | 2002
PyMOL | "align" command aligns sequence & applies it to structure | Protein | Global (by selection) | site | W. L. DeLano | 2007
REPuter | Suffix-tree based | Nucleotide | Local | download | S. Kurtz et al. | 2001
SEQALN | Various dynamic programming | Both | Local or Global | server | M.S. Waterman and P. Hardy | 1996
SIM, GAP, NAP, LAP | Local similarity with varying gap treatments | Both | Local or Global | server | X. Huang and W. Miller | 1990-6
SIM | Local similarity | Both | Local | servers | X. Huang and W. Miller | 1991
SLIM Search | Ultra-fast blocked alignment | Both | Both | site | L. Bloksberg | 2004
SSEARCH | Local (Smith-Waterman) alignment with statistics | Protein | Local | EBI FASTA server | W. Pearson | 1981 (algorithm)
SWIFT suite | Fast local alignment searching | DNA | Local | site | K. Rasmussen, W. Gerlach | 2005, 2008
stretcher | Memory-optimized but slow dynamic programming | Both | Global | server | I. Longden (modified from G. Myers and W. Miller) | 1999
tranalign | Aligns nucleic acid sequences given a protein alignment | Nucleotide | NA | server | G. Williams (modified from B. Pearson) | 2002
water | Smith-Waterman dynamic programming | Both | Local | EBI Pasteur server | A. Bleasby | 1999
wordmatch | k-tuple pairwise match | Both | NA | server | I. Longden | 1998
YASS | Seeded pattern-matching | Nucleotide | Local | server download | L. Noe and G. Kucherov | 2003-2007
*Sequence Type: Protein or nucleotide. **Alignment Type: Local or global
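
The "Alignment Type" column distinguishes global alignment (Needleman-Wunsch style, as in needle) from local alignment (Smith-Waterman style, as in water). As a rough illustration of how the two dynamic-programming recurrences differ, the sketch below computes only the optimal score, with a linear gap penalty. The scoring values (`match=1`, `mismatch=-1`, `gap=-1`) and the function name `alignment_score` are assumptions chosen for the example and are not taken from any tool in the table.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Optimal alignment score of a and b under a linear gap penalty.

    local=False: global (Needleman-Wunsch style) score over the full sequences.
    local=True:  local (Smith-Waterman style) score of the best-matching region.
    """
    # First row: a global alignment pays for leading gaps, a local one does not.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                # Local alignment may restart anywhere, so scores never drop below 0.
                score = max(score, 0)
                best = max(best, score)
            curr.append(score)
        prev = curr
    # Global score is the bottom-right cell; local score is the best cell seen.
    return best if local else prev[-1]


if __name__ == "__main__":
    # Hypothetical toy sequences sharing only the core "ACG".
    print(alignment_score("TTACGTT", "GGACGGG"))              # global
    print(alignment_score("TTACGTT", "GGACGGG", local=True))  # local
```

With these toy inputs the local score rewards the shared core while the global score is pulled down by the mismatched flanks, which is the practical difference the column is recording. Production tools typically use substitution matrices and affine gap penalties rather than the flat values assumed here.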

Multiple sequence alignment

Name | Description | Sequence Type* | Alignment Type** | Link | Author | Year
ABA | A-Bruijn alignment | Protein | Global | download | B. Raphael et al. | 2004
ALE | Manual alignment; some software assistance | Nucleotides | Local | download | J. Blandy and K. Fogel | 1994 (latest version 2007)
AMAP | Sequence annealing | Both | Global | server | A. Schwartz and L. Pachter | 2006
BAli-Phy | Tree+Multi alignment; Probabilistic/Bayesian; Joint Estimation | Both | Global | WWW+download | BD Redelings and MA Suchard | 2005 (latest version 2007)
CHAOS/DIALIGN | Iterative alignment | Both | Local (preferred) | server | M. Brudno and B. Morgenstern | 2003
ClustalW | Progressive alignment | Both | Local or Global | download EBI DDBJ PBIL EMBNet GenomeNet | Thompson et al. | 1994
CodonCode Aligner | Multi alignment; ClustalW & Phrap support | Nucleotides | Local or Global | download | P. Richterich et al. | 2003 (latest version 2008)
DIALIGN-TX and DIALIGN-T | Segment-based method | Both | Local (preferred) or Global | download and server | A. R. Subramanian | 2005 (latest version 2008)
DNA Alignment | Segment-based method for intraspecific alignments | Both | Local (preferred) or Global | server | A. Roehl | 2005 (latest version 2008)
Ed'Nimbus | Seeded filtration | Nucleotides | Local | server | P. Peterlongo et al. | 2006
FSA | Sequence annealing | Both | Global | download and server | R. K. Bradley et al. | 2008
Geneious | Progressive/Iterative alignment; ClustalW plugin | Both | Local or Global | download | A.J. Drummond et al. | 2005 (latest version 2008)
Kalign | Progressive alignment | Both | Global | server EBI MPItoolkit | T. Lassmann | 2005
MSA | Dynamic programming | Both | Local or Global | download | D.J. Lipman et al. | 1989 (modified 1995)
PRRN/PRRP | Iterative alignment (especially refinement) | Protein | Local or Global | PRRP PRRN | Y. Totoki (based on O. Gotoh) | 1991 and later
POA | Partial order/hidden Markov model | Protein | Local or Global | download | C. Lee | 2002
SAM | Hidden Markov model | Protein | Local or Global | server | A. Krogh et al. | 1994 (most recent version 2002)
MAFFT | Progressive/iterative alignment | Both | Local or Global | GenomeNet MAFFT | K. Katoh et al. | 2005
MAVID | Progressive alignment | Both | Global | server | N. Bray and L. Pachter | 2004
MULTALIN | Dynamic programming/clustering | Both | Local or Global | server | F. Corpet | 1988
Multi-LAGAN | Progressive dynamic programming alignment | Both | Global | server | M. Brudno et al. | 2003
MUSCLE | Progressive/iterative alignment | Both | Local or Global | server | R. Edgar | 2004
Pecan | Probabilistic/consistency | DNA | Global | download | B. Paten et al. | 2008
ProbCons | Probabilistic/consistency | Protein | Local or Global | server | C. Do et al. | 2005
PSAlign | Alignment preserving non-heuristic | Both | Local or Global | download | S.H. Sze, Y. Lu, Q. Yang | 2006
SAGA | Sequence alignment by genetic algorithm | Protein | Local or Global | download | C. Notredame et al. | 1996 (new version 1998)
T-Coffee | More sensitive progressive alignment | Both | Local or Global | server download | C. Notredame et al. | 2000 (newest version 2008)
RevTrans | Combines DNA and protein alignment by back-translating the protein alignment to DNA | DNA/Protein (special) | Local or Global | server | Wernersson and Pedersen | 2003 (newest version 2005)
*Sequence Type: Protein or nucleotide. **Alignment Type: Local or global

Genomics analysis

Name | Description | Sequence Type* | Link
SLAM | Gene finding, alignment, annotation (human-mouse homology identification) | Nucleotide | server
Mauve | Multiple alignment of rearranged genomes | Nucleotide | download
MGA | Multiple Genome Aligner | Nucleotide | download
Mulan | Local multiple alignments of genome-length sequences | Nucleotide | server
Sequerome | Profiling sequence alignment data with major servers/services | Nucleotide/peptide | server
AVID | Pairwise global alignment with whole genomes | Nucleotide | server
SIBsim4 / Sim4 | A program designed to align an expressed DNA sequence with a genomic sequence, allowing for introns | Nucleotide | download
Shuffle-LAGAN | Pairwise glocal alignment of completed genome regions | Nucleotide | server
ACT (Artemis Comparison Tool) | Synteny and comparative genomics | Nucleotide | server
*Sequence Type: Protein or nucleotide



Motif finding

Name | Description | Sequence Type* | Link
MEME/MAST | Motif discovery and search | Both | server
BLOCKS | Ungapped motif identification from BLOCKS database | Both | server
eMOTIF | Extraction and identification of shorter motifs | Both | servers
Gibbs motif sampler | Stochastic motif extraction by statistical likelihood | Both | server (one of many implementations)
TEIRESIAS | Motif extraction and database search | Both | server
PRATT | Pattern generation for use with ScanProsite | Protein | server
ScanProsite | Motif database search tool | Protein | server
PHI-Blast | Motif search and alignment tool | Both | server
I-sites | Local structure motif library | Protein | server
*Sequence Type: Protein or nucleotide



Benchmarking

Name | Link | Authors
BAliBASE | download | Thompson, Plewniak, Poch
HOMSTRAD | download | Stebbings, Mizuguchi
Oxbench | download | Raghava, Searle, Audley, Barber, Barton
PFAM | download |
PREFAB | download | Edgar
SABmark | download | Van Walle, Lasters, Wyns
SMART | download | Letunic, Copley, Schmidt, Ciccarelli, Doerks, Schultz, Ponting, Bork

Alignment Viewers/Editors

Name Integrated with Struct. Prediction Tools Can Align Sequences Can Calculate Phylogenetic Trees Other Features Formats Supported License Link
DnaSP can compute several population genetics statistics, reconstruct haplotypes with PHASE FASTA, NEXUS, Mega, PHYLIP Freeware link
Strap Jnet, NNPREDICT, Coiled coil, 16 different TM-helix Fifteen different Methods Neighbor-joining Dot-plot, Structure-neighbors, 3D-superposition, Blast-search, Mutation/SNP analysis, Sequence features, Biojava-interface MSF, Stockholm, Clustalw, Nexus, FASTA, PDB, Embl, GenBank, hssp, Pfam GPL link
Seaview No local Muscle/Clustalw No Dot-plot, vim-like editing keys NEXUS, MSF, Clustal, FASTA, PHYLIP, MASE link
Jalview 2 Secondary Struct. Prediction via JNET Clustal, Muscle via web services UPGMA, NJ features from arbitrary DAS servers FASTA, PFAM, MSF, Clustal, BLC, PIR GPL link
CLC viewer (Free version) only in commercial version Clustal, Muscle, T-Coffee, MAFFT, kalign, various UPGMA, NJ workflows, blast/genbank search many Freeware. More options available in commercial versions. link, table of features
UGENE No MUSCLE No many FASTA, GenBank, EMBL, ABIF, SCF, CLUSTALW, Stockholm, Newick, PDB GPL link
Mega No Native ClustalW UPGMA, NJ, ME, MP, with bootstrap and confidence test extended support for phylogenetic analysis FASTA, Clustal, Nexus, Mega, etc. Freeware, registration requested link, table of features
Genedoc No, but can read/show annotations Pairwise No, but can read/show annotations gel simulation, stats, multiple views, simple many Free link table of features
SeqPop No Free link
BioEdit No ClustalW rudimentary, can read phylip plasmid drawing, ABI chromatograms, Genbank, Fasta, Phylip 3.2, Phylip 4, NBRF/PIR Free link
Ale (emacs plugin) No Yes No No GenBank, EMBL, FASTA, and Phylip GPL link
Ralee (emacs plugin for RNA al. editing) RNA structure Stockholm GPL link
emacs - biomode link

Some useful discussions on sequence alignment editors/viewers can be found here:

Short-Read Sequence Alignment

Name Description Multi-threaded License Link
BFAST Explicit time and accuracy tradeoff with a priori accuracy estimation, supported by indexing the reference sequences. Optimally compresses indexes. Can handle billions of short reads. Can handle insertions, deletions, SNPs, and color errors (can map ABI SOLiD color space reads). Performs a full Smith-Waterman alignment. Yes (POSIX) Free for academic and non-commercial use. link
MAQ Ungapped alignment that takes into account quality scores for each base. GPL link
Novoalign Gapped alignment of single end and paired end Illumina GA I & II reads. Uses base qualities. Reports can be converted to MAQ map files for analysis with MAQ utilities. Performance comparable to MAQ but with gapped alignment. Multi-threading available with paid license. Single threaded version free for academic and non-commercial use. link
ELAND Implemented by Illumina. Includes ungapped alignment with a finite read length.
SSAHA and SSAHA2 Fast for a small number of variants. Free for academic and non-commercial use. link
SOAP Robust with a small (1-3) number of gaps and mismatches. Speed improvement over BLAT, uses a 12 letter hash table. GPL link
BLAT Made by Jim Kent. De facto standard for nucleotide sequence alignment, ungapped alignment, and can handle one mismatch in initial alignment step. Yes (client/server). Free for academic and non-commercial use. link
BLASTN BLAST's nucleotide alignment program; slow and not accurate for short reads, and uses a sequence database (EST, Sanger sequence) rather than a reference genome. link
ZOOM Reads can vary between 25bp and 64bp in length. Fast but not accurate for >2 mismatches. Ignores insertions and deletions. Commercial link
RMAP Read lengths can range from 20bp to at most 64bp. Uses the "exclusion principle" to allow for mismatches and look-up reads in an index. link
SHRiMP Indexes the reads instead of the reference genome. Uses masks to generate possible keys. Can map ABI SOLiD color space reads. BSD derivative link
Bowtie Uses a Burrows-Wheeler transform to create a permanent, reusable index of the genome; 1.3 GB memory footprint for human genome. Aligns more than 25 million Illumina reads in 1 CPU hour. Supports Maq-like and SOAP-like alignment policies. Yes (POSIX) Artistic License link
QPalma Uses known alignments to align targeted spliced reads. Useful for transcriptome resequencing and gene exploration. link
SOCS For ABI SOLiD technologies. Significant increase in time to map reads with mismatches (or color errors). Uses an iterative version of the Rabin-Karp string search algorithm. Yes link
MOSAIK Fast and incorporates assembly. Aligns reads using a hashing scheme. Must split the reads many times to be robust against increasing number of mismatches. Yes link
SLIDER Slider is an application for the Illumina Sequence Analyzer output that uses the "probability" files instead of the sequence files as an input for alignment to a reference sequence or a set of reference sequences. link
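
The Bowtie entry above refers to a Burrows-Wheeler index. As a rough illustration of the underlying transform only, the sketch below builds it naively by sorting all rotations of the text. This is not how Bowtie builds or queries its index, which relies on far more memory-efficient structures; the function name `bwt` and the `$` terminator are assumptions for the example.

```python
def bwt(text, terminator="$"):
    """Naive Burrows-Wheeler transform: last column of the sorted rotations.

    The terminator must not occur in text and must sort before every other
    character, so that the transform is reversible.
    """
    if terminator in text:
        raise ValueError("terminator must not occur in the text")
    s = text + terminator
    # Sorting full rotations costs O(n^2 log n); fine for a toy example only.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


if __name__ == "__main__":
    # Hypothetical toy reference sequence.
    print(bwt("GATTACA"))
```

The transform tends to group identical characters into runs, which is what makes the resulting index compressible and reusable across many reads.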