
Talk:Multiple sequence alignment

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Jezhotwells (talk | contribs) at 18:51, 7 March 2010 (keep ga status). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
Good article: Multiple sequence alignment has been listed as one of the Natural sciences good articles under the good article criteria. If you can improve it further, please do so. If it no longer meets these criteria, you can reassess it.
Article milestones
Date                 Process                     Result
September 12, 2006   Good article nominee        Listed
March 7, 2010        Good article reassessment   Kept
Current status: Good article


some clarifications

the statement

Because HMMs are probabilistic, they do not produce the same solution every time they are run on the same dataset; thus they cannot be guaranteed to converge to an optimal alignment. HMMs can produce both global and local alignments. Although HMM-based methods have been developed relatively recently, they offer significant improvements in computational speed, especially for sequences that contain overlapping regions.

is incorrect. HMMs are probabilistic in the sense that they are a statistical model; however, they are completely deterministic and will produce the same result every time on a given dataset. HMM alignments use the same algorithms as local sequence alignments and therefore have no computational speed advantage.
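
The determinism point can be illustrated with a minimal sketch, assuming a toy two-state HMM over DNA decoded with the Viterbi algorithm. The state names and all probabilities below are hypothetical values chosen for illustration; they are not taken from the article or from any alignment package.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most probable state path for obs (log-space Viterbi)."""
    # best[i][s] = best log-probability of any path ending in state s at position i
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []
    for symbol in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Pick the predecessor state that maximises the path score into s.
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][symbol]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def _log(table):
    return {key: math.log(value) for key, value in table.items()}


# Hypothetical toy model: "H" (GC-rich) and "L" (AT-rich) states.
STATES = ("H", "L")
LOG_START = _log({"H": 0.5, "L": 0.5})
LOG_TRANS = {"H": _log({"H": 0.5, "L": 0.5}), "L": _log({"H": 0.4, "L": 0.6})}
LOG_EMIT = {
    "H": _log({"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}),
    "L": _log({"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}),
}

# Decoding the same sequence repeatedly uses no random numbers,
# so every run returns the same path.
runs = [viterbi("GGCACTGAA", STATES, LOG_START, LOG_TRANS, LOG_EMIT) for _ in range(5)]
print(all(run == runs[0] for run in runs))
```

The model assigns probabilities to paths, but the dynamic-programming search over those paths involves no sampling, which is the sense in which the decoding is deterministic.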

One of the most common motif-finding tools, known as MEME, uses expectation maximization and hidden Markov methods to generate motifs that are then used as search tools by its companion MAST in the combined suite MEME/MAST.[19][20]

MEME uses a PSSM (position specific scoring matrix), but does not contain insertion or deletion probabilities or other characteristics of a typical sequence HMM.
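
As a rough illustration of the difference, a PSSM can be applied by sliding it along a sequence and summing one score per column, with no insertion or deletion states. This is only a sketch of ungapped PSSM scoring in general, not MEME's or MAST's actual implementation; the matrix values and the function name `best_pssm_hit` are hypothetical.

```python
def best_pssm_hit(pssm, sequence):
    """Return (start, score) of the best ungapped match of pssm in sequence."""
    width = len(pssm)
    best_pos, best_score = None, float("-inf")
    for start in range(len(sequence) - width + 1):
        # Each motif column scores exactly one residue: no gaps are possible.
        score = sum(pssm[i][sequence[start + i]] for i in range(width))
        if score > best_score:
            best_pos, best_score = start, score
    return best_pos, best_score


# Hypothetical width-3 matrix favouring the motif "TAT".
TOY_PSSM = [
    {"A": -1, "C": -1, "G": -1, "T": 2},
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": -1, "T": 2},
]

print(best_pssm_hit(TOY_PSSM, "GGTATCC"))  # (2, 6)
```

A profile HMM would add insert and delete states with their own transition probabilities between these columns, which a plain PSSM has no way to represent.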

Gribskov 03:55, 20 September 2007 (UTC)[reply]

Given the specific technological, algorithmic, and biological significance of short-read sequence alignment, I think this topic deserves its own page. For example, the differences between short-read mapping and de novo assembly in next-generation sequencing projects could be discussed on such a page. --Dan|(talk) 14:01, 22 January 2009 (UTC)[reply]

Alternative interpretations of MSAs

The main use/interpretation of columns in MSAs is that residues in the same column are "related" by either point substitutions or no substitutions at all.

However, there are applications of MSAs where residues in the same column are assumed to be "structurally" equivalent but not necessarily evolutionarily equivalent e.g. http://www.ncbi.nlm.nih.gov/pubmed/16733545 - indeed in some of these applications the aim is to avoid including "homologous" sequences in the alignment e.g. http://www.ncbi.nlm.nih.gov/pubmed/9920390

At the moment this distinction isn't made on the MSA wikipedia page - although the top of the sequence alignment wikipedia page does highlight different interpretations.

First wikipedia post ever here - not quite ready to be bold yet! - so wanted to ask/check whether anyone disagrees with introducing some changes to reflect this distinction to the MSA page? SiggyDood (talk) 12:45, 11 March 2009 (UTC)[reply]

GA Reassessment

This discussion is transcluded from Talk:Multiple sequence alignment/GA1. The edit link for this section can be used to add comments to the reassessment.

Starting GA reassessment as part of the GA Sweeps process. Jezhotwells (talk) 19:18, 28 February 2010 (UTC)[reply]

Checking against GA criteria

GA review (see here for criteria)
  1. It is reasonably well written.
    a (prose): b (MoS):
    This article appears more like an essay or a paper than an encyclopaedia article. Consider a thorough copy-edit for style and clarity.  Done
  2. It is factually accurate and verifiable.
    a (references): b (citations to reliable sources): c (OR):
    I repaired dead links using WP:CHECKLINKS. All references appear to be OK  Done
    There are many uncited paragraphs.  Done
  3. It is broad in its coverage.
    a (major aspects): b (focused):
  4. It follows the neutral point of view policy.
    Fair representation without bias:
  5. It is stable.
    No edit wars, etc.:
  6. It is illustrated by images, where possible and appropriate.
    a (images are tagged and non-free images have fair use rationales): b (appropriate use with suitable captions):
    Images such as File:Caspase-motif-alignment.png and File:RPLP0 90 ClustalW aln.gif are illegible in the article and appear to add little.  Done
  7. Overall:
    Pass/Fail:
    Main concerns: the style of the article is un-encyclopaedic, images add little, many uncited paragraphs. On hold until 7 March, major contributors and projects will be notified. Jezhotwells (talk) 19:29, 28 February 2010 (UTC)[reply]
    OK, thanks for fixing things up, keep GA status. Jezhotwells (talk) 18:49, 7 March 2010 (UTC)[reply]

Note

Most of the unencyclopedic style was inserted by a well-meaning but seemingly novice editor. I've removed much of that content because it was redundant with other, cited parts of the article. I've also significantly enlarged the lead image. Without such detail, the image is useless to the reader unless he or she clicks through to the larger media file -- which is unlikely. All paragraphs now have at least one relevant reference. If there are any issues that remain to be addressed for the purposes of this reassessment, please let me know. Thanks, Emw (talk) 08:10, 6 March 2010 (UTC)[reply]

Hi, making the image larger doesn't really address the problem which is that the image does not convey any information. Please see WP:MOS#Avoid entering textual information as images and WP:MOS#Images which suggest no larger than 300px for lead images. Jezhotwells (talk) 14:51, 6 March 2010 (UTC)[reply]
Images portraying multiple sequence alignments seem like a valid exception to the guideline discouraging the use of textual information in images. Presumably that guideline pertains to uses of natural language in images. In contrast, the text used in the two images in this article represents sequences of amino acids. It is the convention used to represent MSAs among reliable sources (i.e., textbooks, journal articles, reliable websites). Also, I think the lead image's detail necessitates its larger-than-usual dimensions. Such exceptions are provided for in the MOS: "Images containing important detail (for example, a map, diagram, or chart), and which may need larger sizes than usual." Emw (talk) 15:23, 6 March 2010 (UTC)[reply]
OK, I'll buy that. Another possibility, which I ask you to consider, is moving the image elsewhere in the article so that it doesn't sandwich the lead. Jezhotwells (talk) 18:49, 7 March 2010 (UTC)[reply]