Jump to content

User:Van Parunak/sandbox

From Wikipedia, the free encyclopedia

Grammar-based distances refer to a family of String metrics that take into account the grammars that generate the strings being compared. Most string measures simply count the changes in characters needed to transform one string into another (Edit distance), or even more generally, the number of shared and distinct characters (e.g., Overlap coefficient). The fundamental insight of grammar-based distance is that if two strings differ at a character that is generated by the same grammatical production, the strings should be reckoned closer to one another than if the distinguishing character arises from higher-level productions.

Most work on grammar-based distances is based on inducing the grammar underlying a set of strings by applying Lempel-Ziv compression, which effectively reconstructs a production sequence yielding the strings. This approach is increasingly popular in biological settings [1] [2]. An alternative approach is based on knowing the underlying grammar of the strings in advance.

Distance Measures from Induced Grammars

[edit]

Distance Measures from Explicit Grammars

[edit]

See also

[edit]

References

[edit]
  1. ^ Otu, Hasan; Sayood, Khalid (2003). "A new sequence distance measure for phylogenetic tree construction" (PDF). Bioinformatics. 19 (16): 2122-2130. doi:10.1093/bioinformatics/btg295. Retrieved 19 July 2017.
  2. ^ Russell, David J.; Otu, Hasan H.; Sayood, Khalid (2008). "Grammar-based distance in progressive multiple sequence alignment". BMC Bioinformatics. 9. doi:10.1186/1471-2105-9-306. Retrieved 19 July 2017.{{cite journal}}: CS1 maint: unflagged free DOI (link)
[edit]

Category:Information retrieval evaluation Category:String similarity measures Category:Measure theory Category:Similarity and distance measures