Explicit semantic analysis

In natural language processing and information retrieval, explicit semantic analysis (ESA) is a vectorial representation of text (individual words or entire documents) that uses Wikipedia as a knowledge base. Specifically, in ESA, a word is represented as a column vector in the tf–idf matrix of Wikipedia's article text, and a document (a string of words) is represented as the centroid of the vectors representing its words.
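The construction can be illustrated with a minimal sketch in Python using scikit-learn's TfidfVectorizer. This is not the authors' actual pipeline, which operates over the full Wikipedia corpus with additional preprocessing; the three-article toy corpus and the helper names word_vector and text_vector are hypothetical, chosen only to show the word-as-column and document-as-centroid construction:

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer

    # Toy stand-in for Wikipedia's article text; in real ESA each
    # row would be one full Wikipedia article ("concept").
    articles = [
        "cats are small domesticated felines kept as pets",
        "dogs are domesticated descendants of the wolf",
        "the guitar is a stringed musical instrument",
    ]

    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(articles)  # rows: articles, columns: words
    vocab = vectorizer.vocabulary_          # word -> column index

    def word_vector(word):
        # ESA vector of a word: its tf-idf column, one weight per article.
        return X[:, vocab[word]].toarray().ravel()

    def text_vector(text):
        # ESA vector of a text: centroid of the vectors of its words
        # (words outside the vocabulary are simply skipped).
        words = [w for w in vectorizer.build_analyzer()(text) if w in vocab]
        return np.mean([word_vector(w) for w in words], axis=0)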

ESA was designed by Evgeniy Gabrilovich and Shaul Markovitch as a means of improving text categorization.[1] The same researchers later used it to compute what they refer to as "semantic relatedness": the cosine similarity between the vectors described above, which they interpret collectively as a space of "concepts explicitly defined and described by humans", with Wikipedia articles equated to concepts. The name "explicit semantic analysis" contrasts this human-defined concept space with the automatically derived latent space of latent semantic analysis (LSA).[2]
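Continuing the sketch above, relatedness between two texts then reduces to a cosine between their ESA vectors (again a hypothetical helper for illustration, not the authors' code):

    def relatedness(text_a, text_b):
        # Semantic relatedness as cosine similarity of ESA vectors.
        va, vb = text_vector(text_a), text_vector(text_b)
        return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

    print(relatedness("cats and dogs", "pets"))  # higher for related texts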

References

  1. ^ Evgeniy Gabrilovich and Shaul Markovitch (2006). "Overcoming the brittleness bottleneck using Wikipedia: enhancing text categorization with encyclopedic knowledge". In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI), pp. 1301–1306.
  2. ^ Evgeniy Gabrilovich and Shaul Markovitch (2007). "Computing semantic relatedness using Wikipedia-based Explicit Semantic Analysis". In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI).