Explicit semantic analysis
In natural language processing and information retrieval, explicit semantic analysis (ESA) is a vectorial representation of text (individual words or entire documents) that uses Wikipedia as a knowledge base. Specifically, in ESA, a word is represented as a column vector in the tf–idf matrix built over the text of Wikipedia's articles, and a document (string of words) is represented as the centroid of the vectors representing its words.
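The following is a minimal sketch of this representation, assuming a hypothetical three-article toy corpus in place of Wikipedia and using scikit-learn's TfidfVectorizer to build the tf–idf matrix; the helper names `word_vector` and `document_vector` are illustrative, not part of any published ESA implementation.

```python
# Sketch of the ESA representation. Rows of the tf-idf matrix are
# articles ("concepts"), columns are words, so a word's ESA vector
# is a column of this matrix.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-ins for Wikipedia article texts.
articles = [
    "the cat sat on the mat and the cat purred",
    "stock markets rose as investors bought shares",
    "the dog chased the cat across the garden",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(articles).toarray()  # shape: (articles, words)

def word_vector(word):
    """ESA vector of a word: its column in the tf-idf matrix."""
    idx = vectorizer.vocabulary_[word]
    return tfidf[:, idx]

def document_vector(text):
    """ESA vector of a document: the centroid of its words' vectors."""
    words = [w for w in vectorizer.build_analyzer()(text)
             if w in vectorizer.vocabulary_]
    return np.mean([word_vector(w) for w in words], axis=0)

print(word_vector("cat"))             # one weight per article/concept
print(document_vector("a cat and a dog"))
```

Each component of a word's vector is thus the word's tf–idf weight in one particular article, so the vector's dimensions correspond directly to human-authored concepts.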
ESA was designed by Evgeniy Gabrilovich and Shaul Markovitch as a means of improving text categorization.[1] The same researchers have also used it to compute what they refer to as "semantic relatedness": the cosine similarity between the vectors described above, which are collectively interpreted as a space of "concepts explicitly defined and described by humans", with each Wikipedia article equated with a concept. The name "explicit semantic analysis" contrasts with latent semantic analysis (LSA), whose concept space is not directly interpretable by humans.[2]
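Continuing the sketch above (with its hypothetical `word_vector` helper), semantic relatedness then reduces to cosine similarity between concept vectors:

```python
# Semantic relatedness as cosine similarity between ESA vectors,
# reusing word_vector from the sketch above.
def relatedness(u, v):
    """Cosine similarity between two ESA concept vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Words weighted in the same "concepts" (articles) score higher:
print(relatedness(word_vector("cat"), word_vector("dog")))        # nonzero: share a concept
print(relatedness(word_vector("cat"), word_vector("investors")))  # zero: no shared concept
```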
External links
- Explicit semantic analysis on Evgeniy Gabrilovich's homepage, with links to implementations
References
- ^ Evgeniy Gabrilovich and Shaul Markovitch (2006). "Overcoming the brittleness bottleneck using Wikipedia: enhancing text categorization with encyclopedic knowledge". In Proc. 21st National Conference on Artificial Intelligence (AAAI), pp. 1301–1306.
- ^ Evgeniy Gabrilovich and Shaul Markovitch (2007). "Computing semantic relatedness using Wikipedia-based Explicit Semantic Analysis". In Proc. 20th International Joint Conference on Artificial Intelligence (IJCAI).