Explicit semantic analysis
In natural language processing and information retrieval, explicit semantic analysis (ESA) is a vectorial representation of text (individual words or entire documents) that uses Wikipedia as a knowledge base. Specifically, in ESA, a word is represented as a column vector in the tf–idf matrix built over the text of Wikipedia's articles, and a document (string of words) is represented as the centroid of the vectors representing its words.
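The following is a minimal sketch of this representation, assuming a hypothetical three-article toy corpus in place of Wikipedia and using scikit-learn's TfidfVectorizer to build the tf–idf matrix; the helper names `word_vector` and `document_vector` are illustrative, not part of any published ESA implementation.

```python
# Sketch of the ESA representation. Rows of the tf-idf matrix are
# articles ("concepts"), columns are words, so a word's ESA vector
# is a column of this matrix.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-ins for Wikipedia article texts.
articles = [
    "the cat sat on the mat and the cat purred",
    "stock markets rose as investors bought shares",
    "the dog chased the cat across the garden",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(articles).toarray()  # shape: (articles, words)

def word_vector(word):
    """ESA vector of a word: its column in the tf-idf matrix."""
    idx = vectorizer.vocabulary_[word]
    return tfidf[:, idx]

def document_vector(text):
    """ESA vector of a document: the centroid of its words' vectors."""
    words = [w for w in vectorizer.build_analyzer()(text)
             if w in vectorizer.vocabulary_]
    return np.mean([word_vector(w) for w in words], axis=0)

print(word_vector("cat"))             # one weight per article/concept
print(document_vector("a cat and a dog"))
```

Each component of a word's vector is thus the word's tf–idf weight in one particular article, so the vector's dimensions correspond directly to human-authored concepts.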
ESA was designed by Evgeniy Gabrilovich and Shaul Markovitch as a means of improving text categorization.[1] The same researchers have also used it to compute what they refer to as "semantic relatedness": the cosine similarity between the vectors described above, which are collectively interpreted as a space of "concepts explicitly defined and described by humans", with each Wikipedia article equated with a concept. The name "explicit semantic analysis" contrasts with latent semantic analysis (LSA), whose concept space is not directly interpretable by humans.[2]
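Continuing the sketch above (with its hypothetical `word_vector` helper), semantic relatedness then reduces to cosine similarity between concept vectors:

```python
# Semantic relatedness as cosine similarity between ESA vectors,
# reusing word_vector from the sketch above.
def relatedness(u, v):
    """Cosine similarity between two ESA concept vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Words weighted in the same "concepts" (articles) score higher:
print(relatedness(word_vector("cat"), word_vector("dog")))        # nonzero: share a concept
print(relatedness(word_vector("cat"), word_vector("investors")))  # zero: no shared concept
```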
External links
- Explicit semantic analysis on Evgeniy Gabrilovich's homepage, with links to implementations
References
- ^ Evgeniy Gabrilovich and Shaul Markovitch (2006). "Overcoming the brittleness bottleneck using Wikipedia: enhancing text categorization with encyclopedic knowledge". In Proc. 21st National Conference on Artificial Intelligence (AAAI), pp. 1301–1306.
- ^ Evgeniy Gabrilovich and Shaul Markovitch (2007). "Computing semantic relatedness using Wikipedia-based Explicit Semantic Analysis". In Proc. 20th International Joint Conference on Artificial Intelligence (IJCAI).