
Document clustering

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Graeme Bartlett (talk | contribs) at 20:31, 11 December 2007 (contributed by 216.183.184.253 15:20, 25 September WP:AFC). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Document clustering is closely related to the concept of data clustering. It is a more specific technique, used for unsupervised document organization, automatic topic extraction, and fast information retrieval or filtering. For example, a web search engine often returns thousands of pages in response to a broad query, making it difficult for users to browse the results or to identify relevant information. Clustering methods can be used to automatically group the retrieved documents into a list of meaningful categories, as is done by enterprise search engines such as Northern Light and Vivisimo.
Example:
FirstGov.gov, the official Web portal for the U.S. government, uses document clustering to automatically organize its search results into categories. For example, if a user submits “immigration”, then next to the list of results they will see categories such as “Immigration Reform”, “Citizenship and Immigration Services”, “Employment”, and “Department of Homeland Security”.
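The grouping of search results described above can be illustrated with a minimal sketch. It assumes TF-IDF term weighting, cosine similarity, and average-link agglomerative merging, which are common choices but are not stated to be the method used by any of the search engines named here. The function names and the sample result texts are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF, so that terms found in every document keep a small weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_documents(docs, k):
    """Merge the most similar pair of clusters until only k clusters remain.

    Returns a list of clusters, each a list of document indices.
    """
    vecs = tfidf_vectors(docs)
    clusters = [[i] for i in range(len(docs))]

    def avg_sim(c1, c2):
        # Average pairwise similarity between the members of two clusters.
        total = sum(cosine(vecs[i], vecs[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = (
            (x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        a, b = max(pairs, key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical result snippets for the query "immigration" (not real data).
results = [
    "immigration reform bill senate debate",
    "senate vote immigration reform bill",
    "citizenship application form services",
    "citizenship services office application",
]
groups = cluster_documents(results, 2)
print(groups)
```

With these four snippets, the two texts about the reform bill share no terms with the two about citizenship services, so they end up in separate groups. A production system would also need to label each group, for instance with its highest-weighted terms, to produce category names like those shown to users.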


Sources

http://www.lans.ece.utexas.edu/upload/comptext.pdf