
Talk:HITS algorithm

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Macrakis (talk | contribs) at 23:58, 6 February 2010. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

"It is executed at query time, and not at indexing time, with the associated hit on performance that accompanies query-time processing"

The algorithm can also be carried out in a transient manner like Google. Is this a difference at all? (unsigned comment by User:59.95.4.160 2007-05-03T08:24:09)

Perhaps the article isn't clear. PageRank is a query-independent calculation over the entire crawl, which can be performed in batch mode. The ranking of results for a particular query is a function of the page's PageRank (which is independent of the query) and various query-dependent measures such as TF-IDF. HITS is performed after a set of pages has been selected using TF-IDF or some other measure, and works on the link structure within that set, calculating "authority" and "hub" scores relative to the query; a page that is an authority for baseball is unlikely to be an authority for fettuccine. You could of course run HITS on the whole crawl, or PageRank on a subset, but that is not how they are designed to be used. --Macrakis 13:14, 3 May 2007 (UTC)
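To make the distinction above concrete, here is a minimal sketch of the hub/authority iteration restricted to a query-selected set of pages. The function names (`hits`, `_normalize`), the fixed iteration count, and the L2 normalization are illustrative assumptions, not taken from the article; how the page set is selected is deliberately left outside the function.

```python
from math import sqrt


def _normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all are zero)."""
    norm = sqrt(sum(v * v for v in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hits(pages, links, iterations=50):
    """Sketch of HITS on a selected set of pages.

    pages: iterable of page identifiers (e.g. the results for one query).
    links: iterable of (source, target) pairs from the crawl.
    Returns (hub, authority) dictionaries keyed by page.
    """
    pages = set(pages)
    # Only the link structure inside the selected set is used.
    edges = [(s, t) for s, t in links if s in pages and t in pages]

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for s, t in edges:
            new_auth[t] += hub[s]
        # Hub update: sum the (new) authority scores of pages linked to.
        new_hub = {p: 0.0 for p in pages}
        for s, t in edges:
            new_hub[s] += new_auth[t]
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)

    return hub, auth


# Hypothetical example: "x" is in the crawl but not in the selected set,
# so the link c -> x is ignored.
selected = ["a", "b", "c", "d"]
crawl_links = [("a", "c"), ("b", "c"), ("a", "d"), ("c", "x")]
hub_scores, authority_scores = hits(selected, crawl_links)
```

Because the scores depend on which pages are passed in, running the same function on the result set of a different query would generally give a page different hub and authority values, whereas a PageRank computed in batch over the whole crawl would not change.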

The article begins with "In the HITS algorithm, the first step is to retrieve the set of results to the search query. The computation is performed only on this result set, not across all Web pages." I think it should clarify how the "set of results" is obtained, e.g. via TF-IDF or another metric. Beamishboy (talk) 22:26, 6 February 2010 (UTC)

That is not specified by the HITS algorithm. You can apply HITS to any set of pages. But that passage should be reworded. --macrakis (talk) 23:58, 6 February 2010 (UTC)