Talk:Document-term matrix

From Wikipedia, the free encyclopedia

Comments

I need some help here:

Do you think I focused too much on vectors?

We definitely need more applications. Kh251

I don't agree with the last changes. Performing eigenvalue decomposition reduces the size of the matrix, and thus improves speed but decreases accuracy. I know I might be wrong, but I'd like to understand... KH251 09:32, 21 July 2005 (UTC)

Not necessarily: what you say is one valid interpretation of the reduction, but the reduction can also be interpreted as creating a "better" matrix, since the operation tends to "soften" the representation and reduce possible noise.
Also, it's not always true that this makes things easier on the computational side; for instance, LSA is rather heavier than just leaving the thing alone (I have a reference for that somewhere, I am just rather busy at the moment...). Hope it helps! Cheers! Rama 12:14, 21 July 2005 (UTC)
Yes, but LSA is computed once; the important part is having real-time answers to queries. Once the matrix is smaller, this will be faster, won't it? KH251 12:37, 21 July 2005 (UTC)
LSA places a very serious computational burden on a search engine. Right now, if you type a word at a search engine, it looks the word up in a trie and finds documents that contain that word in O(1) time (independent of the number of documents in the collection). If you had a search engine that looked up documents in the LSA latent space, it would have to perform high-dimensional nearest neighbor search. LSA is typically used with 100+ dimensions, so none of the computational geometry speed-ups for nearest neighbor search apply. Therefore, the search would be O(N), where N is the number of documents in the collection. For Google, that would be 8,000,000,000. As you can see, this is disastrous for searching the web. -- hike395 06:14, July 22, 2005 (UTC)
Oh! That's how! Thank you very much for the explanation. You made my day. KH251 09:02, 22 July 2005 (UTC)
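For anyone reading this thread later, the contrast discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any real search engine works: the three toy documents, the choice of k = 2, and the helper names keyword_lookup and latent_search are all made up for the example, a plain dict stands in for the trie, and NumPy's SVD stands in for the decomposition step.

```python
from collections import defaultdict

import numpy as np

# Toy corpus (hypothetical, for illustration only).
docs = [
    "cat sat on the mat",
    "dog sat on the log",
    "cat chased the dog",
]

# Document-term matrix: one row per document, one column per term,
# each entry counting how often the term occurs in the document.
vocab = sorted({w for d in docs for w in d.split()})
col = {w: j for j, w in enumerate(vocab)}
X = np.zeros((len(docs), len(vocab)))
for i, d in enumerate(docs):
    for w in d.split():
        X[i, col[w]] += 1

# Keyword search: an inverted index maps each term to the documents
# containing it, so finding the posting list does not scan the collection.
index = defaultdict(set)
for i, d in enumerate(docs):
    for w in d.split():
        index[w].add(i)


def keyword_lookup(term):
    """Return the ids of documents containing `term` (empty if unknown)."""
    return sorted(index.get(term, set()))


# LSA-style reduction: keep only the k largest singular values.
k = 2  # assumed for the toy example; the thread mentions 100+ in practice
U, s, Vt = np.linalg.svd(X, full_matrices=False)
doc_vecs = U[:, :k] * s[:k]  # documents in the k-dimensional latent space


def latent_search(query):
    """Brute-force search: compare the query with every document, O(N)."""
    q = np.zeros(len(vocab))
    for w in query.split():
        if w in col:
            q[col[w]] += 1
    q_lat = Vt[:k] @ q  # project the query into the latent space
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_lat)
    sims = (doc_vecs @ q_lat) / np.where(norms == 0, 1.0, norms)
    return int(np.argmax(sims)), sims


print(keyword_lookup("cat"))
print(latent_search("cat dog"))
```

The reduction itself is computed once, as KH251 says, but latent_search still touches every row of doc_vecs for each query, which is the linear cost hike395 describes; keyword_lookup only reads one posting list.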

Since several of us seem to have a taste for this, would anyone fancy creating an "NLP project" on Wikipedia? Rama 12:18, 22 July 2005 (UTC)

Intro Improvement Request

I encountered this term for the first time just a few minutes ago. I read the intro, but I still don't have a clear idea of what a document-term matrix is, other than it is a mathematical matrix and that it is related to a body of text. Danielx (talk) 01:42, 2 November 2009 (UTC)