
This article is rated Stub-class on Wikipedia's content assessment scale.
It is of interest to the following WikiProjects:

Linguistics: Applied Linguistics Low‑importance

	This article is within the scope of WikiProject Linguistics, a collaborative effort to improve the coverage of linguistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
	This article has been rated as Low-importance on the project's importance scale.
	This article is supported by Applied Linguistics Task Force.
	This article has been automatically rated by a bot or other tool as Stub-class because it uses a stub template. Please ensure the assessment is correct before removing the `|auto=` parameter.

Comments

I need some help here:

Do you think I focused too much on vectors?

We definitely need more applications. Kh251

I don't agree with the last changes. Performing eigenvalue decomposition reduces the size of the matrix and thus improves speed, but it decreases accuracy. I know I might be wrong, but I'd like to understand... KH251 09:32, 21 July 2005 (UTC)

Not necessarily: what you say is one valid interpretation of the reduction, but the reduction can also be interpreted as creating a "better" matrix, since the operation tends to "soften" the representation and reduce possible noise.

Also, it's not always true that this makes things easier on the computational side; for instance, LSA is rather heavier than just leaving the thing alone (I have a reference for that somewhere, I am just rather busy at the moment...). Hope it helps! Cheers! Rama 12:14, 21 July 2005 (UTC)
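To make the "softening" point concrete: LSA's reduction is usually described as a truncated singular value decomposition (SVD), which replaces the matrix by its best rank-k approximation. Below is a minimal pure-Python sketch, assuming documents as rows and terms as columns. The function names, the toy matrix and the power-iteration-with-deflation method are illustrative choices only; a real system would call a numerical library's SVD routine. In the rank-2 approximation, document 0, which never contains "auto", receives a positive weight for it because "auto" co-occurs with "car" and "engine" in other documents, while the unrelated "flower" document is kept intact.

```python
import math
import random


def matvec(A, x):
    """Return A @ x for a list-of-rows matrix A."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Return A^T @ y without forming the transpose."""
    out = [0.0] * len(A[0])
    for row, yi in zip(A, y):
        for j, a in enumerate(row):
            out[j] += a * yi
    return out


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def truncated_svd(A, k, iters=500, seed=0):
    """Top-k singular triplets (s, u, v) of A via power iteration with deflation."""
    rng = random.Random(seed)
    R = [list(map(float, row)) for row in A]  # residual, deflated after each triplet
    triplets = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(len(A[0]))]
        for _ in range(iters):
            v = rmatvec(R, matvec(R, v))  # one power-iteration step on R^T R
            nv = norm(v)
            if nv == 0.0:
                break
            v = [x / nv for x in v]
        u = matvec(R, v)
        s = norm(u)
        if s == 0.0:
            break  # residual is zero: A has rank < k
        u = [x / s for x in u]
        triplets.append((s, u, v))
        # Deflate: subtract s * u v^T so the next pass finds the next triplet.
        R = [[r - s * ui * vj for r, vj in zip(row, v)] for row, ui in zip(R, u)]
    return triplets


def reconstruct(triplets, m, n):
    """Rank-k approximation: the sum of s * u v^T over the kept triplets."""
    A_k = [[0.0] * n for _ in range(m)]
    for s, u, v in triplets:
        for i in range(m):
            for j in range(n):
                A_k[i][j] += s * u[i] * v[j]
    return A_k


# Hypothetical toy document-term matrix; columns: car, auto, engine, flower.
A = [
    [1, 0, 1, 0],  # doc 0: "car engine"
    [0, 1, 1, 0],  # doc 1: "auto engine"
    [1, 1, 1, 0],  # doc 2: "car auto engine"
    [0, 0, 0, 2],  # doc 3: "flower flower"
]
A_2 = reconstruct(truncated_svd(A, k=2), m=4, n=4)
print(round(A_2[0][1], 4))  # 0.6036: doc 0 now has weight on "auto"
```

Here the two singular values are 1 + √2 and 2, so the discarded directions carry comparatively little of the matrix; whether dropping them counts as removing noise or losing accuracy depends on the task.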

Yes, but LSA is computed once; the important part is having real-time answers to queries. Once the matrix is smaller, this will be faster, won't it? KH251 12:37, 21 July 2005 (UTC)
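A worked sketch of the query-side cost, assuming the common textbook formulation of LSA: a term-document matrix $A$ with $m$ terms and $N$ documents (the transpose of a document-term matrix) and a rank-$k$ truncated SVD. A query's term vector $q$ is "folded in" to the latent space and then compared with each document's latent vector:

```latex
\begin{align*}
A &\approx A_k = U_k \Sigma_k V_k^{\top}, \quad
  U_k \in \mathbb{R}^{m \times k},\ \Sigma_k \in \mathbb{R}^{k \times k},\ V_k \in \mathbb{R}^{N \times k} \\
\hat{q} &= \Sigma_k^{-1} U_k^{\top} q \in \mathbb{R}^{k} \\
\text{cost per query} &= \underbrace{O(mk)}_{\text{fold in } q}
  + \underbrace{O(Nk)}_{\text{score all } N \text{ rows of } V_k}
\end{align*}
```

For a sparse query the fold-in step is cheaper still, roughly proportional to the number of query terms times $k$. A smaller $k$ makes each comparison cheaper, but without an index over the latent space the $O(Nk)$ term still grows linearly with the number of documents.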

LSA imposes a very serious computational burden on a search engine. Right now, if you type a word into a search engine, it looks the word up in a trie and finds the documents that contain that word in O(1) time (independent of the number of documents in the collection). If you had a search engine that looked up documents in the LSA latent space, it would have to perform high-dimensional nearest-neighbor search. LSA is typically used with 100+ dimensions, so none of the computational-geometry speed-ups for nearest-neighbor search apply. Therefore, the search would be O(N), where N is the number of documents in the collection. For Google, that would be 8,000,000,000. As you can see, this is disastrous for searching the web. -- hike395 06:14, July 22, 2005 (UTC)
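The contrast between the two kinds of lookup can be sketched as follows. This is an illustration under stated assumptions, not how any particular engine works: a Python dict (a hash table) stands in for the trie, the tokenizer is a plain whitespace split, and the 2-dimensional latent vectors are made up rather than computed by LSA. Keyword lookup touches one index entry, whereas the brute-force latent search has to score every document.

```python
import math
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():  # naive whitespace tokenizer
            index[term].add(doc_id)
    return index


def keyword_lookup(index, term):
    """A single hash lookup: finding the posting set does not depend on the
    collection size; returning (and here sorting) it depends only on the
    number of matching documents."""
    return sorted(index.get(term.lower(), ()))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def latent_nearest(doc_vectors, query_vector):
    """Brute-force nearest neighbour in the latent space: every one of the N
    document vectors is scored, so a query costs O(N * k) for k dimensions."""
    best_id, best_score = None, -math.inf
    for doc_id, vec in enumerate(doc_vectors):
        score = cosine(vec, query_vector)
        if score > best_score:
            best_id, best_score = doc_id, score
    return best_id, best_score


docs = ["the car engine", "an auto engine", "a red flower"]
index = build_inverted_index(docs)
print(keyword_lookup(index, "engine"))  # [0, 1]

# Hypothetical 2-dimensional latent vectors for the same three documents.
doc_vectors = [[0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
print(latent_nearest(doc_vectors, [1.0, 0.0])[0])  # 0
```

Approximate nearest-neighbour indexes can reduce the linear scan in practice, but they trade exactness for speed, which is a separate design decision from LSA itself.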

Oh! That's how! Thank you very much for the explanation. You made my day. KH251 09:02, 22 July 2005 (UTC)

Since several of us seem to have a taste for this, would anyone fancy creating an "NLP project" on Wikipedia? Rama 12:18, 22 July 2005 (UTC)

Intro Improvement Request

I encountered this term for the first time just a few minutes ago. I read the intro, but I still don't have a clear idea of what a document-term matrix is, other than that it is a mathematical matrix and that it is related to a body of text. Danielx (talk) 01:42, 2 November 2009 (UTC)
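For anyone else landing here with the same question: a document-term matrix has one row per document and one column per term in the collection's vocabulary, and each cell records how often that term occurs in that document. Raw counts are the simplest choice; weightings such as tf-idf are also common, and some authors use the transpose (a term-document matrix). A minimal sketch, assuming lowercase whitespace tokenization and raw counts; the function name is hypothetical:

```python
def document_term_matrix(docs):
    """Build a document-term matrix: one row per document, one column per
    vocabulary term, each cell the number of times that term occurs."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    column = {term: j for j, term in enumerate(vocabulary)}
    matrix = [[0] * len(vocabulary) for _ in docs]
    for i, tokens in enumerate(tokenized):
        for term in tokens:
            matrix[i][column[term]] += 1
    return vocabulary, matrix


vocabulary, matrix = document_term_matrix(["I like databases", "I dislike databases"])
print(vocabulary)  # ['databases', 'dislike', 'i', 'like']
for row in matrix:
    print(row)     # [1, 0, 1, 1] then [1, 1, 1, 0]
```

Each row is then a vector describing one document, which is what techniques such as LSA operate on.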