Talk:Probabilistic latent semantic analysis

This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on Wikipedia's content assessment scale.
This article has not yet received a rating on the importance scale.

Corrected a few inconsistencies/confusions/inaccuracies:

afaik the acronym PLSA is more common than the lower-case variant pLSA -- either way, the article needs to be consistent.
Fisher kernels allow PLSA to be used in a discriminative setting, not as a generative model.
Whoever wrote the part about "severe overfitting problems" should provide a reference for that.

I stumbled upon a paper reporting these overfitting problems and added the reference. -- Preceding unsigned comment added by Keretapi (talk • contribs) 14:37, 17 September 2007 (UTC)

In "Evolutions...", _discriminative_ was obviously wrong -- I think what was meant is _generative_ -- that's one way to present LDA.
Added a bullet on the extension to higher-order data

Sunny house 20:00, 22 August 2007 (UTC)

Excellent. Rama 08:38, 23 August 2007 (UTC)

Errr -- whoever added the graph: it's nice and everything, but could you try to use the same notation as in the article? Sunny house (talk) 19:44, 11 March 2008 (UTC)

No, in every paper I have read, the latent variable is always denoted 'z'. So the text should be changed instead. --137.250.39.133 (talk) 09:21, 18 April 2008 (UTC)

Dear 137.250.39.133: first, the goal here is not necessarily to reproduce what you read in other papers, but to provide a self-contained explanation of PLSA. Whether the latent variable is denoted c or z is inconsequential as long as it is clear that it is a latent variable. However, the main issue with the graph is that it is confusing w.r.t. the document variable 'd', which is denoted by theta in the graph. I doubt every paper you read uses this notation -- of the papers cited here, Hofmann, Vinokourov et al. and Gaussier et al. certainly do not. Finally, there is a captioning problem: the words are not the only observables; the document index is observed too (by definition). Sunny house (talk) 13:18, 5 July 2008 (UTC)
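
For reference, a sketch of the model under discussion, written in the c notation used in this thread. The symbols are assumptions carried over from the comments above (d for the document index, w for the word, c for the latent class); this is the usual PLSA mixture decomposition in its two equivalent forms, not a quotation from the article or the graph:

```latex
P(w, d) \;=\; \sum_{c} P(c)\, P(d \mid c)\, P(w \mid c)
        \;=\; P(d) \sum_{c} P(c \mid d)\, P(w \mid c)
```

Both d and w appear on the left-hand side as an observed pair, and only c is summed out, which is the point about the caption: the latent class is the sole unobserved variable, whatever letter is used for it.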