Talk:Probabilistic latent semantic analysis

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by G716 (talk | contribs) at 22:14, 10 August 2008 (+{{WPStatistics}} using AWB). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
WikiProject Statistics (Unassessed)
This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on Wikipedia's content assessment scale.
This article has not yet received a rating on the importance scale.

Corrected a few inconsistencies/confusions/inaccuracies:

  • AFAIK the acronym PLSA is more common than the lower-case variant pLSA -- we need to be consistent anyway.
  • Fisher kernels allow PLSA to be used in a discriminative setting, not as a generative model.
  • Whoever wrote the part about "severe overfitting problems" should provide a reference for that.
  • In "Evolutions...", _discriminative_ was obviously wrong -- I think what was meant is _generative_ -- that's one way to present LDA.
  • Added a bullet on the extension to higher-order data

Sunny house 20:00, 22 August 2007 (UTC)

Excellent. Rama 08:38, 23 August 2007 (UTC)

Errr -- whoever added the graph: it's nice and everything, but could you try to use the same notation as in the article? Sunny house (talk) 19:44, 11 March 2008 (UTC)

No, in every paper I have read, the latent variable is always denoted as 'z'. So the text should be changed instead.--137.250.39.133 (talk) 09:21, 18 April 2008 (UTC)
Dear 137.250.39.133: first, the goal here is not necessarily to reproduce what you read in other papers, but to provide a self-contained explanation of PLSA. Whether the latent variable is denoted c or z is inconsequential as long as it is clear that it is a latent variable. However, the main issue with the graph is that it is confusing w.r.t. the document variable 'd', which is denoted by theta in the graph. I doubt every paper you read uses this notation -- of the papers cited here, Hofmann, Vinokourov et al. and Gaussier et al. certainly do not. Finally, there is a captioning problem: the words are not the only observables; the document index is observed too (by definition). Sunny house (talk) 13:18, 5 July 2008 (UTC)