
Talk:Probabilistic latent semantic analysis

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by 89.155.51.158 (talk) at 16:18, 12 August 2011. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
WikiProject Statistics (Unassessed)
This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on Wikipedia's content assessment scale.
This article has not yet received a rating on the importance scale.

Corrected a few inconsistencies/confusions/inaccuracies:

  • afaik the acronym PLSA is more common than the lower-case variant pLSA -- need to be consistent anyway.
  • Fisher kernels allow PLSA to be used in a discriminative setting, not as a generative model.
  • Whoever wrote the part about "severe overfitting problems" should provide a reference for that.
  • In "Evolutions...", _discriminative_ was obviously wrong -- I think what was meant is _generative_ -- that's one way to present LDA.
  • Added a bullet on the extension to higher-order data

Sunny house 20:00, 22 August 2007 (UTC)[reply]

Excellent. Rama 08:38, 23 August 2007 (UTC)[reply]

Errr -- whoever added the graph: it's nice and everything, but could you try to use the same notation as in the article? Sunny house (talk) 19:44, 11 March 2008 (UTC)[reply]

No, in every paper I have read, the latent variable is always denoted as 'z'. So the text should be changed instead.--137.250.39.133 (talk) 09:21, 18 April 2008 (UTC)[reply]
Dear 137.250.39.133: first, the goal here is not necessarily to reproduce what you read in other papers, but to provide a self-contained explanation of PLSA. Whether the latent variable is denoted c or z is inconsequential as long as it is clear that it is a latent variable. However, the main issue with the graph is that it is confusing w.r.t. the document variable 'd', which is denoted by the theta in the graph. I doubt every paper you read uses this notation -- of the papers cited here, Hofmann, Vinokourov et al. and Gaussier et al. certainly do not. Finally, there is a captioning problem: the words are not the only observables, the document index is observed too (by definition). Sunny house (talk) 13:18, 5 July 2008 (UTC)[reply]
Actually, Hofmann, in his original paper "Probabilistic Latent Semantic Analysis", uses "d" for the document variable, "z" for the topic and "w" for the observed word. However, this is by no means important, and several other works use different letters for the variables in both the plate notation and the formulas. It is in fact more common to see "z" as the topic, but this should not be taken as a rule. For clarity, the text and the image should use the same letters. Paulo Gaspar.
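For reference, a sketch of the PLSA joint model written in the d/z/w notation discussed above (d = document index, z = latent topic, w = observed word). These are the standard asymmetric and symmetric forms of the model; the article itself may use other letters, so the letter choice here is illustrative only:

```latex
P(d, w) = P(d) \sum_{z} P(w \mid z)\, P(z \mid d)
        = \sum_{z} P(z)\, P(d \mid z)\, P(w \mid z)
```

The two forms are equal because P(d) P(z | d) = P(z, d) = P(z) P(d | z). Note that both d and w appear as observed quantities and only z is latent, which matches the captioning point raised above.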