Talk:Probabilistic latent semantic analysis
Corrected a few inconsistencies/confusions/inaccuracies:
- afaik the acronym PLSA is more common than the lower-case variant pLSA -- need to be consistent anyway.
- Fisher kernels allow PLSA to be used in a discriminative setting, not as a generative model.
- Whoever wrote the part about "severe overfitting problems" should provide a reference for that.
- I stumbled upon a paper stating these overfitting problems and added the reference. —Preceding unsigned comment added by Keretapi (talk • contribs) 14:37, 17 September 2007 (UTC)
- In "Evolutions...", _discriminative_ was obviously wrong -- I think what was meant is _generative_ -- that's one way to present LDA.
- Added a bullet on the extension to higher-order data
Sunny house 20:00, 22 August 2007 (UTC)
- Excellent. Rama 08:38, 23 August 2007 (UTC)
Errr -- whoever added the graph: it's nice and everything but could you try to use the same notation as in the article? Sunny house (talk) 19:44, 11 March 2008 (UTC)
- No, in every paper I have read, the latent variable is always denoted as 'z'. So the text should be changed instead.--137.250.39.133 (talk) 09:21, 18 April 2008 (UTC)
- Dear 137.250.39.133: first, the goal here is not necessarily to reproduce what you read in other papers, but to provide a self-contained explanation of PLSA. Whether the latent variable is denoted c or z is inconsequential as long as it is clear that it is a latent variable. However, the main issue with the graph is that it is confusing w.r.t. the document variable 'd', which is denoted by theta in the graph. I doubt every paper you read uses this notation -- of the papers cited here, Hofmann, Vinokourov et al. and Gaussier et al. certainly do not. Finally, there is a captioning problem: the words are not the only observables; the document index is observed too (by definition). Sunny house (talk) 13:18, 5 July 2008 (UTC)