Wikipedia talk:WikiProject Probability
/Archive1 02:23, 18 August 2005 (UTC)
Specification
This section hosts the discussion of the specification, which is meant to help authors write the entries on the distributions while maintaining as much harmony as possible between the different distributions. The actual specification will be presented on the main page.
Specifications of Standard Usage
Capitalization
We should decide on whether to capitalize the names of the distributions when we refer to them in passing, e.g. do we talk about the Gamma distribution or the gamma distribution? I have been capitalizing because it seems like a proper name. Acuster 07:54, 19 August 2005 (UTC)
- We generally have to go with what's most common in the relevant literature. If there are several equally common alternatives, I'm generally in favor of preserving the case of any conventional piece of notation that the distribution may be named for. I personally write "Gamma distribution", and "Beta distribution", but "chi-square distribution" and "zeta distribution", because the first two involve upper-case Greek letters and the last two lower-case Greek letters. For other distributions it's pretty clear: "F-distribution", "t-distribution"; "Cauchy distribution", "Wishart distribution" (based on proper names); "binomial distribution", "exponential distribution" (not based on proper names). I tend to hesitate when it comes to "normal distribution" vs. "Normal distribution". --MarkSweep 08:55, 19 August 2005 (UTC)
- Yes - see talk pages on Gamma distribution and chi-square distribution. PAR 09:14, 19 August 2005 (UTC)
Inline math
A question: how do we add inline math so it's elegant? I've tried adding it with 'math' tags but it looks goofy: e.g. . Acuster 08:01, 19 August 2005 (UTC)
- That's a complex issue. I'd say what you did was perfectly fine for inline math. If you prefer TeX's Computer Modern typeface for all math formulas, you can switch on mandatory PNG rendering in your user preferences. --MarkSweep 08:57, 19 August 2005 (UTC)
- We had a discussion on this, but I can't remember the page. (MarkSweep, do you remember the page?) I've tried to summarize the results in the section on inline math. PAR 09:14, 19 August 2005 (UTC)
Specification of a Standard Layout
Prototype Layouts and Contents for Reference
I'd suggest using the Normal distribution as our prototype for the continuous distributions because it's currently the most complete and elegant page and possibly the most important distribution overall; certainly it has the most text, both here and on MathWorld. Acuster 07:57, 19 August 2005 (UTC)
I'd additionally recommend the article on the exponential distribution, since it includes a discussion of Bayesian estimation. A discussion of (semi-)conjugate priors for the normal mean and variance/precision is currently missing from the article on the normal distribution. --MarkSweep 08:45, 19 August 2005 (UTC)
Working Layout
This section should be a working space for hashing out ideas on the layout; the specification on the previous page can be used once we have consensus. Acuster 07:05, 20 August 2005 (UTC)
- I think that the talk page is for carrying on a discussion and the project page is for laying out an editable list of specifications - MarkSweep, is that right? PAR 12:20, 20 August 2005 (UTC)
- It doesn't matter much to me: We can have a discussion about the specification first, and then move it to the project page later. --MarkSweep 21:06, 20 August 2005 (UTC)
== Overview ==
== ?Examples? ==
A section to provide the non-technical public with examples of the distribution and its use. Would such a section be useful and possible to create?
== History ==
== Specification of the normal distribution ==
=== Probability density function ===
=== Cumulative distribution function ===
=== Generating functions ===
==== Moment generating function ====
==== Characteristic function ====
== Properties ==
=== Moments ===
=== Generating normal random variables ===
=== The central limit theorem ===
=== Infinite divisibility ===
=== Standard deviation ===
== Related distributions ==
=== ?Generalizations of the distribution? ===
This section would present and link to distributions which are more general forms of the distribution presented on the page. For the exponential, this would include the Erlang and Gamma. For the Gamma this would include the several "generalized gamma".
== Occurrence ==
=== E.g. 1 ===
=== E.g. 2 ===
== Estimation of parameters ==
=== Maximum likelihood estimation of parameters ===
=== Unbiased estimation of parameters ===
=== Bayesian estimation of parameters ===
== See also ==
== References ==
== External links ==
We need to make sure that we give enough background in the lead paragraphs and in the first couple of sections. I seem to recall that "Overview" sections are deprecated and that a summary and/or high-level overview should go into the lead paragraph (before the first section heading). --MarkSweep 21:06, 20 August 2005 (UTC)
Specification of the distribution: Notation for discrete PMFs
As I see it, there are basically three choices of notation for discrete probability mass functions. Consider a one-parameter family like the Zeta distribution, whose parameter is called s. We try to use k as the main argument:
- Vector notation (advocated by PAR) would write the probability as with I call this "vector notation" because can be thought of as a column vector (or even a stochastic matrix), and k indexes the kth component of that vector.
- Unary function notation (advocated, apparently, by Michael Hardy) would write the probability as with There are plenty of precedents for writing function parameters as subscripts, but the disadvantage is that this may become hard to read with several parameters.
- General function notation (advocated by yours truly) would write the probability as with This is exactly the same as the conventions currently used for continuous distributions, and it has the advantage that several parameters are easily accommodated.
The problem with options 1 and 2 is that they are easy to confuse. Option 3 is unambiguous. --MarkSweep 17:59, 18 August 2005 (UTC)
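For concreteness, the three options might be sketched as follows for the zeta distribution. The symbol f and the semicolon separator in the third line are assumptions chosen for illustration, not notation confirmed by the comments above; the right-hand side is the standard zeta probability mass function for integer k ≥ 1 and s > 1.

```latex
% Illustrative sketch only: the symbol f and the ";" separator are assumed.
% Zeta distribution pmf, valid for k = 1, 2, 3, ... and s > 1.
\begin{align*}
\text{Vector notation:} \quad & f_k(s) = \frac{k^{-s}}{\zeta(s)} \\
\text{Unary function notation:} \quad & f_s(k) = \frac{k^{-s}}{\zeta(s)} \\
\text{General function notation:} \quad & f(k; s) = \frac{k^{-s}}{\zeta(s)}
\end{align*}
```

Written side by side, the first two forms differ only in which symbol sits in the subscript, which is the source of the confusion noted above.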
- Ok, after thinking about it, I like the third choice. Its similarity to the continuous notation is a plus, and rational number subscripts in the first choice worry me even though they are countable. The second is worse because it is hard to specify a particular instance using a real number as a subscript and they are not countable. PAR 18:32, 18 August 2005 (UTC)
- I made these changes, just to keep up to date. PAR 09:15, 19 August 2005 (UTC)
- I also prefer the third option with a caveat. I use pipes to denote conditionals and semi-colons to denote parameters. So instead of because it's entirely possible to have a conditional with parameters (such as or , though I only recall doing such for Bayesian stuff). Cburnett 20:33, August 19, 2005 (UTC)
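A rough sketch of the pipe versus semicolon distinction, using placeholder symbols (f, x, y and θ are hypothetical and not taken from the comment above): a semicolon sets off fixed parameters, a pipe sets off a conditioning variable, and the two can appear together.

```latex
% Illustrative sketch only: all symbols here are placeholders.
\begin{align*}
f(k; s) &\quad \text{pmf of } k \text{ with parameter } s \\
f(x \mid y) &\quad \text{distribution of } x \text{ conditional on } y \\
f(x \mid y; \theta) &\quad \text{conditional on } y \text{, with parameter } \theta
\end{align*}
```

Keeping the two separators distinct means the third form stays readable when a conditional distribution also carries parameters.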