Strong and weak sampling

Strong and weak sampling are two sampling approaches^[1] in statistics, and are popular in computational cognitive science and language learning^[2]. Strong sampling assumes that the data are intentionally generated as positive examples of a concept^[3], while weak sampling assumes that the data are generated without any restrictions.^[4]

Formal Definition

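In strong sampling, we assume each observation $x$ is sampled uniformly at random from the true hypothesis $h$, so every member of $h$ is equally likely and $|h|$ denotes the number of elements in $h$: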

$P(x|h)={\begin{cases}{\frac {1}{|h|}}&{\text{, if }}x\in h\\0&{\text{, otherwise}}\end{cases}}$

In weak sampling, we assume observations are randomly sampled and then classified:

$P(x|h)={\begin{cases}1&{\text{, if }}x\in h\\0&{\text{, otherwise}}\end{cases}}$
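The two likelihoods can be compared side by side in a minimal Python sketch. It assumes, purely for illustration, that a hypothesis is represented as a finite set of items; the function names `strong_likelihood` and `weak_likelihood` and the example hypotheses `evens` and `powers_of_two` are hypothetical and do not come from the cited sources.

```python
from fractions import Fraction


def strong_likelihood(x, h):
    """P(x|h) under strong sampling: 1/|h| if x is in h, else 0."""
    return Fraction(1, len(h)) if x in h else Fraction(0)


def weak_likelihood(x, h):
    """P(x|h) under weak sampling: 1 if x is in h, else 0."""
    return Fraction(1) if x in h else Fraction(0)


# Illustrative hypotheses (assumed for this example only).
evens = frozenset({2, 4, 6, 8, 10})
powers_of_two = frozenset({2, 4, 8})

print(strong_likelihood(4, evens))          # 1/5
print(strong_likelihood(4, powers_of_two))  # 1/3
print(weak_likelihood(4, evens))            # 1
print(weak_likelihood(4, powers_of_two))    # 1
```

In this example, strong sampling assigns a higher likelihood to the smaller hypothesis that contains the observation, whereas weak sampling assigns the same likelihood to every hypothesis that contains it.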

Consequence: Posterior computation under Weak Sampling

$P(h|x)={\frac {P(x|h)P(h)}{\sum \limits _{h'}P(x|h')P(h')}}={\begin{cases}{\frac {P(h)}{\sum \limits _{h':x\in h'}P(h')}}&{\text{, if }}x\in h\\0&{\text{, otherwise}}\end{cases}}$
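The posterior under weak sampling amounts to renormalizing the prior over the hypotheses that contain the observation. The following is a minimal sketch under the assumption of a finite hypothesis space with each hypothesis stored as a set; the function name `weak_posterior`, the hypothesis names, and the prior values are all made up for illustration.

```python
from fractions import Fraction


def weak_posterior(x, hypotheses, prior):
    """Posterior P(h|x) under weak sampling.

    hypotheses: dict mapping a hypothesis name to the set of items it contains.
    prior: dict mapping the same names to prior probabilities P(h).
    """
    # Hypotheses consistent with the observation, i.e. those with x in h.
    consistent = [name for name, h in hypotheses.items() if x in h]
    # Normalizing constant: total prior mass of the consistent hypotheses.
    total = sum(prior[name] for name in consistent)
    if total == 0:
        raise ValueError("no hypothesis with positive prior contains x")
    return {
        name: (prior[name] / total if name in consistent else Fraction(0))
        for name in hypotheses
    }


# Illustrative hypothesis space and prior (assumed values).
hypotheses = {
    "evens": {2, 4, 6, 8, 10},
    "powers_of_two": {2, 4, 8},
    "odds": {1, 3, 5, 7, 9},
}
prior = {
    "evens": Fraction(1, 2),
    "powers_of_two": Fraction(1, 4),
    "odds": Fraction(1, 4),
}

posterior = weak_posterior(4, hypotheses, prior)
print(posterior["evens"])          # 2/3
print(posterior["powers_of_two"])  # 1/3
print(posterior["odds"])           # 0
```

In this example the ratio between the two consistent hypotheses is the same in the posterior as in the prior (2 to 1), which reflects that under weak sampling the observation only rules hypotheses out and does not otherwise favor one consistent hypothesis over another.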