Hierarchical Dirichlet process

In statistics and machine learning, the hierarchical Dirichlet process (HDP) is a nonparametric Bayesian approach to clustering grouped data. It uses a Dirichlet process for each group of data, with the Dirichlet processes for all groups sharing a base distribution which is itself drawn from a Dirichlet process. This allows groups to share statistical strength by sharing clusters across groups. Drawing the base distribution from a Dirichlet process is essential: draws from a Dirichlet process are atomic probability measures, so the atoms of the base distribution appear in all group-level Dirichlet processes. Since each atom corresponds to a cluster, clusters are shared across all groups. The HDP was developed by Yee Whye Teh, Michael I. Jordan, Matthew Beal and David Blei and published in the Journal of the American Statistical Association in 2006.[1]

Model

The HDP is a model for grouped data, that is, data items that come in multiple distinct groups. For example, in a topic model words are organized into documents, with each document formed by a bag (group) of words (data items). Indexing groups by $j=1,\dots,J$, suppose each group consists of data items $x_{j1},\dots,x_{jn}$.

The HDP is parameterized by a base distribution $H$ which governs the a priori distribution over data items, and a number of concentration parameters which govern the a priori number of clusters and the amount of sharing across groups. The $j$th group is associated with a random probability measure $G_{j}$ whose distribution is given by a Dirichlet process:

${\begin{array}{lcl}G_{j}|G_{0}&\sim &\operatorname {DP} (\alpha _{j},G_{0})\end{array}}$

where $\alpha _{j}$ is the concentration parameter associated with the group, and $G_{0}$ is the base distribution shared across all groups. In turn, the common base distribution is Dirichlet process distributed:

${\begin{array}{lcl}G_{0}&\sim &\operatorname {DP} (\alpha _{0},H)\end{array}}$

with concentration parameter $\alpha _{0}$ and base distribution $H$. Finally, to relate the Dirichlet processes back to the observed data, each data item $x_{ji}$ is associated with a latent parameter $\theta _{ji}$:

${\begin{array}{lcl}\theta _{ji}|G_{j}&\sim &G_{j}\\x_{ji}|\theta _{ji}&\sim &F(\theta _{ji})\end{array}}$

The first line states that each parameter has a prior distribution given by $G_{j}$, while the second line states that each data item has a distribution $F(\theta _{ji})$ parameterized by its associated parameter. The resulting model is called an HDP mixture model, with the HDP referring to the hierarchically linked set of Dirichlet processes, and the mixture model referring to the way the Dirichlet processes are related to the data items.
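The generative process above can be illustrated with a short simulation. The following is a minimal sketch, not a reference implementation, and it rests on several assumptions that are not part of the model definition: $G_{0}$ is approximated by a stick-breaking construction truncated at `K` atoms, the base distribution $H$ is taken to be a normal distribution with mean 0 and standard deviation 5, the likelihood $F(\theta)$ is taken to be a normal distribution with mean $\theta$ and unit variance, and a single concentration parameter `alpha_j` is used for every group. The function name `sample_hdp_mixture` and all parameter names are hypothetical. Because the truncated $G_{0}$ has finitely many atoms, a draw from $\operatorname{DP}(\alpha_{j},G_{0})$ places Dirichlet-distributed weights with parameters $\alpha_{j}\beta_{1},\dots,\alpha_{j}\beta_{K}$ on those same atoms, which is what the code uses for each group.

```python
import numpy as np


def sample_hdp_mixture(group_sizes, alpha0=5.0, alpha_j=3.0, K=30, seed=0):
    """Draw grouped data from a truncated HDP mixture (illustrative sketch).

    Assumptions (not part of the model definition): H = Normal(0, 5^2),
    F(theta) = Normal(theta, 1), truncation at K atoms, and the same
    concentration alpha_j for every group.
    """
    rng = np.random.default_rng(seed)

    # G_0 ~ DP(alpha0, H), approximated by truncated stick-breaking.
    v = rng.beta(1.0, alpha0, size=K)
    v[-1] = 1.0  # close the last stick so the weights sum to one
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    beta = v * remaining                 # global atom weights
    phi = rng.normal(0.0, 5.0, size=K)   # atoms phi_k ~ H, shared by all groups

    data, labels = [], []
    for n_j in group_sizes:
        # G_j | G_0 ~ DP(alpha_j, G_0): with a finite G_0, the group-level
        # weights on the shared atoms are Dirichlet(alpha_j * beta).
        pi_j = rng.dirichlet(alpha_j * beta)
        # theta_ji | G_j ~ G_j: pick one of the shared atoms for each item.
        z = rng.choice(K, size=n_j, p=pi_j)
        theta = phi[z]
        # x_ji | theta_ji ~ F(theta_ji), here Normal(theta_ji, 1).
        x = rng.normal(theta, 1.0)
        data.append(x)
        labels.append(z)
    return phi, beta, data, labels


if __name__ == "__main__":
    phi, beta, data, labels = sample_hdp_mixture([50, 80, 30])
    for j, z in enumerate(labels):
        print(f"group {j}: clusters used = {sorted(set(z.tolist()))}")
```

Every group draws its parameters from the same array `phi`, so any cluster index that appears in more than one group refers to the same atom. This is the sharing of clusters across groups that motivates drawing $G_{0}$ from a Dirichlet process; only the mixing weights `pi_j` differ between groups.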

References

[1] Teh, Y. W.; Jordan, M. I.; Beal, M. J.; Blei, D. M. (2006). "Hierarchical Dirichlet Processes". Journal of the American Statistical Association. 101: 1566–1581.
