Bayesian Averages Under a Dirichlet Distribution

Below we derive the modified Bayesian average of a five-star rating system, under the assumption that the probabilities of the ratings are drawn from a Dirichlet distribution.

For companion guides, and Python code which implements some of these ideas, please see:

Let $X$ be a categorical random variable taking values in $\{ 1, 2, 3, 4, 5 \}$, where the value $i$ corresponds to a rating of $i$ stars. Further, let $O = (o_{1}, ..., o_{N})$ represent a sequence of $N$ independent observations of $X$.

Let $K_{i}$ represent the number of observations in category $i$ and let $p_{i}$ represent the probability of observing category $i$, so that $\sum_{i=1}^{5} p_{i} = 1$. Then our Bayesian relationship is given by:

$P(p_{1}, p_{2}, p_{3}, p_{4}, p_{5}|O) \propto P(O|p_{1}, p_{2}, p_{3}, p_{4}, p_{5})P(p_{1}, p_{2}, p_{3}, p_{4}, p_{5})$

The likelihood of the observed sequence, given the category probabilities, is:

$P(O|p_{1}, p_{2}, p_{3}, p_{4}, p_{5}) = p_{1}^{K_{1}}p_{2}^{K_{2}}p_{3}^{K_{3}}p_{4}^{K_{4}}p_{5}^{K_{5}}$
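As a rough illustration of the expression above, the sketch below tallies the counts $K_{i}$ from a sequence of ratings and evaluates $\prod_{i} p_{i}^{K_{i}}$. The function names, the sample ratings, and the probability vector are all hypothetical and chosen only for demonstration.

```python
from collections import Counter
from math import prod

CATEGORIES = (1, 2, 3, 4, 5)  # the five possible star values


def count_ratings(observations):
    """Return [K_1, ..., K_5], the number of observations in each category."""
    tally = Counter(observations)
    return [tally[c] for c in CATEGORIES]


def sequence_likelihood(observations, p):
    """P(O | p) = prod_i p_i ** K_i for one ordered sequence of observations."""
    counts = count_ratings(observations)
    return prod(p_i ** k_i for p_i, k_i in zip(p, counts))


# Hypothetical data, made up for illustration only.
example_observations = [5, 4, 5, 3, 5]
example_p = [0.1, 0.1, 0.2, 0.2, 0.4]

example_counts = count_ratings(example_observations)
example_likelihood = sequence_likelihood(example_observations, example_p)
```

Only the counts matter here: any reordering of the same ratings gives the same value.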

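Assume that the prior follows a Dirichlet distribution with concentration parameters $\vec{\alpha} = [\alpha_{1}^{0} \:\:\: \alpha_{2}^{0} \:\:\: \alpha_{3}^{0} \:\:\: \alpha_{4}^{0} \:\:\: \alpha_{5}^{0}]'$ (i.e. the probabilities of each categorical value are drawn from a Dirichlet distribution). Because the Dirichlet prior is conjugate to this likelihood, the posterior is also a Dirichlet distribution: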

$P(p_{1}, p_{2}, p_{3}, p_{4}, p_{5}|O) \propto P(O|p_{1}, p_{2}, p_{3}, p_{4}, p_{5})P(p_{1}, p_{2}, p_{3}, p_{4}, p_{5}| \vec{\alpha})$

$\propto p_{1}^{K_{1}+\alpha_{1}^{0} - 1}p_{2}^{K_{2} +\alpha_{2}^{0} - 1}p_{3}^{K_{3} +\alpha_{3}^{0} - 1}p_{4}^{K_{4} +\alpha_{4}^{0} - 1}p_{5}^{K_{5} +\alpha_{5}^{0} - 1}$

This distribution is parametrized by the following vector:

$\vec{\gamma} = [K_{1}+\alpha_{1}^{0} \:\:\: K_{2} +\alpha_{2}^{0} \:\:\: K_{3} +\alpha_{3}^{0} \:\:\: K_{4} +\alpha_{4}^{0} \:\:\: K_{5} +\alpha_{5}^{0}]'$
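In code, this update is just an element-wise sum of the observed counts and the prior concentration parameters. The following is a minimal sketch; the function name and the example inputs are assumptions made for illustration.

```python
def posterior_parameters(counts, prior_alpha):
    """Return gamma_i = K_i + alpha_i^0 for each category."""
    if len(counts) != len(prior_alpha):
        raise ValueError("counts and prior_alpha must have the same length")
    return [k + a for k, a in zip(counts, prior_alpha)]


# Hypothetical counts for ratings 1..5 and a symmetric prior, for illustration.
example_gamma = posterior_parameters([0, 0, 1, 1, 3], [2, 2, 2, 2, 2])
```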

Recall that the pdf and first moment for the Dirichlet distribution are given by:

$\frac{1}{B(\vec{\alpha})} \prod_{i=1}^{5} p_{i}^{\alpha_{i} - 1}$

$E[p_{i}] = \frac{\alpha_{i}}{\sum_{j=1}^{5} \alpha_{j}}$
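The first moment can be sanity-checked numerically. The sketch below compares the closed-form mean against a Monte Carlo estimate, drawing Dirichlet samples by normalizing independent Gamma draws (a standard construction). It uses only the Python standard library; the function names, sample size, seed, and parameter vector are arbitrary illustrative choices.

```python
import random


def dirichlet_mean(alpha):
    """Closed-form first moment: E[p_i] = alpha_i / sum_j alpha_j."""
    total = sum(alpha)
    return [a / total for a in alpha]


def sample_dirichlet(alpha, rng):
    """Draw one sample by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def empirical_mean(alpha, n_samples=20000, seed=0):
    """Average many Dirichlet samples to approximate E[p_i]."""
    rng = random.Random(seed)
    sums = [0.0] * len(alpha)
    for _ in range(n_samples):
        for i, p_i in enumerate(sample_dirichlet(alpha, rng)):
            sums[i] += p_i
    return [s / n_samples for s in sums]


example_alpha = [2, 2, 3, 3, 5]  # hypothetical parameters for illustration
exact = dirichlet_mean(example_alpha)
approx = empirical_mean(example_alpha)
```

With enough samples, each entry of `approx` should land close to the corresponding entry of `exact`.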

Under the Dirichlet distribution, the expected probability of each categorical value is given by the first moment above. Therefore, we can write the expected rating, conditioned on the total set of observations, as a weighted sum of the posterior expected probabilities:

(1) $E[p_{1} + 2*p_{2} + 3*p_{3} + 4 * p_{4} + 5 * p_{5}|O] = \sum_{i=1}^{5} i * E[p_{i} | O]$

(2) $E[p_{i}|O] = \frac{\gamma_{i}}{\sum_{j=1}^{5}\gamma_{j}}$

Plugging (2) into (1) we have:

(3) $E[p_{1} + 2*p_{2} + 3*p_{3} + 4 * p_{4} + 5 * p_{5}|O]= \frac{\psi + \sum_{i=1}^{5} iK_{i}}{N + \sum_{j=1}^{5}\alpha_{j}^{0}}$

where $\psi = \sum_{i=1}^{5}i\alpha_{i}^{0}$ and $N = \sum_{j=1}^{5}K_{j}$
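
For readers who want the intermediate step, the substitution can be spelled out using $\gamma_{i} = K_{i} + \alpha_{i}^{0}$:

$\sum_{i=1}^{5} i * E[p_{i} | O] = \sum_{i=1}^{5} i * \frac{K_{i} + \alpha_{i}^{0}}{\sum_{j=1}^{5}(K_{j} + \alpha_{j}^{0})} = \frac{\sum_{i=1}^{5} i\alpha_{i}^{0} + \sum_{i=1}^{5} iK_{i}}{\sum_{j=1}^{5}K_{j} + \sum_{j=1}^{5}\alpha_{j}^{0}}$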

More intuitively, we can express (3) as:

$\frac{\psi + \text{Sum of Observed Ratings}}{\text{Sum of Prior Dirichlet Concentration Parameters} + \text{Total Number of Observations}}$
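
Equation (3) translates fairly directly into code. The sketch below is one possible implementation rather than a reference one: the function name is hypothetical, and the rating counts and the symmetric prior with every parameter equal to 2 are illustrative inputs.

```python
def bayesian_average(counts, prior_alpha):
    """Posterior expected rating per equation (3).

    counts[i - 1] is K_i and prior_alpha[i - 1] is alpha_i^0 for rating i.
    """
    values = range(1, len(counts) + 1)
    psi = sum(i * a for i, a in zip(values, prior_alpha))    # prior-weighted rating sum
    rating_sum = sum(i * k for i, k in zip(values, counts))  # sum of observed ratings
    n = sum(counts)                                          # total number of observations
    return (psi + rating_sum) / (n + sum(prior_alpha))


prior = [2, 2, 2, 2, 2]  # illustrative symmetric prior

no_ratings = bayesian_average([0, 0, 0, 0, 0], prior)      # falls back to the prior mean
few_ratings = bayesian_average([0, 0, 0, 0, 3], prior)     # three 5-star ratings
many_ratings = bayesian_average([0, 0, 0, 0, 300], prior)  # three hundred 5-star ratings
```

With no observations the result equals the prior mean. As more 5-star ratings arrive, the average moves toward 5 but stays below it, which is the shrinkage behaviour the formula is meant to provide.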

Note that $\psi = \sum_{i=1}^{5}i\alpha_{i}^{0}$ is simply the dot product of a vector containing all the possible category values and a vector containing all the concentration parameters of the prior Dirichlet distribution $P(p_{1}, p_{2}, p_{3}, p_{4}, p_{5}|\vec{\alpha})$. This has an intuitive interpretation. The concentration parameters determine where on the probability simplex the distribution places its mass. Small concentration parameters (below 1) imply a sparse distribution, with most of its mass concentrated on a few categories. Uniform parameters, such that $\alpha_{1}^{0} = \alpha_{2}^{0} = \alpha_{3}^{0} = \alpha_{4}^{0} = \alpha_{5}^{0}$, imply a perfectly symmetric Dirichlet distribution. Specifically, in the data district lab $\vec{\alpha}$ is assumed to be $[2 \:\:\: 2 \:\:\: 2 \:\:\: 2 \:\:\: 2]'$, which gives $\psi = 30$ and a prior mean rating of $30 / 10 = 3$. This is graphically illustrated below (taken from http://www.bascornelissen.nl/2017/01/01/bags.html):