Bayesian Averages Under a Dirichlet Distribution

Below we derive the modified Bayesian average of a five-star rating system, under the assumption that the rating probabilities are drawn from a Dirichlet distribution.

For companion guides, and Python code which implements some of these ideas, please see:

Let $X$ be a categorical random variable such that $X \in \{ 1, 2, 3, 4, 5 \}$, with $X_{i} = i$ for all $i \in \{ 1, 2, 3, 4, 5 \}$. Further, let $O$ represent a sequence of independent observations such that $O = (o_{1}, ..., o_{N})$.

Further, let $K_{i}$ represent the number of observations for each of the categories enumerated above, and let $p_{i}$ represent the probability of observing category value $i$. Then our Bayesian relationship is given by:

    P(p_{1}, p_{2}, p_{3}, p_{4}, p_{5}|O) \propto P(O|p_{1}, p_{2}, p_{3}, p_{4}, p_{5})P(p_{1}, p_{2}, p_{3}, p_{4}, p_{5})

Assume that the likelihood of the observations is characterized by the following distribution:

    P(O|p_{1}, p_{2}, p_{3}, p_{4}, p_{5}) = p_{1}^{K_{1}}p_{2}^{K_{2}}p_{3}^{K_{3}}p_{4}^{K_{4}}p_{5}^{K_{5}}

Assume that the prior follows a Dirichlet distribution with concentration parameters $\vec{\alpha}$ (i.e. the probabilities of each categorical variable are drawn from a Dirichlet distribution). Because the Dirichlet distribution is conjugate to the categorical likelihood, the posterior probability also follows a Dirichlet distribution:

    P(p_{1}, p_{2}, p_{3}, p_{4}, p_{5}|O) \propto P(O|p_{1}, p_{2}, p_{3}, p_{4}, p_{5})P(p_{1}, p_{2}, p_{3}, p_{4}, p_{5}| \vec{\alpha})

    \propto p_{1}^{K_{1}+\alpha_{1}^{0} - 1}p_{2}^{K_{2} +\alpha_{2}^{0} - 1}p_{3}^{K_{3} +\alpha_{3}^{0} - 1}p_{4}^{K_{4} +\alpha_{4}^{0} - 1}p_{5}^{K_{5} +\alpha_{5}^{0} - 1}

This distribution is parametrized by the following vector:

    \vec{\gamma} = [K_{1}+\alpha_{1}^{0} \:\:\: K_{2} +\alpha_{2}^{0} \:\:\: K_{3} +\alpha_{3}^{0} \:\:\: K_{4} +\alpha_{4}^{0} \:\:\: K_{5} +\alpha_{5}^{0}]'
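The update from prior to posterior parameters can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `posterior_concentration`, the sample ratings, and the symmetric prior of 2 per category are assumptions made for the example, not part of the derivation.

```python
from collections import Counter

CATEGORIES = (1, 2, 3, 4, 5)  # the five possible star values


def posterior_concentration(observations, prior_alpha):
    """Return gamma_i = K_i + alpha_i^0 for each star value i (hypothetical helper)."""
    if len(prior_alpha) != len(CATEGORIES):
        raise ValueError("prior_alpha needs one entry per category")
    counts = Counter(observations)  # K_i: number of observations equal to i
    return [counts.get(i, 0) + a for i, a in zip(CATEGORIES, prior_alpha)]


ratings = [5, 4, 5, 3, 5, 1]  # hypothetical observations O
gamma = posterior_concentration(ratings, [2, 2, 2, 2, 2])  # assumed symmetric prior
print(gamma)  # [3, 2, 3, 3, 5]
```

Each posterior parameter is just the observed count for that category plus the corresponding prior parameter, so the prior behaves like a set of pseudo-observations.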

Recall that the pdf and first moment for the Dirichlet distribution are given by:

    \frac{1}{B(\vec{\alpha})} \prod_{i=1}^{5} p_{i}^{\alpha_{i} - 1}

    E[p_{i}] = \frac{\alpha_{i}}{\sum_{j=1}^{5} \alpha_{j}}

Under the Dirichlet distribution, the expected probability of each categorical value is given by the first moment above. Therefore, we can write the expected rating (1) and the expected probability of each categorical value (2), both conditioned on the total set of observations, as:

    (1)     E[p_{1} + 2*p_{2} + 3*p_{3} + 4 * p_{4} + 5 * p_{5}|O] = \sum_{i=1}^{5} i * E[p_{i} | O]

    (2)     E[p_{i}|O] = \frac{\gamma_{i}}{\sum_{j=1}^{5}\gamma_{j}}
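As a rough illustration of (1) and (2), the snippet below normalizes a posterior parameter vector into expected category probabilities and then weights them by the star values. The function name and the example vector are hypothetical and chosen only for illustration.

```python
def posterior_mean_probabilities(gamma):
    """Return E[p_i | O] = gamma_i / sum_j gamma_j for each category (hypothetical helper)."""
    total = sum(gamma)
    if total <= 0:
        raise ValueError("posterior parameters must sum to a positive value")
    return [g / total for g in gamma]


gamma = [3, 2, 3, 3, 5]  # hypothetical posterior parameters K_i + alpha_i^0
probs = posterior_mean_probabilities(gamma)

# Equation (1): weight each expected probability by its star value i.
expected_rating = sum(i * p for i, p in enumerate(probs, start=1))
print(probs)
print(expected_rating)
```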

Plugging (2) into (1) we have:

    (3)     E[p_{1} + 2*p_{2} + 3*p_{3} + 4 * p_{4} + 5 * p_{5}|O]= \frac{\psi + \sum_{i=1}^{5} iK_{i}}{N + \sum_{j=1}^{5}\alpha_{j}^{0}}

Where $\psi = \sum_{i=1}^{5}i\alpha_{i}^{0}$ and $N = \sum_{i=1}^{5}K_{i}$.
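For readers who want the intermediate step, the substitution can be written out as follows, using only the definition $\gamma_{i} = K_{i} + \alpha_{i}^{0}$ from above:

    \sum_{i=1}^{5} i \frac{\gamma_{i}}{\sum_{j=1}^{5}\gamma_{j}} = \frac{\sum_{i=1}^{5} i(K_{i} + \alpha_{i}^{0})}{\sum_{j=1}^{5}(K_{j} + \alpha_{j}^{0})} = \frac{\sum_{i=1}^{5} i\alpha_{i}^{0} + \sum_{i=1}^{5} iK_{i}}{\sum_{j=1}^{5}K_{j} + \sum_{j=1}^{5}\alpha_{j}^{0}}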

More intuitively, we can express (3) as:

       \frac{\psi + \text{Sum of Observed Ratings}}{\text{Sum of Prior Dirichlet Concentration Parameters} + \text{Total Number of Observations}}
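A minimal Python sketch of (3) is shown below. The function name `bayesian_average`, the example counts, and the symmetric prior of 2 per category are assumptions made purely for illustration.

```python
def bayesian_average(counts, prior_alpha):
    """Evaluate equation (3) from per-category counts K_i and prior parameters alpha_i^0."""
    if len(counts) != len(prior_alpha):
        raise ValueError("counts and prior_alpha must have the same length")
    values = range(1, len(counts) + 1)                       # star values 1..5
    psi = sum(i * a for i, a in zip(values, prior_alpha))    # psi = sum_i i * alpha_i^0
    rating_sum = sum(i * k for i, k in zip(values, counts))  # sum_i i * K_i
    n = sum(counts)                                          # N, total observations
    return (psi + rating_sum) / (n + sum(prior_alpha))


counts = [1, 0, 1, 1, 3]  # hypothetical K_1..K_5
prior = [2, 2, 2, 2, 2]   # assumed symmetric prior
print(bayesian_average(counts, prior))           # 3.3125
print(bayesian_average([0, 0, 0, 0, 0], prior))  # 3.0, the prior mean with no data
```

With these example counts the raw mean is 23/6 (about 3.83), while the Bayesian average is pulled toward the prior mean of 3. The pull weakens as the number of observations grows relative to the sum of the prior parameters.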

Note that $\psi = \sum_{i=1}^{5}i\alpha_{i}^{0}$ is simply the dot product of a vector containing all the possible category values and a vector containing all the concentration parameters of the Dirichlet distribution for $P(p_{1}, p_{2}, p_{3}, p_{4}, p_{5}|\vec{\alpha})$. This has an intuitive interpretation. The concentration parameters determine where on the probability simplex the distribution places its mass. A sparse, or small, concentration parameter implies a distribution with most of its mass concentrated on a few categories. A uniform parameter, such that $\alpha_{1}^{0} = \alpha_{2}^{0} = \alpha_{3}^{0} = \alpha_{4}^{0} = \alpha_{5}^{0}$, implies a perfectly symmetric Dirichlet distribution. Specifically, in the data district lab $\vec{\alpha}$ is assumed to be $[2 \:\:\: 2 \:\:\: 2 \:\:\: 2 \:\:\: 2]'$. This is graphically illustrated below (taken from http://www.bascornelissen.nl/2017/01/01/bags.html):