Bayesian Averages Under a Dirichlet Distribution
Below we derive the modified Bayesian average of a five-star rating system, under the assumption that the probabilities of the five ratings are drawn from a Dirichlet distribution.
For companion guides and Python code that implements some of these ideas, please see:
Let $X$ be a categorical random variable taking values in $\{1,2,3,4,5\}$, where category $i$ corresponds to a rating of $i$ stars, i.e. $X_i = i$ for all $i \in \{1,2,3,4,5\}$. Further, let $O$ represent a sequence of $N$ independent observations, $O = (o_1, \dots, o_N)$.
Next, let $K_i$ represent the number of observations in category $i$, and let $p_i = P(X = i)$ represent the probability of observing category $i$. Then, by Bayes' rule, our Bayesian relationship is given by:
$$P(p_1,p_2,p_3,p_4,p_5 \mid O) \propto P(O \mid p_1,p_2,p_3,p_4,p_5)\, P(p_1,p_2,p_3,p_4,p_5)$$
Because the observations are independent, the likelihood of the observed sequence is given by:
$$P(O \mid p_1,p_2,p_3,p_4,p_5) = p_1^{K_1} p_2^{K_2} p_3^{K_3} p_4^{K_4} p_5^{K_5}$$
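As a quick sanity check of the likelihood, the sketch below counts $K_i$ from a sequence of ratings and confirms that multiplying $p_{o_n}$ over every observation gives the same (log) likelihood as $\prod_i p_i^{K_i}$. The ratings, the probabilities, and the function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

CATEGORIES = (1, 2, 3, 4, 5)


def category_counts(observations):
    """Return K = [K_1, ..., K_5]: how often each star value appears in O."""
    counts = Counter(observations)
    for rating in counts:
        if rating not in CATEGORIES:
            raise ValueError(f"rating {rating!r} is not in {CATEGORIES}")
    return [counts[i] for i in CATEGORIES]


def log_likelihood_from_sequence(observations, p):
    """log P(O | p) computed observation by observation: sum_n log p_{o_n}."""
    return sum(math.log(p[rating - 1]) for rating in observations)


def log_likelihood_from_counts(counts, p):
    """log P(O | p) = sum_i K_i log p_i, i.e. the log of p_1^K_1 ... p_5^K_5."""
    # Skip empty categories so that a zero probability with K_i = 0 is harmless.
    return sum(k * math.log(p_i) for k, p_i in zip(counts, p) if k > 0)


O = [5, 4, 5, 3, 5, 1, 4]                 # hypothetical ratings
p = [0.05, 0.10, 0.15, 0.30, 0.40]        # hypothetical category probabilities
K = category_counts(O)
print(K)  # [1, 0, 1, 2, 3]
print(log_likelihood_from_sequence(O, p), log_likelihood_from_counts(K, p))
```

Both printed log-likelihoods agree, because the order of the observations does not matter once they are summarized by the counts $K_i$.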
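Assume that the prior is a Dirichlet distribution with concentration parameters $\vec{\alpha} = [\alpha_1^0\ \alpha_2^0\ \alpha_3^0\ \alpha_4^0\ \alpha_5^0]'$ (i.e. the probabilities of the categories are themselves drawn from a Dirichlet distribution). Because the Dirichlet prior is conjugate to this likelihood, the posterior also follows a Dirichlet distribution:
$$P(p_1,p_2,p_3,p_4,p_5 \mid O) \propto P(O \mid p_1,p_2,p_3,p_4,p_5)\, P(p_1,p_2,p_3,p_4,p_5 \mid \vec{\alpha})$$
$$\propto p_1^{K_1+\alpha_1^0-1}\, p_2^{K_2+\alpha_2^0-1}\, p_3^{K_3+\alpha_3^0-1}\, p_4^{K_4+\alpha_4^0-1}\, p_5^{K_5+\alpha_5^0-1}$$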
This distribution is parametrized by the following vector:
$$\vec{\gamma} = [K_1+\alpha_1^0\ \ K_2+\alpha_2^0\ \ K_3+\alpha_3^0\ \ K_4+\alpha_4^0\ \ K_5+\alpha_5^0]'$$
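The conjugate update can be checked numerically. The sketch below forms $\gamma_i = K_i + \alpha_i^0$ and then verifies Bayes' rule: $\log P(O \mid p) + \log P(p \mid \vec{\alpha}) - \log P(p \mid O)$ should be the same constant (the log marginal likelihood, $\log B(\vec{\gamma}) - \log B(\vec{\alpha})$) at every point $p$. The counts, the prior $[2\ 2\ 2\ 2\ 2]'$, the test points, and the helper names are illustrative assumptions.

```python
import math


def log_beta(alpha):
    """log B(alpha) = sum_i log Gamma(alpha_i) - log Gamma(sum_i alpha_i)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def dirichlet_logpdf(p, alpha):
    """Log density of Dirichlet(alpha) at a point p on the probability simplex."""
    return sum((a - 1) * math.log(p_i) for a, p_i in zip(alpha, p)) - log_beta(alpha)


def log_likelihood(counts, p):
    """log P(O | p) = sum_i K_i log p_i."""
    return sum(k * math.log(p_i) for k, p_i in zip(counts, p))


def posterior_params(counts, alpha0):
    """Conjugate update: gamma_i = K_i + alpha_i^0."""
    return [k + a for k, a in zip(counts, alpha0)]


K = [1, 0, 1, 2, 3]            # hypothetical counts K_1..K_5
alpha0 = [2, 2, 2, 2, 2]       # hypothetical prior concentration parameters
gamma = posterior_params(K, alpha0)
print(gamma)  # [3, 2, 3, 4, 5]

test_points = [
    [0.2, 0.2, 0.2, 0.2, 0.2],
    [0.05, 0.10, 0.15, 0.30, 0.40],
    [0.40, 0.30, 0.10, 0.10, 0.10],
]
# likelihood * prior / posterior should not depend on p
diffs = [
    log_likelihood(K, p) + dirichlet_logpdf(p, alpha0) - dirichlet_logpdf(p, gamma)
    for p in test_points
]
print(diffs)  # three numerically identical values
```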
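Recall that the probability density function (pdf) and first moment of a Dirichlet distribution with parameters $\vec{\alpha}$ are given by:
$$f(p_1,\dots,p_5;\vec{\alpha}) = \frac{1}{B(\vec{\alpha})} \prod_{i=1}^{5} p_i^{\alpha_i - 1}$$
$$E[p_i] = \frac{\alpha_i}{\sum_{j=1}^{5} \alpha_j}$$
where $B(\vec{\alpha}) = \prod_{i=1}^{5} \Gamma(\alpha_i) \big/ \Gamma\left(\sum_{i=1}^{5} \alpha_i\right)$ is the multivariate beta function.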
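The first-moment formula can be checked by simulation. The sketch below draws Dirichlet samples with the standard construction (normalize independent $\text{Gamma}(\alpha_i, 1)$ draws) using Python's `random.gammavariate`, and compares the empirical mean with $\alpha_i / \sum_j \alpha_j$. The concentration parameters, the seed, and the sample size are arbitrary illustrative choices.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw p ~ Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]


def dirichlet_mean(alpha):
    """First moment: E[p_i] = alpha_i / sum_j alpha_j."""
    total = sum(alpha)
    return [a / total for a in alpha]


alpha = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical concentration parameters
rng = random.Random(0)              # fixed seed for reproducibility
n = 20000
draws = [sample_dirichlet(alpha, rng) for _ in range(n)]
empirical = [sum(d[i] for d in draws) / n for i in range(len(alpha))]

print(dirichlet_mean(alpha))  # equals [1/15, 2/15, 3/15, 4/15, 5/15]
print(empirical)              # should be close to the exact mean
```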
Under the posterior Dirichlet distribution, the expected probability of each category is given by the first moment above. Therefore, by linearity of expectation, we can write the expected rating, conditioned on the full set of observations, as:
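$$E\left[p_1 + 2p_2 + 3p_3 + 4p_4 + 5p_5 \mid O\right] = \sum_{i=1}^{5} i\, E[p_i \mid O] \tag{1}$$
$$E[p_i \mid O] = \frac{\gamma_i}{\sum_{j=1}^{5} \gamma_j} \tag{2}$$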
Plugging (2) into (1) we have:
$$E\left[p_1 + 2p_2 + 3p_3 + 4p_4 + 5p_5 \mid O\right] = \frac{\psi + \sum_{i=1}^{5} i\,K_i}{N + \sum_{j=1}^{5} \alpha_j^0} \tag{3}$$
where $\psi = \sum_{i=1}^{5} i\,\alpha_i^0$ and $N = \sum_{i=1}^{5} K_i$ is the total number of observations.
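To confirm that (3) is just (1) and (2) combined, the sketch below computes the expected rating both ways: once by summing $i \cdot \gamma_i / \sum_j \gamma_j$, and once with the closed form $(\psi + \sum_i i K_i)/(N + \sum_j \alpha_j^0)$. The function names and the example counts and prior are hypothetical.

```python
def expected_rating_via_moments(counts, alpha0):
    """Eqs. (1) and (2): sum_i i * gamma_i / sum_j gamma_j, with gamma_i = K_i + alpha_i^0."""
    gamma = [k + a for k, a in zip(counts, alpha0)]
    total = sum(gamma)
    return sum(i * g / total for i, g in enumerate(gamma, start=1))


def expected_rating_closed_form(counts, alpha0):
    """Eq. (3): (psi + sum_i i*K_i) / (N + sum_j alpha_j^0)."""
    psi = sum(i * a for i, a in enumerate(alpha0, start=1))
    n = sum(counts)
    rating_sum = sum(i * k for i, k in enumerate(counts, start=1))
    return (psi + rating_sum) / (n + sum(alpha0))


K = [1, 0, 1, 2, 3]        # hypothetical counts K_1..K_5 (N = 7)
alpha0 = [2, 2, 2, 2, 2]   # hypothetical prior, so psi = 30
print(expected_rating_via_moments(K, alpha0))
print(expected_rating_closed_form(K, alpha0))  # 57/17, roughly 3.353
```

Here $\psi = 30$, $\sum_i i K_i = 27$, $N = 7$ and $\sum_j \alpha_j^0 = 10$, so both functions return $57/17$.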
More intuitively, we can express (3) as:
$$\frac{\psi + \text{Sum of All Observed Ratings}}{\text{Sum of Prior Dirichlet Concentration Parameters} + \text{Total Number of Observations}}$$
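To see how the prior shapes this average, the sketch below evaluates the expression for the symmetric prior $\vec{\alpha} = [2\ 2\ 2\ 2\ 2]'$ used in the data district lab and, for comparison, a hypothetical smaller prior $[0.5\ 0.5\ 0.5\ 0.5\ 0.5]'$. The three rating histories are made up for illustration.

```python
def bayesian_average(counts, alpha0):
    """(psi + sum of observed ratings) / (sum of prior concentrations + N)."""
    psi = sum(i * a for i, a in enumerate(alpha0, start=1))
    n = sum(counts)
    rating_sum = sum(i * k for i, k in enumerate(counts, start=1))
    return (psi + rating_sum) / (n + sum(alpha0))


symmetric_2 = [2, 2, 2, 2, 2]   # prior stated in the text
sparse = [0.5] * 5              # hypothetical smaller prior for comparison

no_ratings = [0, 0, 0, 0, 0]
one_five_star = [0, 0, 0, 0, 1]
hundred_five_star = [0, 0, 0, 0, 100]

for name, alpha0 in (("alpha = 2", symmetric_2), ("alpha = 0.5", sparse)):
    averages = [
        round(bayesian_average(k, alpha0), 3)
        for k in (no_ratings, one_five_star, hundred_five_star)
    ]
    print(name, averages)
```

Both symmetric priors start at 3 stars with no data, because $\psi / \sum_j \alpha_j^0 = 3$ whenever all $\alpha_i^0$ are equal. The size of $\sum_j \alpha_j^0$ controls how quickly the data take over: a single 5-star rating moves the average to about 3.18 under $\alpha_i^0 = 2$ but to about 3.57 under $\alpha_i^0 = 0.5$.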
Note that $\psi = \sum_{i=1}^{5} i\,\alpha_i^0$ is simply the dot product of a vector containing all possible category values and the vector of concentration parameters of the prior Dirichlet distribution $P(p_1,p_2,p_3,p_4,p_5 \mid \vec{\alpha})$. This has an intuitive interpretation: the prior acts like $\sum_{j=1}^{5} \alpha_j^0$ pseudo-observations whose ratings sum to $\psi$. The concentration parameters determine how the prior's probability mass is spread over the simplex of possible category probabilities. Small concentration parameters (below 1) place most of the mass near the corners and edges of the simplex, so a typical draw puts most of its probability on a few categories. Equal concentration parameters, $\alpha_1^0=\alpha_2^0=\alpha_3^0=\alpha_4^0=\alpha_5^0$, yield a symmetric Dirichlet distribution. Specifically, in the data district lab $\vec{\alpha}$ is assumed to be $[2\ 2\ 2\ 2\ 2]'$. This is graphically illustrated below (taken from http://www.bascornelissen.nl/2017/01/01/bags.html):