
Categorical distribution

In probability theory and statistics, a categorical distribution (also called a generalized Bernoulli distribution or multinoulli distribution) is a discrete probability distribution that describes the possible results of a random variable that can take on one of K possible categories, with the probability of each category separately specified. There is no innate underlying ordering of these outcomes, but numerical labels are often attached for convenience in describing the distribution (e.g. 1 to K). The K-dimensional categorical distribution is the most general distribution over a K-way event; any other discrete distribution over a size-K sample space is a special case. The parameters specifying the probabilities of each possible outcome are constrained only by the fact that each must be in the range 0 to 1, and all must sum to 1.

The categorical distribution is the generalization of the Bernoulli distribution for a categorical random variable, i.e.
for a discrete variable with more than two possible outcomes, such as the roll of a die. On the other hand, the categorical distribution is a special case of the multinomial distribution, in that it gives the probabilities of potential outcomes of a single drawing rather than multiple drawings.

Occasionally, the categorical distribution is termed the 'discrete distribution'. However, this properly refers not to one particular family of distributions but to a general class of distributions.

In some fields, such as machine learning and natural language processing, the categorical and multinomial distributions are conflated, and it is common to speak of a 'multinomial distribution' when a 'categorical distribution' would be more precise. This imprecise usage stems from the fact that it is sometimes convenient to express the outcome of a categorical distribution as a '1-of-K' vector (a vector with one element containing a 1 and all other elements containing a 0) rather than as an integer in the range 1 to K; in this form, a categorical distribution is equivalent to a multinomial distribution for a single observation (see below).

However, conflating the categorical and multinomial distributions can lead to problems. For example, in a Dirichlet-multinomial distribution, which arises commonly in natural language processing models (although not usually with this name) as a result of collapsed Gibbs sampling where Dirichlet distributions are collapsed out of a hierarchical Bayesian model, it is very important to distinguish categorical from multinomial. The joint distribution of the same variables with the same Dirichlet-multinomial distribution has two different forms depending on whether it is characterized as a distribution whose domain is over individual categorical nodes or over multinomial-style counts of nodes in each particular category (similar to the distinction between a set of Bernoulli-distributed nodes and a single binomial-distributed node).
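The equivalence mentioned above, that a categorical draw written as a 1-of-K vector is a multinomial distribution with a single trial, can be checked with a short sketch. The helper names and the example probabilities here are illustrative, not from the source; note that for a single draw the multinomial coefficient is identically 1, which is exactly the "extra factor" discussed below.

```python
import math

# '1-of-K' encoding: a categorical outcome x in {1, ..., k} becomes
# a vector with a single 1 and zeros elsewhere.
def one_hot(x, k):
    return [1 if i == x else 0 for i in range(1, k + 1)]

# Multinomial PMF for counts (n_1, ..., n_k) over n = sum(counts) trials:
# n! / (n_1! ... n_k!) * p_1^n_1 ... p_k^n_k
def multinomial_pmf(counts, p):
    n = sum(counts)
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)
    prob = 1.0
    for c, pi in zip(counts, p):
        prob *= pi ** c
    return coef * prob

p = [0.2, 0.5, 0.3]  # illustrative category probabilities
x = 2
# For a single draw (n = 1) the multinomial coefficient is 1, so the
# multinomial PMF of the one-hot vector reduces to the categorical PMF p_x:
assert multinomial_pmf(one_hot(x, 3), p) == p[x - 1]
```

With more than one draw the coefficient is no longer 1, which is why conflating the two distributions causes trouble in the settings described below.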
Both forms have very similar-looking probability mass functions (PMFs), which both make reference to multinomial-style counts of nodes in a category. However, the multinomial-style PMF has an extra factor, a multinomial coefficient, that is a constant equal to 1 in the categorical-style PMF. Confusing the two can easily lead to incorrect results in settings where this extra factor is not constant with respect to the distributions of interest. The factor is frequently constant in the complete conditionals used in Gibbs sampling and in the optimal distributions in variational methods.

A categorical distribution is a discrete probability distribution whose sample space is the set of k individually identified items. It is the generalization of the Bernoulli distribution for a categorical random variable.

In one formulation of the distribution, the sample space is taken to be a finite sequence of integers. The exact integers used as labels are unimportant; they might be {0, 1, ..., k − 1} or {1, 2, ..., k} or any other arbitrary set of values. In the following descriptions, we use {1, 2, ..., k} for convenience, although this disagrees with the convention for the Bernoulli distribution, which uses {0, 1}. In this case, the probability mass function f is:

f(x = i | p) = p_i,

where p = (p_1, ..., p_k), p_i represents the probability of seeing element i, and Σ_{i=1}^k p_i = 1.

Another formulation that appears more complex but facilitates mathematical manipulations is as follows, using the Iverson bracket:

f(x | p) = p_1^[x = 1] ⋯ p_k^[x = k],

which reduces to p_i when x = i, since the exponent [x = j] is 1 for j = i and 0 for every other j. Equivalently, as a sum:

f(x | p) = [x = 1] p_1 + ⋯ + [x = k] p_k.
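A minimal sketch, assuming illustrative probabilities, checking that the Iverson-bracket product and sum formulations both reduce to the direct PMF f(x = i | p) = p_i:

```python
# Iverson bracket: 1 if the condition holds, 0 otherwise.
def iverson(cond):
    return 1 if cond else 0

# Product form: f(x | p) = p_1^[x=1] * ... * p_k^[x=k]
def pmf_product(x, p):
    prob = 1.0
    for i, pi in enumerate(p, start=1):
        prob *= pi ** iverson(x == i)
    return prob

# Sum form: f(x | p) = [x=1]*p_1 + ... + [x=k]*p_k
def pmf_sum(x, p):
    return sum(iverson(x == i) * pi for i, pi in enumerate(p, start=1))

p = [0.1, 0.6, 0.3]  # illustrative; each in [0, 1], summing to 1
for x in (1, 2, 3):
    # Both formulations pick out exactly the probability p_x.
    assert pmf_product(x, p) == pmf_sum(x, p) == p[x - 1]
```

In the product form, every factor with x ≠ i has exponent 0 and contributes 1; in the sum form, every term with x ≠ i is multiplied by 0 and drops out.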

[ "Bayesian linear regression", "Inverse-chi-squared distribution", "Bayesian hierarchical modeling" ]
Parent Topic
Child Topic
    No Parent Topic