
Mutual information

In probability theory and information theory, the mutual information (MI) of two random variables is a measure of the mutual dependence between the two variables. More specifically, it quantifies the 'amount of information' (in units such as shannons, commonly called bits) obtained about one random variable by observing the other random variable. The concept of mutual information is intricately linked to that of the entropy of a random variable, a fundamental notion in information theory that quantifies the expected 'amount of information' held in a random variable.

Not limited to real-valued random variables and linear dependence like the correlation coefficient, MI is more general and determines how different the joint distribution of the pair (X, Y) is from the product of the marginal distributions of X and Y. MI is the expected value of the pointwise mutual information (PMI). Mutual information is also known as information gain.

Let (X, Y) be a pair of random variables with values over the space 𝒳 × 𝒴. If their joint distribution is P_{(X,Y)} and the marginal distributions are P_X and P_Y, the mutual information is defined as

I(X;Y) = D_{\mathrm{KL}}\left( P_{(X,Y)} \,\|\, P_X \otimes P_Y \right).

Notice, by a property of the Kullback–Leibler divergence, that I(X;Y) is equal to zero precisely when the joint distribution coincides with the product of the marginals, i.e. when X and Y are independent. In general I(X;Y) is non-negative; it is a measure of the price for encoding (X, Y) as a pair of independent random variables when in reality they are not.
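As a quick illustration of this definition and the independence property, here is a minimal sketch in Python. The joint table, the function name mutual_information_kl, and the use of scipy.special.rel_entr are illustrative choices, not part of the original text: the sketch computes I(X;Y) as the Kullback–Leibler divergence between a small discrete joint distribution and the product of its marginals, and shows that the value is zero exactly when the joint factorizes.

```python
# Minimal sketch: I(X;Y) = D_KL(P_{(X,Y)} || P_X (x) P_Y) for a discrete joint
# distribution given as a 2-D probability table (values here are made up).
import numpy as np
from scipy.special import rel_entr  # elementwise x * log(x / y), in nats

def mutual_information_kl(joint):
    """Mutual information (in nats) of a joint PMF given as a 2-D array."""
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1)           # marginal of X (row sums)
    p_y = joint.sum(axis=0)           # marginal of Y (column sums)
    product = np.outer(p_x, p_y)      # product of the marginals
    return rel_entr(joint, product).sum()  # KL divergence of joint vs. product

# Dependent pair: mass concentrated on the diagonal -> positive MI.
dependent = [[0.4, 0.1],
             [0.1, 0.4]]

# Independent pair: the joint equals the product of its marginals -> MI = 0.
independent = np.outer([0.5, 0.5], [0.8, 0.2])

print(mutual_information_kl(dependent))    # ~ 0.19 nats
print(mutual_information_kl(independent))  # 0.0
```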
The mutual information of two jointly discrete random variables X and Y is calculated as a double sum:

\operatorname{I}(X;Y) = \sum_{y \in \mathcal{Y}} \sum_{x \in \mathcal{X}} p_{(X,Y)}(x,y) \log\left( \frac{p_{(X,Y)}(x,y)}{p_X(x)\, p_Y(y)} \right),     (Eq.1)

where p_{(X,Y)} is the joint probability mass function of X and Y, and p_X and p_Y are the marginal probability mass functions of X and Y respectively. In the case of jointly continuous random variables, the double sum is replaced by a double integral:

\operatorname{I}(X;Y) = \int_{\mathcal{Y}} \int_{\mathcal{X}} p_{(X,Y)}(x,y) \log\left( \frac{p_{(X,Y)}(x,y)}{p_X(x)\, p_Y(y)} \right) dx\, dy,     (Eq.2)

where p_{(X,Y)} is now the joint probability density function of X and Y, and p_X and p_Y are the marginal probability density functions of X and Y respectively.

Equivalently, in terms of the Kullback–Leibler divergence between these distributions,

\operatorname{I}(X;Y) = D_{\text{KL}}\left( p_{(X,Y)} \parallel p_X p_Y \right) = \mathbb{E}_Y\left[ D_{\text{KL}}\left( p_{X|Y} \parallel p_X \right) \right],

and the conditional mutual information given a third variable Z is

\operatorname{I}(X;Y|Z) = \mathbb{E}_Z\left[ D_{\mathrm{KL}}\left( P_{(X,Y)|Z} \,\|\, P_{X|Z} \otimes P_{Y|Z} \right) \right].
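The double sum in Eq.1 translates directly into code. The sketch below is only an illustration: it assumes the joint PMF is supplied as a hypothetical dictionary mapping (x, y) pairs to probabilities, and it uses log base 2 so the result is in shannons (bits); terms with zero joint probability are skipped, following the usual convention that 0 log 0 = 0.

```python
# Direct transcription of the double sum in Eq.1 (result in bits).
import math

def mutual_information_bits(joint_pmf):
    # Marginal PMFs p_X and p_Y, obtained by summing the joint over the other variable.
    p_x, p_y = {}, {}
    for (x, y), p in joint_pmf.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p

    mi = 0.0
    for (x, y), p in joint_pmf.items():
        if p > 0.0:  # terms with p_{(X,Y)}(x,y) = 0 contribute nothing
            mi += p * math.log2(p / (p_x[x] * p_y[y]))
    return mi

# Example: a noisy binary relationship between X and Y.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print(mutual_information_bits(joint))  # ~ 0.278 bits
```

On the same joint table as the earlier sketch, this gives about 0.278 bits, which matches the roughly 0.19 nats obtained above once converted by dividing by ln 2.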

[ "Algorithm", "Machine learning", "Artificial intelligence", "Pattern recognition", "Statistics", "mutual information feature selection", "Total correlation", "Dual total correlation", "fuzzy mutual information", "Chow–Liu tree" ]