Entropy (information theory)

Information entropy is the average rate at which information is produced by a stochastic source of data. The measure of information associated with each possible data value is the negative logarithm of the probability mass function for that value. When the data source produces a low-probability value (i.e., when a low-probability event occurs), the event carries more 'information' ('surprisal') than when the source produces a high-probability value. The amount of information conveyed by each event, defined in this way, is a random variable whose expected value is the information entropy. Generally, entropy refers to disorder or uncertainty, and the definition of entropy used in information theory is directly analogous to the definition used in statistical thermodynamics. The concept of information entropy was introduced by Claude Shannon in his 1948 paper 'A Mathematical Theory of Communication'.

The basic model of a data communication system is composed of three elements: a source of data, a communication channel, and a receiver. As expressed by Shannon, the 'fundamental problem of communication' is for the receiver to be able to identify what data was generated by the source, based on the signal it receives through the channel (Shannon 1948, pp. 379–423 and 623–656). The entropy provides an absolute limit on the shortest possible average length of a lossless compression encoding of the data produced by a source. If the entropy of the source is less than the channel capacity of the communication channel, the data generated by the source can be reliably communicated to the receiver (at least in theory, possibly neglecting practical considerations such as the complexity of the system needed to convey the data and the time it may take to convey it).

Information entropy is typically measured in bits (alternatively called 'shannons') or sometimes in 'natural units' (nats) or decimal digits (called 'dits', 'bans', or 'hartleys').
The unit of measurement depends on the base of the logarithm used to define the entropy: base 2 gives bits, base e gives nats, and base 10 gives hartleys. The logarithm of the probability distribution is useful as a measure of entropy because it is additive for independent sources. For instance, the entropy of a fair coin toss is 1 bit, and the entropy of m tosses is m bits. In a straightforward representation, log2(n) bits are needed to represent a variable that can take one of n values if n is a power of 2. If these values are equally probable, the entropy (in bits) is equal to log2(n). If one of the values is more likely to occur than the others, an observation that this value occurs is less informative than if some less common outcome had occurred. Conversely, rarer events provide more information when observed. Since less probable events are observed more rarely, the net effect is that the entropy (thought of as average information) received from non-uniformly distributed data is always less than or equal to log2(n). Entropy is zero when one outcome is certain to occur. The entropy quantifies these considerations when a probability distribution of the source data is known. The meaning of the events observed (the meaning of messages) does not matter in the definition of entropy. Entropy only takes into account the probability of observing a specific event, so the information it encapsulates is information about the underlying probability distribution, not the meaning of the events themselves.

The basic idea of information theory is that the 'news value' of a communicated message depends on the degree to which the content of the message is surprising. If an event is very probable, it is no surprise (and generally uninteresting) when that event happens as expected. However, if an event is unlikely to occur, it is much more informative to learn that the event happened or will happen.
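
The numerical claims above (1 bit for a fair coin, m bits for m tosses, log2(n) bits for n equally probable values, less for a non-uniform distribution, and zero for a certain outcome) can be checked with a short Python sketch. The helper name `entropy`, its `base` parameter, and the example distributions are assumptions made for illustration; they do not come from Shannon's paper or from any particular library.

```python
import math

def entropy(probs, base=2.0):
    """Shannon entropy of a discrete distribution, in units set by `base`."""
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Outcomes with zero probability contribute nothing to the sum.
    bits = sum(p * math.log2(1.0 / p) for p in probs if p > 0)
    # Changing the logarithm base only rescales the result.
    return bits / math.log2(base)

fair_coin = [0.5, 0.5]
uniform_8 = [1 / 8] * 8            # n = 8 equally probable values
skewed_8 = [0.65] + [0.05] * 7     # illustrative non-uniform distribution
certain = [1.0, 0.0]               # one outcome is certain

# Joint distribution of m independent fair tosses: 2**m equally likely sequences.
m = 3
m_tosses = [0.5 ** m] * (2 ** m)

assert math.isclose(entropy(fair_coin), 1.0)           # 1 bit
assert math.isclose(entropy(m_tosses), m)              # additive: m bits
assert math.isclose(entropy(uniform_8), math.log2(8))  # log2(n) bits
assert entropy(skewed_8) < math.log2(8)                # below the uniform bound
assert entropy(certain) == 0.0                         # no uncertainty

# The same fair coin measured in nats (base e) and hartleys (base 10).
print(entropy(fair_coin, base=math.e), entropy(fair_coin, base=10))
```

Computing in bits first and dividing by log2 of the base is one possible design choice; it makes explicit that bits, nats, and hartleys differ only by a constant factor.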
Take a lottery as an example: the knowledge that some particular number will not be the winning number provides very little information, because any particular chosen number will almost certainly not win. However, knowledge that a particular number will win a lottery has high value because it communicates the outcome of a very low probability event.

The information content (also called the surprisal) of an event $E$ is an increasing function of the reciprocal of the probability $p(E)$ of the event, precisely $I(E) = -\log_2(p(E)) = \log_2(1/p(E))$. Entropy measures the expected (i.e., average) amount of information conveyed by identifying the outcome of a random trial. This implies that casting a die has higher entropy than tossing a coin, because each outcome of a die roll has smaller probability ($p = 1/6$) than each outcome of a coin toss ($p = 1/2$).

Entropy is a measure of the unpredictability of the state, or equivalently, of its average information content. To get an intuitive understanding of these terms, consider the example of a political poll. Usually, such polls happen because the outcome of the poll is not already known. In other words, the outcome of the poll is relatively unpredictable, and actually performing the poll and learning the results gives some new information; these are just different ways of saying that the a priori entropy of the poll results is large. Now consider the case in which the same poll is performed a second time shortly after the first. Since the result of the first poll is already known, the outcome of the second poll can be predicted well and the results should not contain much new information; in this case the a priori entropy of the second poll result is small relative to that of the first.
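
The lottery and die-versus-coin comparisons above can be made concrete with a minimal Python sketch of the surprisal formula. The function names `surprisal_bits` and `entropy_bits` are hypothetical, and the one-in-a-million lottery is an assumed figure chosen only for illustration.

```python
import math

def surprisal_bits(p):
    """Information content I(E) = -log2 p(E) of an event with probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)

def entropy_bits(probs):
    """Expected surprisal over all outcomes; zero-probability outcomes are skipped."""
    return sum(p * surprisal_bits(p) for p in probs if p > 0)

# Assumed lottery: exactly one winning number out of one million.
p_win = 1 / 1_000_000

lose_info = surprisal_bits(1 - p_win)  # "this number loses": close to 0 bits
win_info = surprisal_bits(p_win)       # "this number wins": roughly 19.9 bits
print(lose_info, win_info)

coin = entropy_bits([1 / 2] * 2)  # fair coin
die = entropy_bits([1 / 6] * 6)   # fair six-sided die
print(coin, die)                  # the die has the higher entropy
```

Because every outcome of a fair die has the same probability, its entropy equals the surprisal of any single outcome, log2(6), which exceeds the 1 bit of a fair coin.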
