Self Information
Quantifying information
The idea of quantifying information was first introduced by Claude E. Shannon [1] in his historic paper "A Mathematical Theory of Communication". The basic idea was quite simple and built from first-principles thinking. Shannon came up with the following formula to quantify the information of an event $x$:

$$I(x) = -\log P(x)$$

where $P(x)$ is the probability of the event $x$ occurring. The formula was designed to satisfy a few intuitive requirements:

Information is a non-negative quantity, i.e. $I(x) \geq 0$, since $0 \leq P(x) \leq 1$.
If an event is less likely to happen, it should contain more information.
In contrast, if an event is more likely to happen, it should contain less information.
Independent events should contain additive information.
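To make the definition concrete, here is a minimal Python sketch (not from the original post; the helper name `self_information` is just illustrative) that computes the quantity and reflects the properties listed above:

```python
import math

def self_information(p: float, base: float = 2.0) -> float:
    """Self-information I(x) = -log(P(x)), written here as log(1/P(x))."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be a probability in (0, 1]")
    return math.log(1.0 / p, base)

# A certain event carries no information; rarer events carry more.
print(self_information(1.0))   # 0.0 bits
print(self_information(0.5))   # 1.0 bit
print(self_information(0.01))  # ~6.64 bits
```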
[Interactive plot of $I(x) = -\log P(x)$: the x-axis represents the probability $P(x)$ and the y-axis the corresponding information content.]
One important detail to point out is the base of the logarithm. In machine learning the natural log is normally used, whereas in digital communication theory a base-2 log is used. Changing the base only scales the relation by a constant factor; everything else remains the same. The unit of information with a base-2 log is the bit (also called the shannon), and with the natural log it is the nat.
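A quick sketch of the base change (illustrative only): the same probability measured in bits and in nats differs only by the constant factor $\ln 2$:

```python
import math

p = 0.1
i_bits = -math.log2(p)   # base-2 log: information in bits
i_nats = -math.log(p)    # natural log: information in nats

# Changing the base only rescales by a constant: 1 nat = 1/ln(2) ≈ 1.44 bits.
print(i_bits)                 # ~3.3219 bits
print(i_nats / math.log(2))   # ~3.3219 -- the same value, up to floating point
```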
Now, to make things concrete, consider the two extremes. Suppose you know an event will occur with 100% certainty, which means the probability of that event is $P(x) = 1$; plugging this in gives $I(x) = -\log 1 = 0$, so a certain event carries no information at all. At the other extreme, as $P(x)$ approaches $0$, $I(x)$ grows without bound: the rarer the event, the more information its occurrence carries.
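A small numeric illustration of the two extremes (again just a sketch, with arbitrarily chosen probabilities):

```python
import math

# Information (in bits) at and near the two extremes of P(x).
for p in [1.0, 0.5, 0.1, 1e-3, 1e-6, 1e-12]:
    print(f"P(x) = {p:g}  ->  I(x) = {math.log2(1 / p):.2f} bits")
# P(x) = 1 gives exactly 0 bits; as P(x) -> 0 the information grows without bound.
```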
Another property is additivity: in the case of independent events, the information contents simply add up. For two independent events $x$ and $y$ this can be represented as follows:

$$I(x, y) = -\log\big(P(x)P(y)\big) = -\log P(x) - \log P(y) = I(x) + I(y)$$

since independence means $P(x, y) = P(x)\,P(y)$.
For $n$ independent events the same argument gives $I(x_1, \dots, x_n) = \sum_{i=1}^{n} I(x_i)$; a numerical check for two events follows.
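The additivity property can be checked numerically. The sketch below uses made-up probabilities for two independent events:

```python
import math

p_x, p_y = 0.2, 0.5      # probabilities of two independent events (made up)
p_joint = p_x * p_y      # independence: P(x, y) = P(x) * P(y)

i_x = -math.log2(p_x)
i_y = -math.log2(p_y)
i_joint = -math.log2(p_joint)

print(i_joint)      # ~3.3219 bits
print(i_x + i_y)    # ~3.3219 bits -- equal, because the log turns products into sums
```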
References
[1] Claude E. Shannon. 1948. A Mathematical Theory of Communication. Bell System Technical Journal.
[2] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press.