# Many faces of entropy

Given a finite measure $\pi$ on a measurable space $S$ that is absolutely continuous with respect to a $\sigma$-finite measure $\mu$, the entropy of $\pi$ under $\mu$ is $$H_\mu(\pi) = -\int_S \pi_\mu \log \pi_\mu \,d\mu,$$ where $\pi_\mu = d\pi/d\mu$ is the Radon-Nikodym derivative of $\pi$ with respect to $\mu$.

With this definition, the KL divergence $D(\pi || \mu)$ from $\mu$ to $\pi$ equals minus the entropy of $\pi$ under $\mu$: $$D(\pi || \mu) = \int_S \pi_\mu \log \pi_\mu \,d\mu = -H_\mu(\pi).$$
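As a quick sanity check of this identity, here is a minimal sketch on a finite set, where measures are plain lists of weights and the Radon-Nikodym derivative is just the ratio of weights. The function names `entropy_under` and `kl_divergence` and the example weights are illustrative assumptions, not part of the original text.

```python
import math

def entropy_under(pi, mu):
    """H_mu(pi) = -sum_x mu(x) f(x) log f(x), with f = pi/mu on a finite set."""
    total = 0.0
    for p, m in zip(pi, mu):
        if p == 0.0:
            continue  # convention: 0 log 0 = 0
        if m == 0.0:
            raise ValueError("pi is not absolutely continuous with respect to mu")
        f = p / m  # discrete Radon-Nikodym derivative
        total -= m * f * math.log(f)
    return total

def kl_divergence(pi, mu):
    """Textbook formula D(pi || mu) = sum_x pi(x) log(pi(x) / mu(x))."""
    return sum(p * math.log(p / m) for p, m in zip(pi, mu) if p > 0.0)

pi = [0.5, 0.25, 0.25]        # hypothetical probability weights
mu = [0.2, 0.3, 0.5]          # hypothetical reference distribution
counting = [1.0, 1.0, 1.0]    # counting measure on the same three points

kl = kl_divergence(pi, mu)
neg_entropy = -entropy_under(pi, mu)
shannon = entropy_under(pi, counting)  # entropy under counting measure
```

Here `kl` and `neg_entropy` agree up to rounding, and taking the counting measure as the reference recovers the ordinary Shannon entropy of `pi` (in nats).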

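The differential entropy $h(\pi)$ is the entropy of $\pi$ under the Lebesgue measure $\lambda$: $$h(\pi) = H_\lambda(\pi) = -\int \pi_\lambda \log \pi_\lambda \,d\lambda,$$ where $\pi_\lambda$ is the probability density function of $\pi$.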
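To make the Lebesgue case concrete, the sketch below approximates the integral with a midpoint rule for a Gaussian density and compares it with the known closed form $\frac{1}{2}\log(2\pi e \sigma^2)$. The helper names, the integration range, and the grid size are assumptions chosen for illustration.

```python
import math

def gaussian_density(x, sigma):
    """Density of N(0, sigma^2) with respect to the Lebesgue measure."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def entropy_under_lebesgue(density, lo, hi, n=20000):
    """Midpoint-rule approximation of -integral of p log p over [lo, hi]."""
    width = (hi - lo) / n
    total = 0.0
    for i in range(n):
        p = density(lo + (i + 0.5) * width)
        if p > 0.0:  # skip points where the density underflows to zero
            total -= p * math.log(p) * width
    return total

sigma = 2.0
numeric = entropy_under_lebesgue(lambda x: gaussian_density(x, sigma), -40.0, 40.0)
closed_form = 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)

# Unlike Shannon entropy, differential entropy can be negative.
narrow = entropy_under_lebesgue(lambda x: gaussian_density(x, 0.1), -5.0, 5.0)
```

The value `numeric` matches `closed_form` closely, while `narrow` is negative because a density concentrated on a small interval exceeds 1 on most of its mass.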

The joint entropy $h(X, Y)$ equals the entropy of the joint distribution $\mu_{XY}$ over $X$ and $Y$ under the Lebesgue measure $\lambda$ on the product space: $$h(X, Y) = H_\lambda(\mu_{XY}).$$

The mutual information $I(X; Y)$ equals minus the entropy of the joint distribution $\mu_{XY}$ under the product measure $\mu_X \times \mu_Y$: $$I(X; Y) = -H_{\mu_X \times \mu_Y}(\mu_{XY}) = D(\mu_{XY} || \mu_X \times \mu_Y).$$

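The conditional entropy $h(Y|X)$ equals the entropy of $\mu_{XY}$ under the product measure $\mu_X \times \lambda$: $$h(Y|X) = H_{\mu_X \times \lambda}(\mu_{XY}).$$ This works because the Radon-Nikodym derivative of $\mu_{XY}$ with respect to $\mu_X \times \lambda$ is the conditional density of $Y$ given $X$.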
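The joint, mutual, and conditional cases can be checked together on a small discrete example. This is only a sketch under a stated assumption: the counting measure stands in for the Lebesgue measure $\lambda$, so the quantities are the discrete $H(X, Y)$, $I(X; Y)$, and $H(Y|X)$. The joint table and all names are hypothetical.

```python
import math

def entropy_under(pi, mu):
    """H_mu(pi) = -sum_k pi(k) log(pi(k) / mu(k)) for dict-valued measures."""
    total = 0.0
    for key, p in pi.items():
        if p == 0.0:
            continue  # convention: 0 log 0 = 0
        if mu[key] == 0.0:
            raise ValueError("pi is not absolutely continuous with respect to mu")
        total -= p * math.log(p / mu[key])
    return total

# Hypothetical joint distribution of two binary variables X and Y.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# Marginals mu_X and mu_Y.
p_x = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
p_y = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}

# Reference measures on the product space.
counting = {k: 1.0 for k in joint}                     # stands in for lambda
product = {(x, y): p_x[x] * p_y[y] for (x, y) in joint}  # mu_X x mu_Y
x_times_counting = {(x, y): p_x[x] for (x, y) in joint}  # mu_X x counting

joint_entropy = entropy_under(joint, counting)
mutual_information = -entropy_under(joint, product)
conditional_entropy = entropy_under(joint, x_times_counting)

# Textbook quantities for comparison.
h_x = -sum(p * math.log(p) for p in p_x.values())
h_y = -sum(p * math.log(p) for p in p_y.values())
h_xy = -sum(p * math.log(p) for p in joint.values())
```

With these definitions, `joint_entropy` matches `h_xy`, `mutual_information` matches `h_x + h_y - h_xy`, and `conditional_entropy` matches `h_xy - h_x`, so the chain rule appears as a statement about one functional evaluated under three reference measures.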

This unification was in fact the motivation behind Kullback and Leibler’s seminal paper, "On Information and Sufficiency".