\subsection{Entropy of Normal Distribution}

The entropy of the normal distribution is calculated as

\begin{eqnarray}
&&H_e(X) = - \int_{-\infty}^\infty \frac{ e^{-(x-\mu_x)^2/(2 \sigma_x^2)} }{(2 \pi \sigma_x^2)^{1/2}} \ln \left[ \frac{ e^{-(x- \mu_x)^2/(2 \sigma_x^2)} }{(2 \pi \sigma_x^2)^{1/2} } \right] dx \nonumber \\
&&= \frac{ \ln (2 \pi \sigma_x^2) }{2} \int_{-\infty}^\infty \left[ \frac{ e^{-(x-\mu_x)^2/(2 \sigma_x^2)} }{ (2 \pi \sigma_x^2 )^{1/2} } \right] dx + \frac{1}{2 \sigma_x^2} \int_{-\infty}^\infty (x - \mu_x)^2 \frac{ e^{-(x-\mu_x)^2/(2 \sigma_x^2)} }{(2 \pi \sigma_x^2)^{1/2}} dx \nonumber \\
&&= \frac{ \ln (2 \pi \sigma_x^2) }{2} + \frac{1}{2} = \frac{\ln(2 \pi e \sigma_x^2)}{2}
\label{entropy_normal_distribution} \\
&&H_2(X)=\log_2 (e) \times H_e(X) = \frac{\log_2 ( 2 \pi e \sigma_x^2)}{2} \nonumber
\end{eqnarray}
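For example, a unit-variance Gaussian ($\sigma_x^2 = 1$) has

\begin{eqnarray}
H_e(X) = \frac{\ln(2 \pi e)}{2} \approx 1.42 \ \hbox{nats}, \hspace{0.2 in} H_2(X) = \frac{\log_2(2 \pi e)}{2} \approx 2.05 \ \hbox{bits} \nonumber
\end{eqnarray}

and since $H_2(X) = \frac{1}{2}\log_2(2 \pi e) + \log_2 \sigma_x$, quadrupling the variance adds exactly one bit of differential entropy.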

\subsection{Entropy of Multivariate normal distribution}

Let $\bar{X}= [X_1, X_2, \cdots, X_n]^t$ have a multivariate normal distribution with mean vector $\bar{\mu}$ and covariance matrix $\bar{\bar{K}}$.

The joint pdf of a multivariate normal distribution is

\begin{eqnarray}

f(\bar{x})=f(x_1,x_2,\cdots, x_n) = \frac{1}{(2 \pi)^{n/2} | \bar{\bar{K}} |^{1/2} } e^{-\frac{1}{2}(\bar{x}-\bar{\mu})^t \bar{\bar{K}}^{-1} (\bar{x} - \bar{\mu})}

\nonumber

\end{eqnarray}

The entropy of this multivariate normal distribution is calculated as

\begin{eqnarray}

&&h(\bar{X}) = - \int f(\bar{x}) \log_2 f(\bar{x}) d \bar{x} \nonumber \\
&&\hspace{-0.25 in}= \int f(\bar{x}) \left[ \frac{1}{2} \log_2 (2 \pi)^n | \bar{\bar{K}}| + \frac{1}{2} (\bar{x} - \bar{\mu})^t \bar{\bar{K}}^{-1} (\bar{x}-\bar{\mu}) \log_2 e \right] d \bar{x} \nonumber \\
&&= \frac{1}{2} \log_2 (2 \pi)^n | \bar{\bar{K}}| \int f(\bar{x}) d \bar{x} \nonumber \\
&&+ \frac{\log_2 e}{2} \int f(\bar{x}) \, \hbox{trace} \left[ (\bar{x}-\bar{\mu})^t \bar{\bar{K}}^{-1}(\bar{x}-\bar{\mu}) \right] d \bar{x} \nonumber \\
&&\hspace{-0.25 in} = \frac{1}{2} \log_2 (2 \pi)^n |\bar{\bar{K}}| + \frac{\log_2 e}{2} \int f(\bar{x}) \, \hbox{trace} \left[ \bar{\bar{K}}^{-1} (\bar{x}-\bar{\mu}) (\bar{x}-\bar{\mu})^t \right] d \bar{x} \nonumber \\
&&\hspace{-0.25 in} =\frac{1}{2} \log_2 (2 \pi)^n |\bar{\bar{K}}| + \frac{\log_2 e}{2} \hbox{trace} \left[ \bar{\bar{K}}^{-1} \int f(\bar{x}) (\bar{x}-\bar{\mu}) (\bar{x}-\bar{\mu})^t d \bar{x} \right ] \nonumber \\
&&= \frac{1}{2} \log_2 (2 \pi)^n |\bar{\bar{K}}| + \frac{\log_2 e}{2} \hbox{trace} \left[ \bar{\bar{K}}^{-1} \bar{\bar{K}} \right] \nonumber \\
&&= \frac{1}{2} \log_2 (2 \pi)^n |\bar{\bar{K}}| + \frac{\log_2 e}{2} n = \frac{1}{2} \log_2 \left[ (2 \pi e)^n | \bar{\bar{K}} | \right]

\label{entropy_multivariate_normal_distribution}

\end{eqnarray}
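For $n=1$, where $\bar{\bar{K}}$ reduces to the scalar variance $\sigma_x^2$, (\ref{entropy_multivariate_normal_distribution}) recovers the scalar result (\ref{entropy_normal_distribution}). More generally, when $\bar{\bar{K}}$ is diagonal with entries $\sigma_{x_i}^2$, $|\bar{\bar{K}}|$ is the product of the individual variances, so

\begin{eqnarray}
h(\bar{X}) = \frac{1}{2} \log_2 \left[ (2 \pi e)^n \prod_{i=1}^n \sigma_{x_i}^2 \right] = \sum_{i=1}^n \frac{1}{2} \log_2 ( 2 \pi e \sigma_{x_i}^2 ) \nonumber
\end{eqnarray}

that is, the entropies of independent components add.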

If $X_1$ and $X_2$ are two independent normally distributed random variables, each with zero mean and variance $\sigma_n^2/2$, the covariance matrix is

\begin{eqnarray}

\bar{\bar{K}} = \left[ \matrix{\hbox{E}(X_1 X_1) & \hbox{E}(X_1 X_2) \cr \hbox{E}(X_2 X_1) & \hbox{E}(X_2 X_2)} \right]=

\left[ \matrix{\sigma_n^2/2 & 0 \cr 0 & \sigma_n^2/2} \right]\nonumber

\end{eqnarray}

and (\ref{entropy_multivariate_normal_distribution}) becomes

\begin{eqnarray}

h(\bar{X}) = \frac{1}{2} \log_2 \left[ (2 \pi e)^2 |\bar{\bar{K}}| \right] =

\log_2 \left[ (2 \pi e) \sigma_n^2/2 \right] = \log_2 \left[ \pi e \sigma_n^2\right]

\label{entropy_two_independent_normal_distribution}

\end{eqnarray}

\subsection{Entropy of Complex Gaussian Random Variable \cite{Telatar_Emre_1999}}

The probability density of a circularly symmetric complex Gaussian random vector $\bar{x}$ with mean $\bar{\mu}$ and covariance $\bar{\bar{Q}}$ is given as

\begin{eqnarray}

&&f_{\bar{\mu}, \bar{\bar{Q}}}(\bar{x}) = \hbox{det}(\pi \hat{Q})^{-1/2} \exp \left( -(\hat{x}-\hat{\mu})^\dagger \hat{Q}^{-1} (\hat{x} - \hat{\mu}) \right) \nonumber \\
&&=\hbox{det}(\pi \bar{\bar{Q}})^{-1} \exp \left( -(\bar{x}-\bar{\mu})^\dagger \bar{\bar{Q}}^{-1} (\bar{x}-\bar{\mu}) \right)

\label{probability_density_complex_gaussian_variable_TE_1999}

\end{eqnarray}

where for any $\bar{z} \in C^n$ and $\bar{\bar{A}} \in C^{n \times m}$, $\hat{z}$ and $\hat{A}$ are defined as

\begin{eqnarray}

\hat{z} = \left[ \matrix{\hbox{Re}(\bar{z}) \cr \hbox{Im}(\bar{z})} \right], \hspace{0.1 in}

\hat{A} = \left[ \matrix{ \hbox{Re}(\bar{\bar{A}}) & - \hbox{Im}(\bar{\bar{A}}) \cr \hbox{Im}(\bar{\bar{A}}) & \hbox{Re}(\bar{\bar{A}}) } \right]

\nonumber

\end{eqnarray}

and (\ref{probability_density_complex_gaussian_variable_TE_1999}) follows from (\ref{4d})-(\ref{4h})

\begin{eqnarray}
&&\bar{\bar{C}} = \bar{\bar{A}}^{-1} \iff \hat{C} = \hat{A}^{-1}
\label{4d} \\
&&\hbox{det}(\hat{A}) = | \hbox{det}(\bar{\bar{A}})|^2 = \hbox{det}(\bar{\bar{A}} \bar{\bar{A}}^\dagger)
\label{4e} \\
&&\bar{z} = \bar{x} + \bar{y} \iff \hat{z} = \hat{x} + \hat{y}
\label{4f} \\
&&\bar{y} = \bar{\bar{A}} \bar{x} \iff \hat{y} = \hat{A} \hat{x}
\label{4g} \\
&&\hbox{Re}( \bar{x}^\dagger \bar{y} ) = \hat{x}^\dagger \hat{y}
\label{4h}
\end{eqnarray}
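For example, in the scalar ($n=m=1$) case $\bar{\bar{A}} = A = a + jb$, property (\ref{4e}) reads

\begin{eqnarray}
\hat{A} = \left[ \matrix{ a & -b \cr b & a } \right], \hspace{0.2 in} \hbox{det}(\hat{A}) = a^2 + b^2 = |A|^2 \nonumber
\end{eqnarray}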

The differential entropy of a complex Gaussian $\bar{x}$ with covariance $\bar{\bar{Q}}$ (taking $\bar{\mu}=0$, since the mean does not affect the differential entropy) is given by

\begin{eqnarray}

&&h( \bar{X} ) = \hbox{E}_{ f_{\bar{\mu}, \bar{\bar{Q}} }} [- \log f_{\bar{\mu}, \bar{\bar{Q}} }] \nonumber \\
&&= \log \hbox{det} (\pi \bar{\bar{Q}}) + (\log e) \hbox{E} [\bar{x}^\dagger \bar{\bar{Q}}^{-1} \bar{x}] \nonumber \\
&&= \log \hbox{det} (\pi \bar{\bar{Q}}) + (\log e) \hbox{tr} \left( \hbox{E}[\bar{x} \bar{x}^\dagger] \bar{\bar{Q}}^{-1} \right) \nonumber \\
&&= \log \hbox{det} ( \pi \bar{\bar{Q}}) + (\log e) \hbox{tr}(\bar{\bar{I}}) \nonumber \\
&&= \log \hbox{det} (\pi e \bar{\bar{Q}})

\label{Entropy_complex_Gaussian_Random_variable}

\end{eqnarray}

In the SISO case, $\bar{\bar{Q}} = Q = \sigma_n^2$, and (\ref{Entropy_complex_Gaussian_Random_variable}) becomes

\begin{eqnarray}

h(X)=\log (\pi e \sigma_n^2)

\label{entropy_of_One_complex_Gaussian_variable}

\end{eqnarray}
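Note that (\ref{entropy_of_One_complex_Gaussian_variable}) coincides with (\ref{entropy_two_independent_normal_distribution}): a circularly symmetric complex Gaussian variable with variance $\sigma_n^2$ is equivalent to the pair of independent real Gaussian variables $(X_1, X_2)$ formed by its real and imaginary parts, each with variance $\sigma_n^2/2$, and both descriptions give the same differential entropy

\begin{eqnarray}
h(X) = h(X_1, X_2) = \log_2 (\pi e \sigma_n^2) \nonumber
\end{eqnarray}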

\subsection{Mutual Information}

The mutual information (MI) of two random variables is a measure of the mutual dependence between the two variables.

MI quantifies the ``amount of information" (in shannons, or bits) obtained about one random variable by observing the other random variable.

The concept of mutual information is intricately linked to that of the entropy of a random variable, which quantifies the ``amount of information" held in a random variable.

Unlike the correlation coefficient, which is limited to real-valued random variables, MI is more general: it determines how similar the joint distribution $p(X,Y)$ is to the product of the marginal distributions $p(X) p(Y)$.

MI is the expected value of the pointwise mutual information (PMI).

The mutual information between two discrete random variables $X$, $Y$, jointly distributed according to $p(x,y)$ is given by

\begin{eqnarray}

&&I(X;Y) = \sum_{x_i,y_j} p(x_i,y_j) \log \frac{p(x_i,y_j)}{p(x_i) p(y_j)}
\nonumber \\
&&= H(X) - H(X|Y) \nonumber \\
&&= H(Y) - H(Y|X) \nonumber \\
&&= H(X) + H(Y) - H(X,Y) \nonumber

\end{eqnarray}
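As a small worked example, take $X$ and $Y$ binary with $p(0,0)=p(1,1)=\frac{1}{2}$ and $p(0,1)=p(1,0)=0$, so that $p(x_i)=p(y_j)=\frac{1}{2}$; then

\begin{eqnarray}
I(X;Y) = \frac{1}{2} \log_2 \frac{1/2}{(1/2)(1/2)} + \frac{1}{2} \log_2 \frac{1/2}{(1/2)(1/2)} = 1 \ \hbox{bit} = H(X) = H(Y) \nonumber
\end{eqnarray}

which is the deterministic-dependence extreme discussed below.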

The mutual information between two continuous random variables $X$, $Y$, with joint pdf $f(x,y)$ is

\begin{eqnarray}

I(X;Y) = \int \int f(x,y) \log \frac{f(x,y)}{f(x) f(y)} dx dy

\nonumber

\end{eqnarray}

For two variables, it is possible to represent the different entropic quantities with an analogy to set theory.

\begin{figure}[h]

\vskip 4.5 cm

\hskip 0 cm

\special{wmf:graphical_representation_conditional_entropy_MI.jpg x=8 cm y=4.5 cm}

\caption{Graphical representation of the conditional entropy and the mutual information.}

\label{graphical_representation_conditional_entropy_MI}

\end{figure}

Fig.~\ref{graphical_representation_conditional_entropy_MI} shows the different quantities, and how the mutual information is the uncertainty that is common to both $X$ and $Y$.

Intuitively, mutual information measures the information that $X$ and $Y$ share.

Mutual information measures how much knowing one of these variables reduces uncertainty about the other.

For example, if $X$ and $Y$ are independent, then knowing $X$ does not give any information about $Y$ and vice versa, so their mutual information is zero.

At the other extreme, if $X$ is a deterministic function of $Y$, and $Y$ is a deterministic function of $X$, then all information conveyed by $X$ is shared with $Y$: knowing $X$ determines the value of $Y$ and vice versa.

As a result, the mutual information is the same as the uncertainty contained in $Y$ or $X$ alone, i.e., the entropy of $Y$ or $X$.

Moreover, this mutual information is the same as the entropy of $X$ and as the entropy of $Y$.

Mutual information is a measure of the inherent dependence expressed in the joint distribution of $X$ and $Y$ relative to the joint distribution of $X$ and $Y$ under the assumption of independence.

Mutual information measures dependence in the following sense:

$I(X;Y)=0$ if and only if $X$ and $Y$ are independent random variables.

This is easy to see in one direction: if $X$ and $Y$ are independent, then $p(x,y)=p(x)p(y)$, so every term in the sum vanishes because

\begin{eqnarray}

\log \left( \frac{p(x,y)}{p(x) p(y)} \right) = \log 1 = 0

\nonumber

\end{eqnarray}

\subsection{Conditional Mutual Information}

Let $X$, $Y$, $Z$ be jointly distributed according to some p.m.f. $p(x_i,y_j,z_k)$. The conditional mutual information between $X$ and $Y$ given $Z$ is

\begin{eqnarray}

&&I(X;Y|Z) = \hbox{E}_Z \left[ I (X;Y) |Z \right] \nonumber \\
&&= \sum_{z_k} p_Z(z_k) \sum_{y_j} \sum_{x_i} p_{X,Y|Z}(x_i,y_j|z_k) \log \frac{ p_{X,Y|Z} (x_i,y_j|z_k)}{ p_{X|Z}(x_i|z_k) p_{Y|Z}(y_j|z_k)} \nonumber \\
&&=\sum_{z_k} p_Z(z_k) \sum_{y_j} \sum_{x_i} \frac{p(x_i,y_j,z_k)}{p(z_k)} \log \frac{ \frac{ p(x_i,y_j,z_k)}{p(z_k) } }{ \frac{p(x_i,z_k)}{p(z_k)} \frac{p(y_j,z_k)}{p(z_k)} } \nonumber \\
&&= \sum_{z_k} \sum_{y_j} \sum_{x_i} p_{X,Y,Z}(x_i,y_j,z_k) \log \frac{p_Z(z_k) p_{X,Y,Z}(x_i,y_j,z_k) }{p_{X,Z}(x_i,z_k) p_{Y,Z}(y_j,z_k)}
\nonumber \\
&&=H(X|Z) - H(X|Y,Z) \nonumber \\
&&= H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z) \nonumber \\
&&= H(X|Z) + H(Y|Z)- H(X,Y|Z) \nonumber \\
&&= I(X;Y,Z) - I(X;Z)

\label{conditional_MI}

\end{eqnarray}

The conditional mutual information is a measure of how much uncertainty is shared by $X$ and $Y$, but not by $Z$.
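For a concrete illustration, suppose $X$ and $Y$ are independent fair bits and $Z = X \oplus Y$. Then $I(X;Y)=0$, but once $Z$ is known, $Y$ is completely determined by $X$, so

\begin{eqnarray}
I(X;Y|Z) = H(X|Z) - H(X|Y,Z) = 1 - 0 = 1 \ \hbox{bit} \nonumber
\end{eqnarray}

showing that conditioning on $Z$ can increase the mutual information between $X$ and $Y$.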

\subsection{Mutual information chain rule}

The mutual information chain rule is derived from (\ref{conditional_MI}) as

\begin{eqnarray}

I(X;Y,Z) = I(X;Z) + I(X;Y|Z)

\label{MI_chain_rule}

\end{eqnarray}
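The chain rule can also be checked directly from the entropy forms above:

\begin{eqnarray}
&&I(X;Z) + I(X;Y|Z) = \left[ H(X) - H(X|Z) \right] + \left[ H(X|Z) - H(X|Y,Z) \right] \nonumber \\
&&= H(X) - H(X|Y,Z) = I(X;Y,Z) \nonumber
\end{eqnarray}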
