Information Theory

6. Differential Entropy

2020-12-24

Entropy

  • (continuous) $X$ with cumulative distribution function $F(x)=\Pr(X\leq x)$ and density $f(x)=F'(x)$
  • support set of $X$: $S=\{x:f(x)>0\}$
  • differential entropy $h(X)$: $h(X)=-\int_S f(x)\log f(x)\,dx$
    • $h(X+c)=h(X)$
    • $h(aX)=h(X)+\log|a|$
    • $h(AX)=h(X)+\log|\det A|$
    • $h(X)$ may be negative, since $f(x)$ may exceed $1$
  • uniform on $[0,a]$: $h(X)=\log a$
  • Gaussian: $h(X)=\frac{1}{2}\log 2\pi e\sigma^2$
  • a continuous $X$ carries infinite information, so $h(X)$ does not serve as a measure of the average amount of information (unlike the discrete $H(X)$)
  • $h(X_1,X_2,\cdots,X_n)=-\int f(x^n)\log f(x^n)\,dx^n$
  • $h(X|Y)=-\int f(x,y)\log f(x|y)\,dx\,dy$
  • relative entropy: $D(f\|g)=\int f\log\frac{f}{g}\geq 0$
  • mutual information: $I(X;Y)=\int f(x,y)\log\frac{f(x,y)}{f(x)f(y)}\,dx\,dy\geq 0$
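
The closed forms and the scaling law above can be checked directly. A minimal sketch in Python (entropies in nats, so $\log$ is the natural logarithm here):

```python
import math

# Closed-form differential entropies (in nats):
# Uniform[0, a]: h = log a;  N(mu, sigma^2): h = 0.5 * log(2*pi*e*sigma^2).
def h_uniform(a):
    return math.log(a)

def h_gauss(sigma2):
    return 0.5 * math.log(2 * math.pi * math.e * sigma2)

# Scaling law h(aX) = h(X) + log|a|:
# scaling Uniform[0,1] by a gives Uniform[0,a].
a = 3.0
assert abs(h_uniform(a) - (h_uniform(1.0) + math.log(a))) < 1e-12

# h(X) can be negative: Uniform[0, 1/2] has f(x) = 2 > 1 on its support.
print(h_uniform(0.5))  # negative: -log 2
```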

Relation to discrete

  • $X^\Delta=x_i$ if $i\Delta\leq X<(i+1)\Delta$
  • $p_i=\Pr(X^\Delta=x_i)=f(x_i)\Delta$
  • $H(X^\Delta)=-\sum\Delta f(x_i)\log f(x_i)-\log\Delta$
  • as $\Delta\rightarrow 0$, $H(X^\Delta)+\log\Delta\rightarrow h(f)=h(X)$
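
The limit can be seen numerically: quantize a standard Gaussian into bins of width $\Delta$ and watch $H(X^\Delta)+\log\Delta$ approach $h(X)=\frac{1}{2}\log 2\pi e$. A sketch (midpoint rule over a truncated range, an approximation that is accurate for small $\Delta$):

```python
import math

# Quantize N(0,1) into bins of width delta; p_i ≈ f(midpoint) * delta.
def gauss_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def quantized_entropy(delta, span=10.0):
    H = 0.0
    i = -int(span / delta)
    while i * delta < span:
        p = gauss_pdf(i * delta + delta / 2) * delta
        if p > 0:
            H -= p * math.log(p)
        i += 1
    return H

h_true = 0.5 * math.log(2 * math.pi * math.e)  # h(N(0,1)) in nats
for delta in (0.5, 0.1, 0.01):
    print(delta, quantized_entropy(delta) + math.log(delta), h_true)
```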

AEP

  • $-\frac{1}{n}\log f(X_1,X_2,\cdots,X_n)\rightarrow E(-\log f(X))=h(f)$ in probability, by the weak law of large numbers
  • $A_\epsilon^{(n)}=\{(x_1,x_2,\cdots,x_n)\in S^n:|-\frac{1}{n}\log f(x_1,\cdots,x_n)-h(X)|\leq\epsilon\}$
  • $\text{Vol}(A)=\int_A dx_1dx_2\cdots dx_n$
  • Properties
    • $\Pr(A_\epsilon^{(n)})>1-\epsilon$ for $n$ sufficiently large
    • $\text{Vol}(A_\epsilon^{(n)})\leq 2^{n(h(X)+\epsilon)}$
    • $\text{Vol}(A_\epsilon^{(n)})\geq(1-\epsilon)2^{n(h(X)-\epsilon)}$ for $n$ sufficiently large
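
The concentration behind the AEP is easy to observe by simulation: for iid $N(0,1)$ samples, $-\frac{1}{n}\log f(X_1,\cdots,X_n)$ clusters tightly around $h(X)\approx 1.419$ nats. A sketch:

```python
import math, random

# For X_i iid N(0,1), -(1/n) log f(X_1..X_n) should concentrate
# around h(X) = 0.5 * log(2*pi*e) as n grows.
random.seed(0)
h = 0.5 * math.log(2 * math.pi * math.e)

def neg_log_density_rate(n):
    s = 0.0
    for _ in range(n):
        x = random.gauss(0, 1)
        # -log f(x) for the standard Gaussian density
        s += x * x / 2 + 0.5 * math.log(2 * math.pi)
    return s / n

rate = neg_log_density_rate(50000)
print(rate, h)
assert abs(rate - h) < 0.05  # typical-set concentration
```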

Covariance Matrix

  • $\text{cov}(X,Y)=E(X-EX)(Y-EY)=E(XY)-(EX)(EY)$
  • random vector $\vec X$: $K_X=E(X-EX)(X-EX)^T=[\text{cov}(X_i,X_j)]$
  • correlation matrix: $\widetilde K_X=EXX^T=[EX_iX_j]$
    • both are symmetric and positive semidefinite
  • $K_X=\widetilde K_X-(EX)(EX)^T$
  • $Y=AX$
    • $K_Y=AK_XA^T$
    • $\widetilde K_Y=A\widetilde K_XA^T$
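
The transformation rule $K_Y=AK_XA^T$ can be verified against a sample covariance. A sketch with an arbitrary $A$ and $K_X$ chosen for illustration:

```python
import numpy as np

# Check K_Y = A K_X A^T for Y = A X, with K_Y estimated from samples.
rng = np.random.default_rng(0)
A = np.array([[1.0, 2.0], [0.0, 1.0]])
K_X = np.array([[2.0, 0.5], [0.5, 1.0]])

X = rng.multivariate_normal(np.zeros(2), K_X, size=500000)
Y = X @ A.T                       # each row is y = A x
K_Y_hat = np.cov(Y, rowvar=False)  # sample covariance of Y

print(K_Y_hat)
print(A @ K_X @ A.T)
assert np.allclose(K_Y_hat, A @ K_X @ A.T, atol=0.1)
```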

Multivariate Normal Distribution

$$f(x)=\frac{1}{(2\pi)^{\frac{n}{2}}|K|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(x-\mu)^TK^{-1}(x-\mu)\right)$$

  • uncorrelated implies independent (for jointly Gaussian variables)
  • $h(X_1,X_2,\cdots,X_n)=h(\mathcal{N}(\mu,K))=\frac{1}{2}\log(2\pi e)^n|K|$
  • the mutual information between $X$ and $Y$ is $I(X;Y)=\sup_{P,Q}I([X]_P;[Y]_Q)$ over all finite partitions $P$ and $Q$
  • correlated Gaussian $(X,Y)\sim\mathcal{N}(0,K)$, $K=\begin{bmatrix}\sigma^2 & \rho\sigma^2\\ \rho\sigma^2 & \sigma^2\end{bmatrix}$: $I(X;Y)=-\frac{1}{2}\log(1-\rho^2)$
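
The Gaussian entropy formula ties these bullets together: $h(X,Y)\leq h(X)+h(Y)$ with equality iff $K$ is diagonal, and the gap is exactly $I(X;Y)=-\frac{1}{2}\log(1-\rho^2)$. A sketch with one example covariance:

```python
import numpy as np

# h(X_1..X_n) = 0.5 * log((2*pi*e)^n * |K|) for a multivariate Gaussian (nats).
def h_gauss(K):
    K = np.atleast_2d(np.asarray(K, dtype=float))
    n = K.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(K))

K = np.array([[1.0, 0.5], [0.5, 2.0]])

# subadditivity: h(X,Y) <= h(X) + h(Y), equality iff K is diagonal
h_joint = h_gauss(K)
h_marg = h_gauss(K[0, 0]) + h_gauss(K[1, 1])
assert h_joint < h_marg
assert abs(h_gauss(np.diag(np.diag(K))) - h_marg) < 1e-12

# the gap equals the Gaussian mutual information -0.5*log(1 - rho^2)
rho = K[0, 1] / np.sqrt(K[0, 0] * K[1, 1])
I = h_marg - h_joint
assert abs(I - (-0.5 * np.log(1 - rho ** 2))) < 1e-12
print(h_joint, h_marg, I)
```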

Maximum Entropy

  • $X\in\mathbb{R}$ with mean $\mu$ and variance $\sigma^2$, then $h(X)\leq\frac{1}{2}\log 2\pi e\sigma^2$ with equality iff $X\sim\mathcal{N}(\mu,\sigma^2)$
  • $X\in\mathbb{R}$ with $EX^2\leq\sigma^2$, then $h(X)\leq\frac{1}{2}\log 2\pi e\sigma^2$
  • Problem: find the density $f$ over $S$ maximizing $h(f)$ subject to moment constraints $\alpha_1,\cdots,\alpha_m$:
    • $f(x)\geq 0$
    • $\int_S f(x)\,dx=1$
    • $\int_S f(x)r_i(x)\,dx=\alpha_i$
  • maximum entropy distribution: $f^*(x)=f_\lambda(x)=e^{\lambda_0+\sum_{i=1}^m\lambda_ir_i(x)}$
    • $S=[a,b]$ with no other constraints: uniform distribution over $[a,b]$
    • $S=[0,\infty)$, $EX=\mu$: $f^*(x)=\frac{1}{\mu}e^{-\frac{x}{\mu}}$ (exponential)
    • $S=(-\infty,\infty)$, $EX=\alpha_1$, $EX^2=\alpha_2$: $\mathcal{N}(\alpha_1,\alpha_2-\alpha_1^2)$
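
A quick consistency check of the second example: among densities on $[0,\infty)$ with mean $\mu$, the exponential has $h=1+\log\mu$, which must beat any competitor with the same mean, e.g. $\text{Uniform}[0,2\mu]$ with $h=\log 2\mu$. A sketch:

```python
import math

# Exponential(mu) on [0, inf): h = 1 + log(mu)   (maximum entropy given EX = mu)
# Uniform[0, 2*mu]:            h = log(2*mu)     (same mean, lower entropy)
mu = 1.5
h_exp = 1 + math.log(mu)
h_unif = math.log(2 * mu)
print(h_exp, h_unif)
assert h_exp >= h_unif  # reduces to 1 >= log 2, true in nats
```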

Inequalities

Hadamard's Inequality

  • $K$ is a nonnegative definite symmetric $n\times n$ matrix
  • (Hadamard) $|K|\leq\prod_i K_{ii}$ with equality iff $K_{ij}=0$ for $i\neq j$

Balanced Information Inequality

  • $h(X,Y)\leq h(X)+h(Y)$ always holds
  • but neither $h(X,Y)\geq h(X)$ nor $h(X,Y)\leq h(X)$ holds in general (unlike the discrete case, where $H(X,Y)\geq H(X)$)
  • notation: $[n]=\{1,2,\cdots,n\}$; for $\alpha\subset[n]$, $X_\alpha=(X_i:i\in\alpha)$
  • a linear continuous inequality $\sum_\alpha w_\alpha h(X_\alpha)\geq 0$ is valid iff its discrete counterpart $\sum_\alpha w_\alpha H(X_\alpha)\geq 0$ is valid and balanced

Han's Inequality

  • $h_k^{(n)}=\frac{1}{\binom{n}{k}}\sum_{S:|S|=k}\frac{h(X(S))}{k}$
  • $g_k^{(n)}=\frac{1}{\binom{n}{k}}\sum_{S:|S|=k}\frac{h(X(S)|X(S^c))}{k}$
  • Han's inequality: $h_1^{(n)}\geq h_2^{(n)}\geq\cdots\geq h_n^{(n)}=h(X_1,\cdots,X_n)/n=g_n^{(n)}\geq\cdots\geq g_2^{(n)}\geq g_1^{(n)}$
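
For a Gaussian vector every subset entropy has a closed form, $h(X_S)=\frac{1}{2}\log\left((2\pi e)^{|S|}|K_S|\right)$, so the decreasing chain $h_1^{(n)}\geq h_2^{(n)}\geq\cdots\geq h_n^{(n)}$ can be checked directly. A sketch with an arbitrary positive definite $K$:

```python
import math
from itertools import combinations
import numpy as np

# Han's inequality h_1 >= h_2 >= ... >= h_n for a 3-dim Gaussian,
# using h(X_S) = 0.5 * log((2*pi*e)^|S| * det(K_S)).
K = np.array([[2.0, 0.3, 0.5],
              [0.3, 1.0, 0.2],
              [0.5, 0.2, 1.5]])
n = K.shape[0]

def h_subset(S):
    sub = K[np.ix_(S, S)]  # principal submatrix K_S
    return 0.5 * math.log((2 * math.pi * math.e) ** len(S) * np.linalg.det(sub))

def h_k(k):
    subsets = list(combinations(range(n), k))
    return sum(h_subset(list(S)) / k for S in subsets) / len(subsets)

vals = [h_k(k) for k in range(1, n + 1)]
print(vals)
assert all(vals[i] >= vals[i + 1] - 1e-12 for i in range(n - 1))
```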

Information of Heat

  • heat equation (Fourier's heat conduction equation): $x$ is position, $t$ is time, $\frac{\partial}{\partial t}f(x,t)=\frac{1}{2}\frac{\partial^2}{\partial x^2}f(x,t)$
  • $Y_t=X+\sqrt{t}Z$, $Z\sim\mathcal{N}(0,1)$, then $f(y;t)=\int f(x)\frac{1}{\sqrt{2\pi t}}e^{-\frac{(y-x)^2}{2t}}dx$
  • the output density of the Gaussian channel evolves according to the heat equation
  • Fisher information: $I(X)=\int_{-\infty}^{+\infty}f(x)\left[\frac{\frac{\partial}{\partial x}f(x)}{f(x)}\right]^2dx$
  • De Bruijn's identity: $\frac{\partial}{\partial t}h(Y_t)=\frac{1}{2}I(Y_t)$

Entropy power inequality

  • EPI (entropy power inequality): for independent $X,Y$, $e^{\frac{2}{n}h(X+Y)}\geq e^{\frac{2}{n}h(X)}+e^{\frac{2}{n}h(Y)}$; "the most powerful tool", related to:
    • the uncertainty principle
    • Young's inequality
    • Nash's inequality
    • the Cramér-Rao bound: $V(\hat\theta)\geq\frac{1}{I(\theta)}$
  • FII (Fisher information inequality): $\frac{1}{I(X+Y)}\geq\frac{1}{I(X)}+\frac{1}{I(Y)}$
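
The equality case of the EPI is instructive: for independent one-dimensional Gaussians, $e^{2h}=2\pi e\sigma^2$ is the entropy power up to a constant, and the inequality holds with equality because variances add. A sketch:

```python
import math

# EPI equality case: X ~ N(0, a), Y ~ N(0, b) independent,
# so X + Y ~ N(0, a + b) and e^{2h(X+Y)} = e^{2h(X)} + e^{2h(Y)} exactly
# (each side equals 2*pi*e times the variance).
def h_gauss(var):
    return 0.5 * math.log(2 * math.pi * math.e * var)

a, b = 1.0, 2.5
lhs = math.exp(2 * h_gauss(a + b))
rhs = math.exp(2 * h_gauss(a)) + math.exp(2 * h_gauss(b))
print(lhs, rhs)
assert abs(lhs - rhs) < 1e-9
```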