Information Theory

6. Differential Entropy

2020-12-24

Entropy

  • (continuous) $X$ with cumulative distribution function $F(x)=\Pr(X\leq x)$ and density $f(x)=F'(x)$
  • support set of $X$: $S=\{x:f(x)>0\}$
  • differential entropy $h(X)$: $h(X)=-\int_S f(x)\log f(x)\,dx$
    • $h(X+c)=h(X)$
    • $h(aX)=h(X)+\log|a|$
    • $h(AX)=h(X)+\log|\det A|$
    • $h(X)$ may be negative, since $f(x)$ may exceed $1$
  • uniform on $[0,a]$: $h(X)=\log a$
  • Gaussian: $h(X)=\frac{1}{2}\log 2\pi e\sigma^2$
  • a continuous $X$ carries infinite information, so $h(X)$ does not serve as a measure of the average amount of information (unlike the discrete $H(X)$)
  • $h(X_1,X_2,\cdots,X_n)=-\int f(x^n)\log f(x^n)\,dx^n$
  • $h(X|Y)=-\int f(x,y)\log f(x|y)\,dx\,dy$
  • relative entropy: $D(f\|g)=\int f\log\frac{f}{g}\geq 0$
  • mutual information: $I(X;Y)=\int f(x,y)\log\frac{f(x,y)}{f(x)f(y)}\,dx\,dy\geq 0$
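
The closed forms and the scaling law above can be checked directly. A minimal sketch in Python (entropies in nats, so $\log$ is the natural logarithm here):

```python
import math

# Closed-form differential entropies (in nats):
# Uniform[0, a]: h = log a;  N(mu, sigma^2): h = 0.5 * log(2*pi*e*sigma^2).
def h_uniform(a):
    return math.log(a)

def h_gauss(sigma2):
    return 0.5 * math.log(2 * math.pi * math.e * sigma2)

# Scaling law h(aX) = h(X) + log|a|:
# scaling Uniform[0,1] by a gives Uniform[0,a].
a = 3.0
assert abs(h_uniform(a) - (h_uniform(1.0) + math.log(a))) < 1e-12

# h(X) can be negative: Uniform[0, 1/2] has f(x) = 2 > 1 on its support.
print(h_uniform(0.5))  # negative: -log 2
```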

Relation to discrete

  • $X^\Delta=x_i$ if $i\Delta\leq X<(i+1)\Delta$
  • $p_i=\Pr(X^\Delta=x_i)=f(x_i)\Delta$
  • $H(X^\Delta)=-\sum\Delta f(x_i)\log f(x_i)-\log\Delta$
  • as $\Delta\rightarrow 0$, $H(X^\Delta)+\log\Delta\rightarrow h(f)=h(X)$
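
The limit can be seen numerically: quantize a standard Gaussian into bins of width $\Delta$ and watch $H(X^\Delta)+\log\Delta$ approach $h(X)=\frac{1}{2}\log 2\pi e$. A sketch (midpoint rule over a truncated range, an approximation that is accurate for small $\Delta$):

```python
import math

# Quantize N(0,1) into bins of width delta; p_i ≈ f(midpoint) * delta.
def gauss_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def quantized_entropy(delta, span=10.0):
    H = 0.0
    i = -int(span / delta)
    while i * delta < span:
        p = gauss_pdf(i * delta + delta / 2) * delta
        if p > 0:
            H -= p * math.log(p)
        i += 1
    return H

h_true = 0.5 * math.log(2 * math.pi * math.e)  # h(N(0,1)) in nats
for delta in (0.5, 0.1, 0.01):
    print(delta, quantized_entropy(delta) + math.log(delta), h_true)
```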

AEP

  • $-\frac{1}{n}\log f(X_1,X_2,\cdots,X_n)\rightarrow E(-\log f(X))=h(f)$ in probability, by the weak law of large numbers
  • $A_\epsilon^{(n)}=\{(x_1,x_2,\cdots,x_n)\in S^n:|-\frac{1}{n}\log f(x_1,\cdots,x_n)-h(X)|\leq\epsilon\}$
  • $\text{Vol}(A)=\int_A dx_1dx_2\cdots dx_n$
  • Properties
    • $\Pr(A_\epsilon^{(n)})>1-\epsilon$ for $n$ sufficiently large
    • $\text{Vol}(A_\epsilon^{(n)})\leq 2^{n(h(X)+\epsilon)}$
    • $\text{Vol}(A_\epsilon^{(n)})\geq(1-\epsilon)2^{n(h(X)-\epsilon)}$ for $n$ sufficiently large
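
The concentration behind the AEP is easy to observe by simulation: for iid $N(0,1)$ samples, $-\frac{1}{n}\log f(X_1,\cdots,X_n)$ clusters tightly around $h(X)\approx 1.419$ nats. A sketch:

```python
import math, random

# For X_i iid N(0,1), -(1/n) log f(X_1..X_n) should concentrate
# around h(X) = 0.5 * log(2*pi*e) as n grows.
random.seed(0)
h = 0.5 * math.log(2 * math.pi * math.e)

def neg_log_density_rate(n):
    s = 0.0
    for _ in range(n):
        x = random.gauss(0, 1)
        # -log f(x) for the standard Gaussian density
        s += x * x / 2 + 0.5 * math.log(2 * math.pi)
    return s / n

rate = neg_log_density_rate(50000)
print(rate, h)
assert abs(rate - h) < 0.05  # typical-set concentration
```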

Covariance Matrix

  • $\text{cov}(X,Y)=E(X-EX)(Y-EY)=E(XY)-(EX)(EY)$
  • random vector $\vec X$: $K_X=E(X-EX)(X-EX)^T=[\text{cov}(X_i,X_j)]$
  • correlation matrix: $\widetilde K_X=EXX^T=[EX_iX_j]$
    • both are symmetric and positive semidefinite
  • $K_X=\widetilde K_X-(EX)(EX)^T$
  • $Y=AX$
    • $K_Y=AK_XA^T$
    • $\widetilde K_Y=A\widetilde K_XA^T$
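
The transformation rule $K_Y=AK_XA^T$ can be verified against a sample covariance. A sketch with an arbitrary $A$ and $K_X$ chosen for illustration:

```python
import numpy as np

# Check K_Y = A K_X A^T for Y = A X, with K_Y estimated from samples.
rng = np.random.default_rng(0)
A = np.array([[1.0, 2.0], [0.0, 1.0]])
K_X = np.array([[2.0, 0.5], [0.5, 1.0]])

X = rng.multivariate_normal(np.zeros(2), K_X, size=500000)
Y = X @ A.T                       # each row is y = A x
K_Y_hat = np.cov(Y, rowvar=False)  # sample covariance of Y

print(K_Y_hat)
print(A @ K_X @ A.T)
assert np.allclose(K_Y_hat, A @ K_X @ A.T, atol=0.1)
```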

Multivariate Normal Distribution

$$f(x)=\frac{1}{(2\pi)^{\frac{n}{2}}|K|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(x-\mu)^TK^{-1}(x-\mu)\right)$$

  • uncorrelated implies independent (for jointly Gaussian variables)
  • $h(X_1,X_2,\cdots,X_n)=h(\mathcal{N}(\mu,K))=\frac{1}{2}\log(2\pi e)^n|K|$
  • the mutual information between $X$ and $Y$ is $I(X;Y)=\sup_{P,Q}I([X]_P;[Y]_Q)$ over all finite partitions $P$ and $Q$
  • correlated Gaussian $(X,Y)\sim\mathcal{N}(0,K)$, $K=\begin{bmatrix}\sigma^2 & \rho\sigma^2\\ \rho\sigma^2 & \sigma^2\end{bmatrix}$: $I(X;Y)=-\frac{1}{2}\log(1-\rho^2)$
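
The Gaussian entropy formula ties these bullets together: $h(X,Y)\leq h(X)+h(Y)$ with equality iff $K$ is diagonal, and the gap is exactly $I(X;Y)=-\frac{1}{2}\log(1-\rho^2)$. A sketch with one example covariance:

```python
import numpy as np

# h(X_1..X_n) = 0.5 * log((2*pi*e)^n * |K|) for a multivariate Gaussian (nats).
def h_gauss(K):
    K = np.atleast_2d(np.asarray(K, dtype=float))
    n = K.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(K))

K = np.array([[1.0, 0.5], [0.5, 2.0]])

# subadditivity: h(X,Y) <= h(X) + h(Y), equality iff K is diagonal
h_joint = h_gauss(K)
h_marg = h_gauss(K[0, 0]) + h_gauss(K[1, 1])
assert h_joint < h_marg
assert abs(h_gauss(np.diag(np.diag(K))) - h_marg) < 1e-12

# the gap equals the Gaussian mutual information -0.5*log(1 - rho^2)
rho = K[0, 1] / np.sqrt(K[0, 0] * K[1, 1])
I = h_marg - h_joint
assert abs(I - (-0.5 * np.log(1 - rho ** 2))) < 1e-12
print(h_joint, h_marg, I)
```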

Maximum Entropy

  • $X\in\mathbb{R}$ with mean $\mu$ and variance $\sigma^2$, then $h(X)\leq\frac{1}{2}\log 2\pi e\sigma^2$ with equality iff $X\sim\mathcal{N}(\mu,\sigma^2)$
  • $X\in\mathbb{R}$ with $EX^2\leq\sigma^2$, then $h(X)\leq\frac{1}{2}\log 2\pi e\sigma^2$
  • Problem: find the density $f$ over $S$ maximizing $h(f)$ subject to moment constraints $\alpha_1,\cdots,\alpha_m$:
    • $f(x)\geq 0$
    • $\int_S f(x)\,dx=1$
    • $\int_S f(x)r_i(x)\,dx=\alpha_i$
  • maximum entropy distribution: $f^*(x)=f_\lambda(x)=e^{\lambda_0+\sum_{i=1}^m\lambda_ir_i(x)}$
    • $S=[a,b]$ with no other constraints: uniform distribution over $[a,b]$
    • $S=[0,\infty)$, $EX=\mu$: $f^*(x)=\frac{1}{\mu}e^{-\frac{x}{\mu}}$ (exponential)
    • $S=(-\infty,\infty)$, $EX=\alpha_1$, $EX^2=\alpha_2$: $\mathcal{N}(\alpha_1,\alpha_2-\alpha_1^2)$
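
A quick consistency check of the second example: among densities on $[0,\infty)$ with mean $\mu$, the exponential has $h=1+\log\mu$, which must beat any competitor with the same mean, e.g. $\text{Uniform}[0,2\mu]$ with $h=\log 2\mu$. A sketch:

```python
import math

# Exponential(mu) on [0, inf): h = 1 + log(mu)   (maximum entropy given EX = mu)
# Uniform[0, 2*mu]:            h = log(2*mu)     (same mean, lower entropy)
mu = 1.5
h_exp = 1 + math.log(mu)
h_unif = math.log(2 * mu)
print(h_exp, h_unif)
assert h_exp >= h_unif  # reduces to 1 >= log 2, true in nats
```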

Inequalities

Hadamard's Inequality

  • $K$ is a nonnegative definite symmetric $n\times n$ matrix
  • (Hadamard) $|K|\leq\prod_i K_{ii}$ with equality iff $K_{ij}=0$ for $i\neq j$

Balanced Information Inequality

  • $h(X,Y)\leq h(X)+h(Y)$ always holds
  • but neither $h(X,Y)\geq h(X)$ nor $h(X,Y)\leq h(X)$ holds in general (unlike the discrete case, where $H(X,Y)\geq H(X)$)
  • notation: $[n]=\{1,2,\cdots,n\}$; for $\alpha\subset[n]$, $X_\alpha=(X_i:i\in\alpha)$
  • a linear continuous inequality $\sum_\alpha w_\alpha h(X_\alpha)\geq 0$ is valid iff its discrete counterpart $\sum_\alpha w_\alpha H(X_\alpha)\geq 0$ is valid and balanced

Han's Inequality

  • $h_k^{(n)}=\frac{1}{\binom{n}{k}}\sum_{S:|S|=k}\frac{h(X(S))}{k}$
  • $g_k^{(n)}=\frac{1}{\binom{n}{k}}\sum_{S:|S|=k}\frac{h(X(S)|X(S^c))}{k}$
  • Han's inequality: $h_1^{(n)}\geq h_2^{(n)}\geq\cdots\geq h_n^{(n)}=h(X_1,\cdots,X_n)/n=g_n^{(n)}\geq\cdots\geq g_2^{(n)}\geq g_1^{(n)}$
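
For a Gaussian vector every subset entropy has a closed form, $h(X_S)=\frac{1}{2}\log\left((2\pi e)^{|S|}|K_S|\right)$, so the decreasing chain $h_1^{(n)}\geq h_2^{(n)}\geq\cdots\geq h_n^{(n)}$ can be checked directly. A sketch with an arbitrary positive definite $K$:

```python
import math
from itertools import combinations
import numpy as np

# Han's inequality h_1 >= h_2 >= ... >= h_n for a 3-dim Gaussian,
# using h(X_S) = 0.5 * log((2*pi*e)^|S| * det(K_S)).
K = np.array([[2.0, 0.3, 0.5],
              [0.3, 1.0, 0.2],
              [0.5, 0.2, 1.5]])
n = K.shape[0]

def h_subset(S):
    sub = K[np.ix_(S, S)]  # principal submatrix K_S
    return 0.5 * math.log((2 * math.pi * math.e) ** len(S) * np.linalg.det(sub))

def h_k(k):
    subsets = list(combinations(range(n), k))
    return sum(h_subset(list(S)) / k for S in subsets) / len(subsets)

vals = [h_k(k) for k in range(1, n + 1)]
print(vals)
assert all(vals[i] >= vals[i + 1] - 1e-12 for i in range(n - 1))
```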

Information of Heat

  • heat equation (Fourier's heat conduction equation): $x$ is position, $t$ is time, $\frac{\partial}{\partial t}f(x,t)=\frac{1}{2}\frac{\partial^2}{\partial x^2}f(x,t)$
  • $Y_t=X+\sqrt{t}Z$, $Z\sim\mathcal{N}(0,1)$, then $f(y;t)=\int f(x)\frac{1}{\sqrt{2\pi t}}e^{-\frac{(y-x)^2}{2t}}dx$
  • the output density of the Gaussian channel evolves according to the heat equation
  • Fisher information: $I(X)=\int_{-\infty}^{+\infty}f(x)\left[\frac{\frac{\partial}{\partial x}f(x)}{f(x)}\right]^2dx$
  • De Bruijn's identity: $\frac{\partial}{\partial t}h(Y_t)=\frac{1}{2}I(Y_t)$

Entropy power inequality

  • EPI (entropy power inequality): for independent $X,Y$, $e^{\frac{2}{n}h(X+Y)}\geq e^{\frac{2}{n}h(X)}+e^{\frac{2}{n}h(Y)}$; "the most powerful tool", related to:
    • the uncertainty principle
    • Young's inequality
    • Nash's inequality
    • the Cramér-Rao bound: $V(\hat\theta)\geq\frac{1}{I(\theta)}$
  • FII (Fisher information inequality): $\frac{1}{I(X+Y)}\geq\frac{1}{I(X)}+\frac{1}{I(Y)}$
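
The equality case of the EPI is instructive: for independent one-dimensional Gaussians, $e^{2h}=2\pi e\sigma^2$ is the entropy power up to a constant, and the inequality holds with equality because variances add. A sketch:

```python
import math

# EPI equality case: X ~ N(0, a), Y ~ N(0, b) independent,
# so X + Y ~ N(0, a + b) and e^{2h(X+Y)} = e^{2h(X)} + e^{2h(Y)} exactly
# (each side equals 2*pi*e times the variance).
def h_gauss(var):
    return 0.5 * math.log(2 * math.pi * math.e * var)

a, b = 1.0, 2.5
lhs = math.exp(2 * h_gauss(a + b))
rhs = math.exp(2 * h_gauss(a)) + math.exp(2 * h_gauss(b))
print(lhs, rhs)
assert abs(lhs - rhs) < 1e-9
```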