Differential Entropy
- (continuous) X with cumulative distribution function F(x) = Pr(X ≤ x) and density f(x) = F′(x)
- support set of X: S = {x : f(x) > 0}
- differential entropy h(X): h(X) = −∫_S f(x) log f(x) dx
- h(X+c)=h(X)
- h(aX) = h(X) + log|a|
- h(AX) = h(X) + log|det A|
- h(X) may be negative (f(x) may exceed 1)
- uniform on [0, a]: h(X) = log a
- Gaussian: h(X) = (1/2) log 2πeσ^2
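These two closed forms can be checked by direct numerical integration (a quick NumPy sketch; the grid limits and point counts are arbitrary choices):

```python
import numpy as np

def diff_entropy(f, lo, hi, n=400_000):
    """Riemann-sum approximation of -∫ f(x) log f(x) dx over [lo, hi]."""
    x = np.linspace(lo, hi, n)
    fx = f(x)
    integrand = np.where(fx > 0, -fx * np.log(fx), 0.0)
    return integrand.sum() * (x[1] - x[0])

a = 3.0                                   # Uniform(0, a): h = log a
h_unif = diff_entropy(lambda x: np.full_like(x, 1 / a), 0, a)
print(h_unif, np.log(a))

sigma = 2.0                               # N(0, sigma^2): h = (1/2) log 2πeσ²
gauss = lambda x: np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
h_gauss = diff_entropy(gauss, -40, 40)
print(h_gauss, 0.5 * np.log(2 * np.pi * np.e * sigma**2))
```

Both printed pairs agree to about three decimal places, illustrating that the differential entropy of Uniform(0, 3) is log 3 ≈ 1.0986 while the N(0, 4) entropy exceeds it.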
- h(X) and infinite information
- h(X) does not serve as a measure of the average amount of information (describing a continuous X exactly requires infinitely many bits)
- h(X_1, X_2, ⋯, X_n) = −∫ f(x^n) log f(x^n) dx^n
- h(X∣Y) = −∫ f(x, y) log f(x∣y) dx dy
- Relative Entropy: D(f∥g) = ∫ f log(f/g) ≥ 0
- mutual information: I(X;Y) = ∫ f(x, y) log [f(x, y) / (f(x) f(y))] dx dy ≥ 0
Relation to discrete
- X^Δ = x_i if iΔ ≤ X < (i+1)Δ
- p_i = Pr(X^Δ = x_i) = f(x_i)Δ
- H(X^Δ) = −∑ Δ f(x_i) log f(x_i) − log Δ
- as Δ → 0, H(X^Δ) + log Δ → h(f) = h(X)
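The limit above can be observed numerically for a standard Gaussian (a small sketch; bin midpoints stand in for the mean-value points x_i, and the truncation span is an arbitrary choice):

```python
import numpy as np

# H(X^Δ) + log Δ should approach h(X) = (1/2) log(2πe) ≈ 1.4189 as Δ → 0.
def quantized_gap(delta, span=12.0):
    x = np.arange(-span, span, delta) + delta / 2        # bin midpoints x_i
    p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi) * delta   # p_i ≈ f(x_i)Δ
    p = p[p > 0]
    H = -(p * np.log(p)).sum()                           # H(X^Δ)
    return H + np.log(delta)

h = 0.5 * np.log(2 * np.pi * np.e)
print(quantized_gap(0.5), quantized_gap(0.01), h)
```

Shrinking Δ from 0.5 to 0.01 drives the gap H(X^Δ) + log Δ toward h(X), while H(X^Δ) itself diverges like −log Δ.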
AEP
- −(1/n) log f(X_1, X_2, ⋯, X_n) → E[−log f(X)] = h(f), in probability, for X_i i.i.d. ∼ f
- A_ϵ^{(n)} = {(x_1, x_2, ⋯, x_n) ∈ S^n : ∣−(1/n) log f(x_1, ⋯, x_n) − h(X)∣ ≤ ϵ}
- Vol(A) = ∫_A dx_1 dx_2 ⋯ dx_n
- Properties
- Pr(A_ϵ^{(n)}) > 1 − ϵ for n sufficiently large
- Vol(A_ϵ^{(n)}) ≤ 2^{n(h(X)+ϵ)}
- Vol(A_ϵ^{(n)}) ≥ (1 − ϵ) 2^{n(h(X)−ϵ)}
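The convergence behind the AEP is easy to see by Monte Carlo (a minimal sketch; the sample size and seed are arbitrary):

```python
import numpy as np

# For i.i.d. N(0,1) samples, −(1/n) log f(X_1,…,X_n) should concentrate
# around h(X) = (1/2) log(2πe) ≈ 1.4189 as n grows.
rng = np.random.default_rng(0)
n = 100_000
x = rng.standard_normal(n)
log_f = -0.5 * np.log(2 * np.pi) - x**2 / 2    # log f(x_i) for N(0,1)
estimate = -log_f.mean()                        # −(1/n) Σ log f(x_i)
print(estimate, 0.5 * np.log(2 * np.pi * np.e))
```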
Covariance Matrix
- cov(X, Y) = E[(X − EX)(Y − EY)] = E(XY) − (EX)(EY)
- covariance matrix of a random vector X: K_X = E[(X − EX)(X − EX)^T] = [cov(X_i, X_j)]
- correlation matrix: K̃_X = E[XX^T] = [E X_i X_j]
- both are symmetric and positive semidefinite
- K_X = K̃_X − (EX)(EX)^T
- Y = AX: K_Y = A K_X A^T and K̃_Y = A K̃_X A^T
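The transformation rule K_Y = A K_X A^T can be checked against a sample covariance (a quick sketch; the particular K_X, A, seed, and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
K_X = np.array([[2.0, 0.5], [0.5, 1.0]])
A = np.array([[1.0, 2.0], [0.0, 3.0]])
X = rng.multivariate_normal([0, 0], K_X, size=200_000)  # rows are samples
Y = X @ A.T                                              # Y = AX, sample-wise
K_Y = A @ K_X @ A.T
print(K_Y)
print(np.cov(Y.T))        # sample covariance, ≈ K_Y up to sampling noise
```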
Multivariate Normal Distribution
f(x) = (2π)^{−n/2} ∣K∣^{−1/2} exp(−(1/2)(x − μ)^T K^{−1} (x − μ))
- for jointly Gaussian variables, uncorrelated implies independent
- h(X_1, X_2, ⋯, X_n) = h(N(μ, K)) = (1/2) log[(2πe)^n ∣K∣]
- the mutual information between continuous X and Y is I(X;Y) = sup_{P,Q} I([X]_P; [Y]_Q) over all finite partitions P and Q
- Correlated Gaussian: (X, Y) ∼ N(0, K) with
  K = [σ^2, ρσ^2; ρσ^2, σ^2]
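For this pair, the mutual information follows from the Gaussian entropy formula h(N(μ, K)) = (1/2) log((2πe)^n |K|): since |K| = σ^4(1 − ρ^2), it reduces to I(X;Y) = −(1/2) log(1 − ρ^2). A minimal check (the values of σ^2 and ρ are arbitrary):

```python
import numpy as np

sigma2, rho = 4.0, 0.8
K = np.array([[sigma2, rho * sigma2], [rho * sigma2, sigma2]])
h_joint = 0.5 * np.log((2 * np.pi * np.e) ** 2 * np.linalg.det(K))
h_marg = 0.5 * np.log(2 * np.pi * np.e * sigma2)
I = 2 * h_marg - h_joint                 # I(X;Y) = h(X) + h(Y) − h(X,Y)
print(I, -0.5 * np.log(1 - rho**2))      # both ≈ 0.5108
```

Note that I(X;Y) depends only on ρ, not on σ^2, as the closed form makes explicit.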
Maximum Entropy
- X ∈ R with mean μ and variance σ^2: h(X) ≤ (1/2) log 2πeσ^2, with equality iff X ∼ N(μ, σ^2)
- X ∈ R with EX^2 ≤ σ^2: h(X) ≤ (1/2) log 2πeσ^2
- Problem: find the density f over support S that maximizes h(f) subject to the moment constraints α_1, ⋯, α_m:
- f(x) ≥ 0
- ∫_S f(x) dx = 1
- ∫_S f(x) r_i(x) dx = α_i, i = 1, ⋯, m
- Maximum entropy distribution: f*(x) = f_λ(x) = e^{λ_0 + ∑_{i=1}^m λ_i r_i(x)}, with λ chosen to satisfy the constraints
- S = [a, b] with no other constraints: uniform distribution over [a, b]
- S = [0, ∞), EX = μ: f(x) = (1/μ) e^{−x/μ} (exponential)
- S = (−∞, ∞), EX = α_1, EX^2 = α_2: N(α_1, α_2 − α_1^2)
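The variance bound is easy to eyeball with standard closed-form entropies, each distribution rescaled to variance σ^2 (a small sketch; the uniform, exponential, and Laplace formulas below are textbook closed forms):

```python
import numpy as np

# For fixed variance σ², h(X) ≤ (1/2) log(2πeσ²), attained only by the Gaussian.
sigma2 = 1.0
bound = 0.5 * np.log(2 * np.pi * np.e * sigma2)       # Gaussian, ≈ 1.4189
h_uniform = 0.5 * np.log(12 * sigma2)                  # Uniform,  ≈ 1.2425
h_exponential = 0.5 * np.log(np.e**2 * sigma2)         # Exponential, = 1.0
h_laplace = 0.5 * np.log(2 * np.e**2 * sigma2)         # Laplace,  ≈ 1.3466
print(bound, h_uniform, h_exponential, h_laplace)
```

The constants 12, e^2, and 2e^2 are all below 2πe ≈ 17.08, so each entropy sits strictly under the Gaussian bound.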
Inequalities
Hadamard's Inequality
- K is a nonnegative definite symmetric n×n matrix
- (Hadamard) ∣K∣ ≤ ∏ K_ii, with equality iff K_ij = 0 for all i ≠ j
- h(X,Y)≤h(X)+h(Y)
- neither h(X,Y) ≥ h(X) nor h(X,Y) ≤ h(X) holds in general, since h(Y∣X) may be negative
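Hadamard's inequality can be spot-checked on a random positive semidefinite matrix (a minimal sketch; the matrix size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 5))
K = B @ B.T                          # symmetric positive semidefinite
print(np.linalg.det(K), np.prod(np.diag(K)))   # |K| ≤ ∏ K_ii
```

Via h(N(0, K)) = (1/2) log((2πe)^n |K|), this is exactly the subadditivity h(X_1, ⋯, X_n) ≤ ∑ h(X_i) applied to a Gaussian vector.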
- [n] = {1, 2, ⋯, n}; for α ⊆ [n], X_α = (X_i : i ∈ α)
- a linear continuous inequality ∑_α w_α h(X_α) ≥ 0 is valid iff its discrete counterpart ∑_α w_α H(X_α) ≥ 0 is valid and balanced
Han's Inequality
- h_k^{(n)} = (1/C(n,k)) ∑_{S:∣S∣=k} h(X_S)/k
- g_k^{(n)} = (1/C(n,k)) ∑_{S:∣S∣=k} h(X_S ∣ X_{S^c})/k
- Han's Inequality: h_1^{(n)} ≥ h_2^{(n)} ≥ ⋯ ≥ h_n^{(n)} = h(X_1, ⋯, X_n)/n = g_n^{(n)} ≥ ⋯ ≥ g_2^{(n)} ≥ g_1^{(n)}
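The h_k chain can be verified on a Gaussian vector, where subset entropies have the closed form h(X_S) = (1/2) log((2πe)^|S| |K_S|) (a sketch; the covariance and seed are arbitrary):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
K = B @ B.T + 0.1 * np.eye(4)        # positive definite covariance
n = K.shape[0]

def h_sub(S):
    """Gaussian entropy of the sub-vector X_S from the principal minor K_S."""
    S = list(S)
    return 0.5 * np.log((2 * np.pi * np.e) ** len(S)
                        * np.linalg.det(K[np.ix_(S, S)]))

# h_k^{(n)} = average over all k-subsets of h(X_S)/k
h_k = [np.mean([h_sub(S) / k for S in combinations(range(n), k)])
       for k in range(1, n + 1)]
print(h_k)                            # non-increasing in k, per Han
```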
- Heat equation (Fourier): with x position and t time, ∂f(x,t)/∂t = (1/2) ∂^2 f(x,t)/∂x^2
- Y_t = X + √t Z, Z ∼ N(0,1) independent of X; then f(y; t) = ∫ f(x) (1/√(2πt)) e^{−(y−x)^2/(2t)} dx
- Gaussian channel -- heat equation
- Fisher Information: I(X) = ∫ f(x) [(∂f(x)/∂x) / f(x)]^2 dx = ∫ [f′(x)]^2 / f(x) dx
- De Bruijn's Identity: ∂h(Y_t)/∂t = (1/2) I(Y_t)
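In the Gaussian case the identity holds exactly and can be checked with a finite difference (a minimal sketch; σ^2, t, and the step size are arbitrary): if X ∼ N(0, σ^2), then Y_t ∼ N(0, σ^2 + t), so h(Y_t) = (1/2) log(2πe(σ^2 + t)) and I(Y_t) = 1/(σ^2 + t).

```python
import numpy as np

sigma2, t, dt = 1.5, 0.7, 1e-6
h = lambda v: 0.5 * np.log(2 * np.pi * np.e * v)    # Gaussian entropy, variance v
dh_dt = (h(sigma2 + t + dt) - h(sigma2 + t)) / dt   # numerical ∂h(Y_t)/∂t
print(dh_dt, 0.5 / (sigma2 + t))                    # (1/2) I(Y_t)
```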
Entropy power inequality
- EPI (Entropy power inequality): for independent X, Y, e^{(2/n) h(X+Y)} ≥ e^{(2/n) h(X)} + e^{(2/n) h(Y)} ("the most powerful tool")
- Uncertainty principle
- Young's inequality
- Nash's inequality
- Cramer-Rao bound: Var(θ̂) ≥ 1/I(θ)
- FII (Fisher information inequality): for independent X, Y, 1/I(X+Y) ≥ 1/I(X) + 1/I(Y)
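The EPI's equality and strict cases can both be seen in closed form for n = 1 (a minimal sketch; the variances are arbitrary): independent Gaussians give equality, since the entropy power e^{2h}/(2πe) equals the variance and variances add; two Uniform(0,1) variables give strict inequality, since their sum is triangular on [0, 2] with h = 1/2 exactly.

```python
import numpy as np

# Gaussian case: e^{2h(X+Y)} = e^{2h(X)} + e^{2h(Y)} (equality).
s1, s2 = 2.0, 3.0
h = lambda v: 0.5 * np.log(2 * np.pi * np.e * v)
lhs_g = np.exp(2 * h(s1 + s2))                        # variances add for the sum
print(lhs_g, np.exp(2 * h(s1)) + np.exp(2 * h(s2)))   # equal

# Uniform(0,1): h = 0 each; triangular sum has h = 1/2, so e > 2 (strict).
print(np.exp(2 * 0.5), np.exp(0) + np.exp(0))
```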