Preliminaries
- A Mathematical Theory of Communication - Shannon, 1948
- Convexity (Jensen's inequality): $f(\sum_i p_i x_i) \le \sum_i p_i f(x_i)$
- $f(E_p[X]) \le E_p[f(X)]$
- convex: $f''(x) \ge 0$
- $f(x) = -x \log x$ is concave
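A quick numerical sanity check of the two facts above (a minimal sketch using numpy; the sample points and weights are arbitrary illustrations, not from the source):

```python
import numpy as np

# Jensen's inequality for a convex f: f(E_p[X]) <= E_p[f(X)].
f = lambda x: x ** 2                       # convex, since f''(x) = 2 >= 0

rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, size=10)            # arbitrary points x_i
p = rng.dirichlet(np.ones(10))             # a probability vector p_i

assert f(np.dot(p, x)) <= np.dot(p, f(x)) + 1e-12

# f(x) = -x log x is concave on (0, 1], so the inequality flips.
g = lambda x: -x * np.log(x)
x = rng.uniform(0.01, 1, size=10)
assert g(np.dot(p, x)) >= np.dot(p, g(x)) - 1e-12
```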
Entropy
Entropy: $H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x) = E_p \log \frac{1}{p(X)}$
- convention: $0 \log 0 \to 0$
- $0 \le H(X) \le \log |\mathcal{X}|$
- uniform $X$: $H(X) = \log |\mathcal{X}|$
- change of base: $H_b(X) = (\log_b a) H_a(X)$
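These properties are easy to verify numerically; a minimal sketch (the `entropy` helper and the example distribution are illustrative, not from the source):

```python
import numpy as np

def entropy(p, base=2):
    """H(X) = -sum_x p(x) log p(x), with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz])) / np.log(base)

p = [0.5, 0.25, 0.125, 0.125]
print(entropy(p))                    # 1.75 bits
print(entropy([0.25] * 4))           # uniform: log2(4) = 2 bits, the maximum
# change of base: H_b(X) = (log_b a) * H_a(X), here with a = 2, b = e
assert np.isclose(entropy(p, base=np.e), np.log(2) * entropy(p, base=2))
```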
Joint Entropy: $H(X,Y) = -E \log p(X,Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \log p(x,y)$
- $H(X,X) = H(X)$
- $H(X,Y) = H(Y,X)$
Conditional Entropy: $H(Y|X) = \sum_{x \in \mathcal{X}} p(x) H(Y|X=x) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \log p(y|x) = -E \log p(Y|X)$
- $H(Y|X) \le H(Y)$: conditioning reduces entropy
- $H(Y|X)$ is the remaining uncertainty in $Y$ when $X$ is known
- in general $H(X|Y) \neq H(Y|X)$
- $H(X|Y) + H(Y) = H(Y|X) + H(X) = H(X,Y)$
- Chain rule: $H(X_1, X_2, \dots, X_n) = H(X_1) + H(X_2|X_1) + \dots + H(X_n|X_{n-1}, \dots, X_1)$
- Zero conditional entropy: if $H(Y|X) = 0$, then $Y = f(X)$, i.e. $Y$ is a deterministic function of $X$
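The chain rule and the asymmetry of conditional entropy, checked on an illustrative 4×4 joint table (the table and the `H` helper are mine, not from the source):

```python
import numpy as np

def H(p):
    """Entropy in bits of a (possibly joint) distribution given as an array."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

# joint distribution p(x, y): rows index x, columns index y
pxy = np.array([[1/8,  1/16, 1/32, 1/32],
                [1/16, 1/8,  1/32, 1/32],
                [1/16, 1/16, 1/16, 1/16],
                [1/4,  0,    0,    0   ]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

H_x_given_y = H(pxy) - H(py)      # chain rule: H(X|Y) = H(X,Y) - H(Y)
H_y_given_x = H(pxy) - H(px)

assert np.isclose(H_x_given_y + H(py), H(pxy))   # H(X|Y) + H(Y) = H(X,Y)
assert H_y_given_x <= H(py) + 1e-12              # conditioning reduces entropy
print(H_x_given_y, H_y_given_x)   # 13/8 vs 11/8 bits: H(X|Y) != H(Y|X)
```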
Relative Entropy (Kullback-Leibler distance): $D(p\|q) = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)} = E_p \log \frac{p(X)}{q(X)} = E_p[-\log q(X)] - H(p)$
- $D(p\|q) \ge 0$, with equality iff $p(x) = q(x)$ for all $x$ (proved via Jensen's inequality, using the concavity of $\log$)
- variational distance: $V(p,q) = \sum_{x \in \mathcal{X}} |p(x) - q(x)|$
- Pinsker's inequality: $D(p\|q) \ge \frac{1}{2 \ln 2} V^2(p,q)$
- against the uniform distribution $u$: $D(p\|u) = \log |\mathcal{X}| - H(X)$
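A sketch checking nonnegativity, Pinsker's inequality, and the uniform-distribution identity (the `kl` helper and the distributions are illustrative):

```python
import numpy as np

def kl(p, q):
    """D(p||q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    nz = p > 0
    return np.sum(p[nz] * np.log2(p[nz] / q[nz]))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

assert kl(p, q) >= 0 and not np.isclose(kl(p, q), kl(q, p))  # >= 0, asymmetric
assert np.isclose(kl(p, p), 0)                               # equality iff p = q

# Pinsker: D(p||q) >= V(p,q)^2 / (2 ln 2), with D measured in bits
V = np.abs(p - q).sum()
assert kl(p, q) >= V**2 / (2 * np.log(2))

# against the uniform u: D(p||u) = log|X| - H(p)
u = np.ones_like(p) / p.size
H_p = -np.sum(p * np.log2(p))
assert np.isclose(kl(p, u), np.log2(p.size) - H_p)
```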
Mutual Information: $I(X;Y) = \sum_x \sum_y p(x,y) \log \frac{p(x,y)}{p(x)p(y)} = D(p(x,y) \| p(x)p(y)) = E_{p(x,y)} \log \frac{p(X,Y)}{p(X)p(Y)}$
- if $X$ and $Y$ are independent, $I(X;Y) = 0$
- $I(X;X) = H(X)$
- $I(X;Y) = H(X) - H(X|Y) = H(X) + H(Y) - H(X,Y)$
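The identities above, checked on a small joint table (illustrative values; `H` as before):

```python
import numpy as np

def H(p):
    p = np.asarray(p, float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

pxy = np.array([[3/8, 1/8],
                [1/8, 3/8]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

# direct definition: I(X;Y) = D( p(x,y) || p(x)p(y) )
mask = pxy > 0
I = np.sum(pxy[mask] * np.log2((pxy / np.outer(px, py))[mask]))

assert np.isclose(I, H(px) + H(py) - H(pxy))     # I = H(X) + H(Y) - H(X,Y)
assert np.isclose(I, H(px) - (H(pxy) - H(py)))   # I = H(X) - H(X|Y)
assert I >= 0
```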
Conditional Mutual Information: $I(X;Y|Z) = H(X|Z) - H(X|Y,Z) = E_{p(x,y,z)} \log \frac{p(X,Y|Z)}{p(X|Z)p(Y|Z)}$
- Chain rule: $I(X_1, X_2, \dots, X_n; Y) = \sum_{i=1}^n I(X_i; Y | X_{i-1}, \dots, X_1)$
Conditional Relative Entropy: $D(p(y|x) \| q(y|x)) = E_{p(x,y)} \log \frac{p(Y|X)}{q(Y|X)}$
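Both the conditional-MI identity and the chain rule can be checked on a random joint distribution (a sketch; the array shapes and marginalization pattern are mine):

```python
import numpy as np

def H(p):
    p = np.asarray(p, float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

rng = np.random.default_rng(1)
pxyz = rng.dirichlet(np.ones(8)).reshape(2, 2, 2)   # random joint p(x, y, z)

# I(X;Y|Z) = H(X|Z) - H(X|Y,Z), each conditional entropy via the chain rule
pz, pxz, pyz = pxyz.sum((0, 1)), pxyz.sum(1), pxyz.sum(0)
I_xy_given_z = (H(pxz) - H(pz)) - (H(pxyz) - H(pyz))
assert I_xy_given_z >= -1e-12                       # conditional MI >= 0

# chain rule with n = 2: I(X,Y;Z) = I(X;Z) + I(Y;Z|X)
pxy, px = pxyz.sum(2), pxyz.sum((1, 2))
I_xy_z = H(pxy) + H(pz) - H(pxyz)
I_x_z = H(px) + H(pz) - H(pxz)
I_y_z_given_x = (H(pxy) - H(px)) - (H(pxyz) - H(pxz))
assert np.isclose(I_xy_z, I_x_z + I_y_z_given_x)
```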
Inequalities
- $H(X_1, X_2, \dots, X_n) \le \sum_{i=1}^n H(X_i)$, with equality iff the $X_i$ are independent
- no general ordering between $I(X;Y|Z)$ and $I(X;Y)$ (numerical check after this list):
- Markov chain $X \to Y \to Z$: $I(X;Z|Y) = 0$
- $Z = X + Y \bmod 2$ with $X, Y$ independent fair bits: $I(X;Y) = 0$ but $I(X;Y|Z) = 1$, so $I(X;Y|Z) > I(X;Y)$
- Data Processing Inequality: if $X \to Y \to Z$, then $I(X;Y) \ge I(X;Z)$
- for $X \to Y \to Z$: $I(X;Y) = I(X;Z) + I(X;Y|Z)$
- $I(X;Y;Z) = I(X;Y) - I(X;Y|Z)$
- Perfect Secrecy: $H(X) \le H(Z)$, i.e. the message length cannot exceed the key length
- Fano's Inequality: relates the error probability $P_e$ to $H(X|Y)$ (numerical check after this list)
- $X \to Y \to \hat{X}$, $P_e = P(\hat{X} \neq X)$
- $H(P_e) + P_e \log |\mathcal{X}| \ge H(X|\hat{X}) \ge H(X|Y)$
- weakened form: $P_e \ge \frac{H(X|Y) - 1}{\log |\mathcal{X}|}$
- Log sum inequality: for nonnegative numbers $a_i, b_i$: $\sum_{i=1}^n a_i \log \frac{a_i}{b_i} \ge \left(\sum_{i=1}^n a_i\right) \log \frac{\sum_{i=1}^n a_i}{\sum_{i=1}^n b_i}$, with equality iff $\frac{a_i}{b_i} = c$ for all $i$
- $H(p)$ is a concave function of $p$
- $D(p\|q)$ is convex in the pair $(p, q)$
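Numerical checks for two items above: the $Z = X + Y \bmod 2$ example and Fano's inequality (the 4-symbol channel in part 2 is an illustrative choice, not from the source):

```python
import numpy as np

def H(p):
    p = np.asarray(p, float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

# 1) X, Y independent fair bits, Z = (X + Y) mod 2:
#    I(X;Y) = 0 but I(X;Y|Z) = 1 bit, so conditioning can increase MI.
pxyz = np.zeros((2, 2, 2))
for x in range(2):
    for y in range(2):
        pxyz[x, y, (x + y) % 2] = 1 / 4
pxy, pxz, pyz = pxyz.sum(2), pxyz.sum(1), pxyz.sum(0)
px, py, pz = pxy.sum(1), pxy.sum(0), pxz.sum(0)
I_xy = H(px) + H(py) - H(pxy)
I_xy_given_z = (H(pxz) - H(pz)) - (H(pxyz) - H(pyz))
print(I_xy, I_xy_given_z)                    # 0.0  1.0

# 2) Fano: X uniform on 4 symbols; Y = X w.p. 0.75, otherwise uniform over
#    the other three. With the estimator Xhat = Y, P_e = 0.25.
eps, n = 0.25, 4
pxy2 = np.full((n, n), eps / (n - 1) / n)
np.fill_diagonal(pxy2, (1 - eps) / n)
H_x_given_y = H(pxy2) - H(pxy2.sum(0))       # H(X|Y) = H(X,Y) - H(Y)
assert H([eps, 1 - eps]) + eps * np.log2(n) >= H_x_given_y   # Fano
assert eps >= (H_x_given_y - 1) / np.log2(n)                 # weakened form
```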