Information Theory

1. Entropy

2020-12-22

Preliminary

  • A Mathematical Theory of Communication - Shannon 1948
  • Convexity: $f(\sum_i p_ix_i)\leq \sum_i p_if(x_i)$
    • $f(E_p X)\leq E_p f(X)$
    • convex: $f''(x)\geq 0$
  • $f(x)=-x\log x$ is concave (a quick numeric check follows)
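
A small numeric sketch of Jensen's inequality, assuming made-up weights and points; since $f(x)=-x\log x$ is concave, the inequality direction is reversed relative to the convex case.

```python
import math

def f(x):
    # f(x) = -x log x, concave for x > 0
    return -x * math.log(x)

p = [0.2, 0.3, 0.5]  # arbitrary weights summing to 1
x = [0.1, 0.4, 0.8]  # arbitrary points in (0, 1)

lhs = f(sum(pi * xi for pi, xi in zip(p, x)))  # f(E_p[X])
rhs = sum(pi * f(xi) for pi, xi in zip(p, x))  # E_p[f(X)]
assert lhs >= rhs  # concave: f(E[X]) >= E[f(X)]
print(lhs, rhs)    # ~0.333 >= ~0.245
```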

Entropy

Entropy: $H(X)=-\sum_{x\in\mathcal{X}}p(x)\log p(x)=E_{p}\log\frac{1}{p(X)}$

  • $0\log 0\rightarrow 0$
  • $0\leq H(X)\leq \log|\mathcal{X}|$
  • uniform $X$: $H(X)=\log|\mathcal{X}|$
  • change of base: $H_b(X)=(\log_b a)H_a(X)$ (see the sketch below)
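
A minimal sketch of the definition above; the helper name `entropy` and the example distributions are my own illustrative choices.

```python
import math

def entropy(p, base=2):
    """H(X) = -sum_x p(x) log p(x), with the convention 0 log 0 -> 0."""
    return -sum(px * math.log(px, base) for px in p if px > 0)

uniform = [0.25] * 4
print(entropy(uniform))                     # 2.0 = log2 |X|, the maximum
print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75 < 2.0
# Change of base: H_b(X) = (log_b a) * H_a(X), here b = e, a = 2
print(entropy(uniform, base=math.e), math.log(2) * entropy(uniform, base=2))
```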

Joint Entropy: $H(X,Y)=-E\log p(X,Y)=-\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}}p(x,y)\log p(x,y)$

  • $H(X,X)=H(X)$
  • $H(X,Y)=H(Y,X)$

Conditional Entropy: $H(Y|X)=\sum_{x\in\mathcal{X}}p(x)H(Y|X=x)=-\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}}p(x,y)\log p(y|x)=-E\log p(Y|X)$

  • $H(Y|X)\leq H(Y)$: conditioning reduces entropy
  • remaining uncertainty in $Y$ when $X$ is known
  • $H(X|Y)\neq H(Y|X)$ in general
  • $H(X|Y)+H(Y)=H(Y|X)+H(X)=H(X,Y)$ (checked numerically in the sketch below)
  • Chain rule: $H(X_1,X_2,\cdots,X_n)=H(X_1)+H(X_2|X_1)+\cdots+H(X_n|X_{n-1},\cdots,X_1)$
  • Zero conditional entropy: if $H(Y|X)=0$, then $Y=f(X)$ for some function $f$
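
A small sketch checking $H(X,Y)=H(X)+H(Y|X)$; the toy joint table `pxy` is a made-up choice for illustration.

```python
import math

# Toy joint distribution p(x, y) over {0,1} x {0,1}
pxy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

def H(dist):
    """Entropy in bits of a distribution given as {outcome: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Marginal p(x)
px = {x: sum(p for (a, _), p in pxy.items() if a == x) for x in (0, 1)}
# H(Y|X) computed directly as -sum p(x,y) log p(y|x)
H_y_given_x = -sum(p * math.log2(p / px[x]) for (x, _), p in pxy.items())
# Chain rule: H(X,Y) = H(X) + H(Y|X)
assert abs(H(pxy) - (H(px) + H_y_given_x)) < 1e-12
print(H(pxy), H(px), H_y_given_x)   # 1.75 = 0.811... + 0.938...
```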

Relative Entropy (Kullback-Leibler distance): $D(p\|q)=\sum_{x\in\mathcal{X}}p(x)\log\frac{p(x)}{q(x)}=E_p\log\frac{p(X)}{q(X)}=E_p(-\log q(X))-H(p)$

  • $D(p\|q)\geq 0$, equality iff $p(x)=q(x)$ (proved via Jensen's inequality, using concavity of $\log$)
  • variational distance: $V(p,q)=\sum_{x\in\mathcal{X}}|p(x)-q(x)|$
  • Pinsker's inequality: $D(p\|q)\geq\frac{1}{2\ln 2}V^2(p,q)$
  • $D(p\|u)=\log|\mathcal{X}|-H(X)$, where $u$ is the uniform distribution on $\mathcal{X}$ (see the sketch below)
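
A sketch of $D(p\|q)$ with numeric checks of Pinsker's inequality and the $D(p\|u)$ identity; the distributions `p` and `q` below are made up for illustration.

```python
import math

def kl(p, q):
    """D(p||q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
V = sum(abs(px - qx) for px, qx in zip(p, q))        # variational distance
assert kl(p, q) >= V ** 2 / (2 * math.log(2))        # Pinsker's inequality
u = [1 / 3] * 3
H_p = -sum(px * math.log2(px) for px in p)
assert abs(kl(p, u) - (math.log2(3) - H_p)) < 1e-12  # D(p||u) = log|X| - H(p)
print(kl(p, q), kl(q, p))                            # note: not symmetric
```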

Mutual Information: $I(X;Y)=\sum_x\sum_yp(x,y)\log\frac{p(x,y)}{p(x)p(y)}=D(p(x,y)\|p(x)p(y))=E_{p(x,y)}\log\frac{p(X,Y)}{p(X)p(Y)}$

  • $I(X;Y)=0$ iff $X$ and $Y$ are independent
  • $I(X;X)=H(X)$
  • $I(X;Y)=H(X)-H(X|Y)=H(X)+H(Y)-H(X,Y)$ (checked in the sketch below)
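
A sketch checking $I(X;Y)=H(X)+H(Y)-H(X,Y)$ against the direct definition; the joint table is made up for illustration.

```python
import math

pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
px = {x: sum(p for (a, _), p in pxy.items() if a == x) for x in (0, 1)}
py = {y: sum(p for (_, b), p in pxy.items() if b == y) for y in (0, 1)}

def H(dist):
    """Entropy in bits of a distribution given as {outcome: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Direct definition: sum p(x,y) log [p(x,y) / (p(x) p(y))]
I_direct = sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in pxy.items())
assert abs(I_direct - (H(px) + H(py) - H(pxy))) < 1e-12
print(I_direct)   # ~0.278 bits: positive, since X and Y are dependent here
```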

Conditional Mutual Information: $I(X;Y|Z)=H(X|Z)-H(X|Y,Z)=E_{p(x,y,z)}\log\frac{p(X,Y|Z)}{p(X|Z)p(Y|Z)}$

  • Chain rule: $I(X_1,X_2,\cdots,X_n;Y)=\sum_{i=1}^nI(X_i;Y|X_{i-1},\cdots,X_1)$ (see the sketch below)
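
A sketch of the chain rule $I(X_1,X_2;Y)=I(X_1;Y)+I(X_2;Y|X_1)$, with each term expanded into joint entropies; the toy distribution and the helper `H` are my own illustrative choices.

```python
import math
from itertools import product

# Toy p(x1, x2, y): X1, X2 independent fair bits, Y noisily equals X1 XOR X2
p = {(x1, x2, y): (0.6 if y == (x1 ^ x2) else 0.4) / 4
     for x1, x2, y in product((0, 1), repeat=3)}

def H(keep):
    """Entropy in bits of the marginal over the given coordinate indices."""
    marg = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in keep)
        marg[key] = marg.get(key, 0.0) + prob
    return -sum(q * math.log2(q) for q in marg.values() if q > 0)

I_joint = H((0, 1)) + H((2,)) - H((0, 1, 2))                   # I(X1,X2;Y)
I_1 = H((0,)) + H((2,)) - H((0, 2))                            # I(X1;Y)
I_2_given_1 = H((0, 1)) - H((0,)) - H((0, 1, 2)) + H((0, 2))   # I(X2;Y|X1)
assert abs(I_joint - (I_1 + I_2_given_1)) < 1e-12
print(I_joint, I_1, I_2_given_1)   # here I(X1;Y) = 0 but I(X2;Y|X1) > 0
```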

Conditional Relative Entropy: $D(p(y|x)\|q(y|x))=E_{p(x,y)}\log \frac{p(Y|X)}{q(Y|X)}$

Inequalities

  • $H(X_1,X_2,\cdots,X_n)\leq\sum_{i=1}^nH(X_i)$, equality iff the $X_i$ are independent
  • $I(X;Y|Z)$ vs. $I(X;Y)$: no general ordering
    • $X\rightarrow Y\rightarrow Z$: $I(X;Z|Y)=0$
    • $Z=X+Y\bmod 2$ with $X,Y$ independent fair bits: $I(X;Y|Z)>I(X;Y)$ (see the sketch after this list)
  • Data Processing Inequality: $X\rightarrow Y\rightarrow Z\Rightarrow I(X;Y)\geq I(X;Z)$
    • $X\rightarrow Y\rightarrow Z$: $I(X;Y)=I(X;Z)+I(X;Y|Z)$
  • $I(X;Y;Z)=I(X;Y)-I(X;Y|Z)$
  • Perfect Secrecy: $H(X)\leq H(Z)$, i.e., the message length is at most the key length
  • Fano's Inequality: relates $P_e$ and $H(X|Y)$
    • $X\rightarrow Y\rightarrow \hat X$, $P_e=P(\hat X\neq X)$
    • $H(P_e)+P_e\log|\mathcal{X}|\geq H(X|\hat X)\geq H(X|Y)$
    • weakened form: $P_e\geq\frac{H(X|Y)-1}{\log|\mathcal{X}|}$
  • Log sum inequality: for nonnegative numbers, $\sum_{i=1}^na_i\log\frac{a_i}{b_i}\geq\left(\sum_{i=1}^na_i\right)\log\frac{\sum_{i=1}^na_i}{\sum_{i=1}^nb_i}$, equality iff $\frac{a_i}{b_i}$ is constant
  • $H(p)$ is concave in $p$
  • $D(p\|q)$ is convex in the pair $(p,q)$
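
A minimal sketch of the $Z=X+Y\bmod 2$ example from the list above, assuming $X,Y$ are independent fair bits; the helper `H` computes marginal entropies from the joint table.

```python
import math
from itertools import product

# X, Y independent fair bits; Z = X XOR Y is determined by them
p = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

def H(keep):
    """Entropy in bits of the marginal over the given coordinate indices."""
    marg = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in keep)
        marg[key] = marg.get(key, 0.0) + prob
    return -sum(q * math.log2(q) for q in marg.values() if q > 0)

I_xy = H((0,)) + H((1,)) - H((0, 1))                      # I(X;Y) = 0
# I(X;Y|Z) = H(X|Z) - H(X|Y,Z) = H(X,Z) - H(Z) - H(X,Y,Z) + H(Y,Z)
I_xy_given_z = H((0, 2)) - H((2,)) - H((0, 1, 2)) + H((1, 2))
print(I_xy, I_xy_given_z)   # 0.0 and 1.0: conditioning can increase MI
```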