Advanced Algorithm

7. Markov Chain Monte Carlo Methods

2019-12-18

Monte Carlo Method

  • (P) Problem
    • Universe $U$, subset $S\subseteq U$, density $\rho=\frac{|S|}{|U|}$
    • Assume a uniform sampler for elements of $U$
    • Estimate $Z=|S|$
  • Monte Carlo Method (for estimating)
    • Sample $X_1,X_2,\cdots,X_N$ uniformly and independently from $U$
    • $Y_i=[X_i\in S]$
    • counting: $\hat Z=\frac{|U|}{N}\sum_{i=1}^N Y_i$
  • $\epsilon$-approx estimator: $(1-\epsilon)Z\leq\hat Z\leq(1+\epsilon)Z$
  • Estimator Theorem (naive): $N\geq\frac{4}{\epsilon^2\rho}\ln\frac{2}{\delta}=\Theta(\frac{1}{\epsilon^2\rho}\ln\frac{1}{\delta})\Rightarrow P(\hat Z\text{ is an }\epsilon\text{-approx of }|S|)\geq 1-\delta$
  • Monte Carlo Method (for sampling)
    • rejection sampling: inefficient if $\rho$ is small
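The estimator above can be written as a minimal Python sketch; the universe of integers and the target set (multiples of 3) are toy choices for illustration, not from the notes:

```python
import random

def monte_carlo_count(universe_size, in_S, N, seed=0):
    """Estimate Z = |S| via Z_hat = (|U|/N) * sum_i [X_i in S]."""
    rng = random.Random(seed)
    hits = sum(in_S(rng.randrange(universe_size)) for _ in range(N))
    return universe_size * hits / N

# Toy example: U = {0, ..., 9999}, S = multiples of 3, so |S| = 3334.
estimate = monte_carlo_count(10_000, lambda x: x % 3 == 0, N=100_000)
```

Here $\rho\approx 1/3$ is large, so the sample bound $N=\Theta(\frac{1}{\epsilon^2\rho}\ln\frac{1}{\delta})$ is modest; the whole difficulty in the next section is that $\rho$ can be exponentially small.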

Counting DNF Solutions

  • (P) Counting DNF Solutions: $\#\text{P}$-hard
    • Input: DNF formula $\phi:\{T,F\}^n\rightarrow\{T,F\}$, $U=\{T,F\}^n$
    • Output: $Z=|\phi^{-1}(T)|$, $S=\phi^{-1}(T)$
    • $\rho=\frac{|S|}{|U|}$ can be exponentially small, so naive Monte Carlo needs exponentially many samples
  • (P) Union of Sets
    • Input: $m$ sets $S_1,S_2,\cdots,S_m$
    • Output: $|\bigcup_{i=1}^m S_i|$
  • Coverage Algorithm: guarantees $\rho\geq\frac{1}{m}$
    • $U=\{(x,i)|x\in S_i\}$ (multiset union)
    • $\overline{S}=\{(x,i^*)|x\in S_{i^*}\text{ and }(x\in S_i\Rightarrow i^*\leq i)\}$, i.e., each $x$ is counted only with the smallest index of a set containing it
    • Sample uniform $(x,i)\in U$
      • Sample $i\in\{1,2,\cdots,m\}$ with probability proportional to $|S_i|$
      • Sample uniform $x\in S_i$
    • Check whether $(x,i)\in\overline{S}$
      • $x\in S_i$
      • and $\forall j<i,\ x\not\in S_j$
  • Counting by Coverage Algorithm: $|U|=\sum_i|S_i|$ and $|\bigcup_i S_i|=|\overline{S}|$
  • Sampling by Coverage Algorithm: rejection sampling
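For DNF counting, taking $S_i$ to be the satisfying assignments of clause $i$ gives the Karp–Luby estimator. A hedged Python sketch (the clause encoding as dicts and the example formula are illustrative assumptions):

```python
import random

def karp_luby(clauses, n, N, seed=0):
    """Estimate |union S_i| = #DNF solutions via the coverage algorithm.

    clauses: list of dicts {variable index: required truth value};
    S_i = assignments satisfying clause i, so |S_i| = 2^(n - len(clause)).
    """
    rng = random.Random(seed)
    sizes = [2 ** (n - len(c)) for c in clauses]   # |S_i|
    total = sum(sizes)                              # |U| = sum_i |S_i|
    hits = 0
    for _ in range(N):
        # Uniform (x, i) in U: clause i with prob |S_i|/|U|, then uniform x in S_i.
        i = rng.choices(range(len(clauses)), weights=sizes)[0]
        x = {v: rng.random() < 0.5 for v in range(n)}
        x.update(clauses[i])                        # force clause i to be satisfied
        # (x, i) is in S-bar iff no earlier clause is already satisfied by x.
        in_S_bar = all(any(x[v] != b for v, b in clauses[j].items())
                       for j in range(i))
        hits += in_S_bar
    return total * hits / N

# phi = x0 OR x1 over two variables: 3 satisfying assignments.
estimate = karp_luby([{0: True}, {1: True}], n=2, N=20_000)
```

Since $\rho\geq 1/m$ here, polynomially many samples suffice regardless of how rare the solutions are in $\{T,F\}^n$.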

Markov Chain

$\{X_t|t\in T\},\ X_t\in\Omega$

  • discrete time: $T=\{0,1,\cdots\}$
  • discrete space: $\Omega$ is finite
  • state $X_t\in\Omega$
  • chain: stochastic process in discrete time and discrete state space
  • Markov property (memoryless): $X_{t+1}$ depends only on $X_t$

$$P[X_{t+1}=y|X_0=x_0,\cdots,X_{t-1}=x_{t-1},X_t=x]=P[X_{t+1}=y|X_t=x]$$

  • Markov chain: discrete-time, discrete-space stochastic process with the Markov property
  • homogeneity: the transition probabilities do not depend on time
  • homogeneous Markov chain $\mathfrak{M}=(\Omega,P)$: $P[X_{t+1}=y|X_t=x]=P_{xy}$
    • transition matrix $P$ over $\Omega\times\Omega$
    • (row-)stochastic matrix: $P\mathbf{1}=\mathbf{1}$
    • distribution $p^{(t)}(x)=P[X_t=x]$
    • $p^{(t+1)}=p^{(t)}P$
  • stationary distribution $\pi$ of matrix $P$: $\pi P=\pi$
  • Perron-Frobenius Theorem: every finite stochastic matrix has such a left eigenvector with eigenvalue $1$, so $\pi$ exists

Fundamental Theorem of Markov Chain

If a finite Markov chain is irreducible and aperiodic (ergodic), then for every initial distribution $\pi^{(0)}$, $\lim_{t\rightarrow\infty}\pi^{(0)}P^t=\pi$, where $\pi$ is the unique stationary distribution.

  • finite: the stationary distribution $\pi$ exists
  • irreducible: the stationary distribution is unique
    • all pairs of states communicate
    • Special cases where uniqueness fails
      • not connected: any mixture $\pi=\lambda\pi_A+(1-\lambda)\pi_B$ of the components' stationary distributions is stationary
      • weakly connected (absorbing case): $\pi=(0,\pi_B)$
  • aperiodicity: the distribution converges to $\pi$
    • period of state $x$: $d_x=\gcd\{t|P^t_{x,x}>0\}$
    • aperiodic: all states have period $1$
    • if $\forall x\in\Omega,\ P(x,x)>0$ (self-loop), then the chain is aperiodic
    • $x$ and $y$ communicate $\Rightarrow d_x=d_y$

If the Markov chain is infinite, positive recurrence is additionally required.
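The theorem can be observed numerically by iterating $p^{(t+1)}=p^{(t)}P$ from different starting distributions; the chain below (the lazy random walk on a 3-vertex path, a choice made purely for illustration) is finite, irreducible, and aperiodic:

```python
# Lazy random walk on the path 0-1-2: self-loops make the chain aperiodic.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

def evolve(p0, P, t=200):
    """Iterate the distribution update p^(t+1) = p^(t) P for t steps."""
    p = list(p0)
    n = len(P)
    for _ in range(t):
        p = [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]
    return p

# Two very different starting distributions...
pi_a = evolve([1.0, 0.0, 0.0], P)
pi_b = evolve([0.0, 0.0, 1.0], P)
# ...converge to the same unique stationary distribution (0.25, 0.5, 0.25).
```

The limit matches $\pi(u)=\frac{\deg(u)}{2|E|}=(\frac{1}{4},\frac{2}{4},\frac{1}{4})$ from the next section.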

Random Walk

  • random walk: a Markov chain used to sample from its stationary distribution $\pi$
  • Fundamental Theorem of Markov Chain on a graph $G$
    • irreducible: $G$ is connected
    • aperiodic: $G$ is non-bipartite
  • uniform random walk (on undirected graph) $\mathfrak{M}=(V,P)$
    • Strategy: at each step, move to a neighbor $v$ of the current vertex chosen uniformly at random
    • stationary distribution: $\pi(u)=\frac{\deg(u)}{2|E|}$

$$P(u,v)=\begin{cases}\frac{1}{\deg(u)} & uv\in E\\ 0 & uv\not\in E\end{cases}$$

  • lazy random walk $\mathfrak{M}=(V,\frac{1}{2}(I+P))$
    • Strategy: at each step, with probability $\frac{1}{2}$ move to a uniformly random neighbor $v$ of the current vertex, otherwise stay put
    • $\pi P=\pi\Rightarrow\pi\frac{1}{2}(I+P)=\pi$: laziness preserves the stationary distribution while making the chain aperiodic

$$P(u,v)=\begin{cases}\frac{1}{2} & u=v\\ \frac{1}{2\deg(u)} & uv\in E\\ 0 & uv\not\in E\end{cases}$$
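A small simulation (the 4-cycle is a hypothetical example: connected but bipartite) contrasts the periodic plain walk with its lazy version:

```python
import random

# Uniform random walk on the 4-cycle 0-1-2-3-0 (connected but bipartite).
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}

def empirical_distribution(T, lazy, runs=20_000, seed=0):
    """Empirical distribution of X_T over many independent walks from vertex 0."""
    rng = random.Random(seed)
    counts = [0, 0, 0, 0]
    for _ in range(runs):
        x = 0
        for _ in range(T):
            if lazy and rng.random() < 0.5:
                continue                     # lazy step: stay put w.p. 1/2
            x = rng.choice(neighbors[x])     # move to a uniformly random neighbor
        counts[x] += 1
    return [c / runs for c in counts]

# Plain walk is periodic: after an even number of steps, X_T is in {0, 2}.
plain = empirical_distribution(T=10, lazy=False)
# Lazy walk converges to pi(u) = deg(u)/(2|E|) = 1/4 for every vertex.
lazy = empirical_distribution(T=50, lazy=True)
```

The plain walk's distribution never converges (it oscillates between the two sides of the bipartition), while the lazy walk's does, exactly as the fundamental theorem predicts.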

PageRank

  • Rank $r(v)$: importance of a page
    • a page is important if it is pointed to by many high-rank pages
    • a high-rank page has greater influence
    • a page pointing to few others gives each of them greater influence
  • $d_+(u)$: out-degree of $u$
  • $r(v)=\sum_{u:(u,v)\in E}\frac{r(u)}{d_+(u)}$
  • random walk interpretation: $P(u,v)=[(u,v)\in E]\frac{1}{d_+(u)}$
  • stationary distribution: $rP=r$
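The fixed point $rP=r$ can be computed by power iteration. A bare sketch (the tiny link graph is a made-up example; real PageRank also adds a damping factor, omitted here):

```python
def pagerank(edges, n, t=100):
    """Power iteration for r = r P with P(u, v) = 1/d+(u) on out-edges.

    Assumes every page has at least one outgoing link (no dangling nodes).
    """
    out = [[] for _ in range(n)]
    for u, v in edges:
        out[u].append(v)
    r = [1.0 / n] * n                      # start from the uniform distribution
    for _ in range(t):
        nxt = [0.0] * n
        for u in range(n):
            share = r[u] / len(out[u])     # r(u) split evenly over out-links
            for v in out[u]:
                nxt[v] += share
        r = nxt
    return r

# Tiny link graph: 0 -> 1, 1 -> 2, 2 -> 0 and 2 -> 1.
ranks = pagerank([(0, 1), (1, 2), (2, 0), (2, 1)], n=3)
```

For this graph the exact stationary distribution is $r=(0.2,0.4,0.4)$: page 2's rank is shared between pages 0 and 1, while page 1 collects from both 0 and 2.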

Reversibility

  • Detailed Balance Equation (DBE): $\pi(x)P(x,y)=\pi(y)P(y,x)$
  • time-reversible Markov chain: $\exists\pi,\forall x,y\in\Omega,\ \pi(x)P(x,y)=\pi(y)P(y,x)$
    • time-reversible: when started from $\pi$, $(X_0,X_1,\cdots,X_n)\sim(X_n,X_{n-1},\cdots,X_0)$
    • $\forall x,y\in\Omega,\ \pi(x)P(x,y)=\pi(y)P(y,x)\Rightarrow\pi$ is a stationary distribution
  • Analyze $\pi$ for a given $P$ (random walk on a graph)
    • $u=v$ or $u\not\sim v$: DBE holds trivially
    • $u\sim v$: $\pi(u)\propto\frac{1}{P(u,v)}\propto\deg(u)$, where $\Delta=\max_v\deg(v)$
  • Design $P$ to make $\pi$ the uniform distribution

$$P(u,v)=\begin{cases}1-\frac{\deg(u)}{2\Delta} & u=v\\ \frac{1}{2\Delta} & uv\in E\\ 0 & \text{o.w.}\end{cases}$$
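A quick sanity check of this design (the star graph $K_{1,3}$ is an arbitrary test instance): the kernel is row-stochastic and symmetric, and a symmetric kernel satisfies detailed balance with the uniform $\pi$:

```python
# Max-degree walk on the star K_{1,3}: center 0, leaves 1, 2, 3 (Delta = 3).
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
Delta = max(len(nbrs) for nbrs in adj.values())
n = len(adj)

def P(u, v):
    """The designed kernel: 1/(2*Delta) per edge, remaining mass on the self-loop."""
    if u == v:
        return 1 - len(adj[u]) / (2 * Delta)
    return 1 / (2 * Delta) if v in adj[u] else 0.0

# Row sums are 1 and P(u,v) = P(v,u), so the uniform pi satisfies
# pi(u) P(u,v) = pi(v) P(v,u) for all u, v.
rows_ok = all(abs(sum(P(u, v) for v in range(n)) - 1.0) < 1e-12 for u in range(n))
symmetric = all(P(u, v) == P(v, u) for u in range(n) for v in range(n))
```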

MCMC

  • Problem setting
    • Given a proposal Markov chain with transition matrix $Q$
    • Goal: a new Markov chain $P$ with a prescribed stationary distribution $\pi$
  • Detailed Balance Equation with acceptance ratio: $\pi(i)Q(i,j)\alpha(i,j)=\pi(j)Q(j,i)\alpha(j,i)$
    • $P(i,j)=Q(i,j)\alpha(i,j)$ for $i\neq j$
    • one valid choice: $\alpha(i,j)=\pi(j)Q(j,i)$
  • Original MCMC
    • (proposal) propose $y\in\Omega$ with probability $Q(x,y)$, where $x$ is the current state
    • (accept-reject) accept the proposal and move to $y$ with probability $\alpha(x,y)=\pi(y)Q(y,x)$
    • run the above for $T$ steps and return the final state
  • mixing time $T$: the time for the chain to get close to the stationary distribution
    • bounding mixing times is an active research frontier

Metropolis Algorithm

  • Metropolis-Hastings Algorithm (symmetric proposal $Q$)
    • (proposal) propose $y\in\Omega$ with probability $Q(x,y)$, where $x$ is the current state
    • (filter) accept the proposal and move to $y$ with probability $\min\{1,\frac{\pi(y)}{\pi(x)}\}$
  • New transition matrix (satisfies the Detailed Balance Equation)

$$P(x,y)=\begin{cases}Q(x,y)\min\{1,\frac{\pi(y)}{\pi(x)}\} & x\neq y\\ 1-\sum_{y\neq x}P(x,y) & x=y\end{cases}$$

  • Metropolis Algorithm (M-H for sampling a uniform random CSP solution)
    • Initially, start with an arbitrary CSP solution $\sigma=(\sigma_1,\cdots,\sigma_n)$
    • (proposal) pick a variable $i\in[n]$ and a value $c\in[q]$ uniformly at random
    • (filter) accept the proposal and change $\sigma_i$ to $c$ if this violates no constraint
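A minimal Metropolis-Hastings sampler in Python (the target weights $\pi(x)\propto x+1$ and the uniform proposal are illustrative choices, not from the notes):

```python
import random

def metropolis(weights, T, seed=0):
    """Metropolis-Hastings with a symmetric proposal (uniform over all states).

    weights: unnormalized target pi over states 0..n-1; only ratios
    pi(y)/pi(x) are needed, so no normalization constant is required.
    """
    rng = random.Random(seed)
    n = len(weights)
    x = 0
    for _ in range(T):
        y = rng.randrange(n)                          # symmetric Q(x, y) = 1/n
        if rng.random() < min(1.0, weights[y] / weights[x]):
            x = y                                     # accept w.p. min{1, pi(y)/pi(x)}
    return x

# Target pi(x) proportional to x + 1 on {0,...,4}, i.e. pi = (1,2,3,4,5)/15.
counts = [0] * 5
for s in range(20_000):
    counts[metropolis([1, 2, 3, 4, 5], T=50, seed=s)] += 1
freqs = [c / 20_000 for c in counts]
```

Note that only the ratio $\frac{\pi(y)}{\pi(x)}$ enters the filter, which is why M-H works even when $\pi$ is known only up to a normalization constant.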

Gibbs Sampling

  • For two states $A=(x_1,y_1)$ and $B=(x_1,y_2)$ agreeing on the first coordinate, we have

$$\pi(A)\pi(y_2|x_1)=\pi(x_1,y_1)\pi(y_2|x_1)=\pi(x_1)\pi(y_1|x_1)\pi(y_2|x_1)=\pi(x_1,y_2)\pi(y_1|x_1)=\pi(B)\pi(y_1|x_1)$$

Let $P(A,B)=\pi(y_2|x_1)$, the conditional distribution of the resampled coordinate given the shared values; then the Detailed Balance Equation holds.

  • Goal: sample a random vector $X=(X_1,\cdots,X_n)$ according to a joint distribution $\pi(x_1,\cdots,x_n)$
  • Gibbs Sampling
    • Initially, start with an arbitrary feasible $X$
    • repeat the following for $T$ steps:
      • pick a variable $i\in[n]$ uniformly at random
      • resample $X_i$ according to the marginal distribution of $X_i$ conditioned on the current values of $(X_1,\cdots,X_{i-1},X_{i+1},\cdots,X_n)$
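A toy Gibbs sampler on a $2\times 2$ joint distribution (the weight table is an assumed example; each step resamples one coordinate from its conditional given the other):

```python
import random

# Unnormalized joint weights pi(x, y) on {0,1} x {0,1} (a toy target).
w = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}

def gibbs(T, seed=0):
    """Random-scan Gibbs: pick a coordinate u.a.r., resample from its conditional."""
    rng = random.Random(seed)
    x, y = 0, 0
    for _ in range(T):
        if rng.random() < 0.5:
            p1 = w[(1, y)] / (w[(0, y)] + w[(1, y)])   # pi(x = 1 | y)
            x = int(rng.random() < p1)
        else:
            p1 = w[(x, 1)] / (w[(x, 0)] + w[(x, 1)])   # pi(y = 1 | x)
            y = int(rng.random() < p1)
    return x, y

counts = {s: 0 for s in w}
for s in range(20_000):
    counts[gibbs(T=30, seed=s)] += 1
freqs = {s: c / 20_000 for s, c in counts.items()}
# freqs approximates pi = (0.1, 0.2, 0.3, 0.4) after normalizing w by 10.
```

As in Metropolis, only conditional ratios of $w$ are used, so the normalization of $\pi$ is never needed.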

Glauber Dynamics

  • Glauber Dynamics for CSP
    • Initially, start with an arbitrary CSP solution; let $\sigma$ denote the current solution
    • pick a variable $i\in[n]$ uniformly at random
    • change the value of $\sigma_i$ to a uniform value $c$ among all of $\sigma_i$'s available values, i.e., values $c$ such that changing $\sigma_i$ to $c$ does not violate any constraint
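The dynamics above can be instantiated for proper graph colorings, a standard CSP example; this sketch assumes a triangle instance with $q=4$ colors and a greedy initial coloring, all illustrative choices:

```python
import random

def glauber_coloring(adj, q, T, seed=0):
    """Glauber dynamics on proper q-colorings of a graph (a CSP instance).

    Each step: pick a vertex i u.a.r. and recolor it with a uniform choice
    among the colors not used by its neighbors (the 'available values').
    """
    rng = random.Random(seed)
    n = len(adj)
    # Arbitrary initial proper coloring via greedy (works since q > max degree).
    color = [0] * n
    for i in range(n):
        used = {color[j] for j in adj[i] if j < i}
        color[i] = min(c for c in range(q) if c not in used)
    for _ in range(T):
        i = rng.randrange(n)
        available = [c for c in range(q)
                     if all(color[j] != c for j in adj[i])]
        color[i] = rng.choice(available)   # uniform among non-violating values
    return color

# Triangle with q = 4 colors: every step keeps the coloring proper.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
sample = glauber_coloring(adj, q=4, T=1_000)
proper = all(sample[u] != sample[v] for u in adj for v in adj[u])
```

Note the current color of vertex $i$ is always among its available values, so the chain has self-loops and is aperiodic; with enough colors (e.g. $q\geq\Delta+2$) it is also irreducible over proper colorings.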