Monte Carlo Method
(P) Problem
Universe $U$, subset $S\subseteq U$ where $\rho=\frac{|S|}{|U|}$
Assume a uniform sampler for elements of $U$
Estimate $Z=|S|$
Monte Carlo Method (for estimating)
Sample $X_1,X_2,\cdots,X_N$ uniformly and independently from $U$
$Y_i=[X_i\in S]$
counting: $\hat Z=\frac{|U|}{N}\sum_{i=1}^N Y_i$
$\epsilon$-approx estimator: $(1-\epsilon)Z\leq\hat Z\leq(1+\epsilon)Z$
Estimator Theorem (naive): $N\geq\frac{4}{\epsilon^2\rho}\ln\frac{2}{\delta}=\Theta\left(\frac{1}{\epsilon^2\rho}\ln\frac{1}{\delta}\right)\Rightarrow P[\hat Z\text{ is an }\epsilon\text{-approx of }|S|]\geq 1-\delta$
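As a sanity check, the estimator above can be sketched in a few lines of Python. The universe and subset here are made-up toy choices (multiples of 3 in $\{0,\dots,9999\}$), not from the notes:

```python
import random

def monte_carlo_count(universe_size, in_S, N, rng):
    # Z_hat = (|U| / N) * sum of indicators Y_i = [X_i in S]
    hits = sum(in_S(rng.randrange(universe_size)) for _ in range(N))
    return universe_size * hits / N

# Toy instance: U = {0,...,9999}, S = multiples of 3, so Z = 3334 and rho ≈ 1/3.
rng = random.Random(0)
Z_hat = monte_carlo_count(10_000, lambda x: x % 3 == 0, N=20_000, rng=rng)
```

With $\rho\approx 1/3$ the naive sample bound is cheap; as $\rho$ shrinks, $N=\Theta(\frac{1}{\epsilon^2\rho}\ln\frac{1}{\delta})$ blows up, which is exactly the issue the coverage algorithm addresses.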
Monte Carlo Method (for sampling)
rejection sampling: inefficient if $\rho$ is small
Counting DNF Solutions
(P) Counting DNF Solutions: $\#\text{P}$-hard
Input: DNF formula $\phi:\{T,F\}^n\rightarrow\{T,F\}$, $U=\{T,F\}^n$
Output: $Z=|\phi^{-1}(T)|$, $S=\phi^{-1}(T)$
$\rho=\frac{|S|}{|U|}$ can be exponentially small
(P) Union of Sets
Input: $m$ sets $S_1,S_2,\cdots,S_m$
Output: estimate $|\bigcup_{i=1}^m S_i|$
Coverage Algorithm: guarantees $\rho\geq\frac{1}{m}$
$U=\{(x,i)\mid x\in S_i\}$ (multiset union)
$\overline{S}=\{(x,i^*)\mid x\in S_{i^*}\text{ and }x\in S_i\Rightarrow i^*\leq i\}$ (each $x$ paired with its smallest index, so $|\overline{S}|=|\bigcup_i S_i|$)
Sample uniform $(x,i)\in U$:
Sample $i\in\{1,2,\cdots,m\}$ with probability proportional to $|S_i|$
Sample uniform $x\in S_i$
check if $(x,i)\in\overline{S}$:
$x\in S_i$
and $\forall j<i, x\not\in S_j$
Counting by Coverage Algorithm: $|U|=\sum_i|S_i|$
Sampling by Coverage Algorithm: Rejection Sampling
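A minimal sketch of the coverage estimator (Karp–Luby style) on three small made-up sets: sample $i$ proportional to $|S_i|$, sample a uniform $x\in S_i$, and count how often $(x,i)$ lands in $\overline S$; then $|\bigcup_i S_i|\approx |U|\cdot\frac{\text{hits}}{N}$:

```python
import random

def union_size_estimate(sets, N, rng):
    """Coverage estimator for |S_1 ∪ ... ∪ S_m| via the multiset
    U = {(x,i) : x in S_i}; (x,i) is canonical iff i is minimal."""
    sizes = [len(s) for s in sets]
    total = sum(sizes)                   # |U| = sum_i |S_i|
    lists = [sorted(s) for s in sets]
    hits = 0
    for _ in range(N):
        i = rng.choices(range(len(sets)), weights=sizes)[0]  # i ∝ |S_i|
        x = rng.choice(lists[i])                             # uniform x ∈ S_i
        if all(x not in sets[j] for j in range(i)):          # (x,i) ∈ S-bar?
            hits += 1
    return total * hits / N

# Made-up overlapping sets whose union is {0,...,99}, so the true answer is 100.
S = [set(range(0, 60)), set(range(30, 90)), set(range(50, 100))]
est = union_size_estimate(S, N=20_000, rng=random.Random(1))
```

Here $\rho=100/170>1/m$, so a modest $N$ suffices regardless of how small $|\bigcup_i S_i|/|\{T,F\}^n$ would be in the DNF application.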
Markov Chain
$\{X_t\mid t\in T\}$, $X_t\in\Omega$
discrete time: $T=\{0,1,\cdots\}$
discrete space: $\Omega$ is finite
state $X_t\in\Omega$
chain: stochastic process in discrete time and discrete state space
Markov property (memoryless): $X_{t+1}$ depends only on $X_t$
$P[X_{t+1}=y\mid X_0=x_0,\cdots,X_{t-1}=x_{t-1},X_t=x]=P[X_{t+1}=y\mid X_t=x]$
Markov chain (memoryless): discrete-time, discrete-space stochastic process with the Markov property
homogeneity: transition probabilities do not depend on time
homogeneous Markov chain $\mathfrak{M}=(\Omega,P)$: $P[X_{t+1}=y\mid X_t=x]=P_{xy}$
transition matrix $P$ over $\Omega\times\Omega$
(row-)stochastic matrix: $P\mathbf{1}=\mathbf{1}$
distribution $p^{(t)}(x)=P[X_t=x]$
$p^{(t+1)}=p^{(t)}P$
stationary distribution $\pi$ of matrix $P$: $\pi P=\pi$
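The evolution $p^{(t+1)}=p^{(t)}P$ can be iterated directly to approach a stationary distribution numerically; the 2-state chain below is a made-up example whose exact stationary distribution is $\pi=(\frac{2}{3},\frac{1}{3})$:

```python
def step(p, P):
    # one evolution step: p^(t+1) = p^(t) P  (row vector times matrix)
    n = len(P)
    return [sum(p[x] * P[x][y] for x in range(n)) for y in range(n)]

# Made-up 2-state chain: stay with prob 0.9 / 0.8, switch otherwise.
P = [[0.9, 0.1],
     [0.2, 0.8]]
p = [1.0, 0.0]           # start deterministically in state 0
for _ in range(200):     # iterate p <- pP until numerically stationary
    p = step(p, P)
# pi P = pi gives 0.1*pi(0) = 0.2*pi(1), i.e. pi = (2/3, 1/3).
```

The convergence here is geometric with rate given by the second eigenvalue of $P$ (here $0.7$), which previews the role of the mixing time later.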
Perron-Frobenius Theorem
Fundamental Theorem of Markov Chain
If a finite Markov chain is irreducible and aperiodic (i.e., ergodic), then for every initial distribution $\pi^{(0)}$, $\lim_{t\rightarrow\infty}\pi^{(0)}P^t=\pi$, where $\pi$ is the unique stationary distribution
finite: the stationary distribution $\pi$ exists
irreducible: the stationary distribution is unique
all pairs of states communicate
Special case
not connected: $\pi=\lambda\pi_A+(1-\lambda)\pi_B$ (any mixture of the components' stationary distributions)
weakly connected (absorbing case): $\pi=(0,\pi_B)$
aperiodicity: the distribution converges to $\pi$
period of state $x$: $d_x=\gcd\{t\mid P^t_{x,x}>0\}$
aperiodic: all states have period $1$
if $\forall x\in\Omega, P(x,x)>0$ (self-loops), then the chain is aperiodic
$x$ and $y$ communicate $\Rightarrow d_x=d_y$
If the Markov chain is infinite, additionally require positive recurrence
Random Walk
random walk: run a Markov chain to sample from its stationary distribution $\pi$
Fundamental Theorem of Markov Chain on Graph
irreducible: $G$ is connected
aperiodic: $G$ is non-bipartite
uniform random walk (on an undirected graph) $\mathfrak{M}=(V,P)$
Strategy: at each step, move to a uniformly random neighbor $v$ of the current vertex
necessary condition for the stationary distribution: $\pi(u)=\frac{\deg(u)}{2|E|}$
$P(u,v)=\begin{cases}\frac{1}{\deg(u)} & uv\in E\\ 0 & uv\not\in E\end{cases}$
lazy random walk $\mathfrak{M}=(V,\frac{1}{2}(I+P))$
Strategy: at each step, with probability $\frac{1}{2}$ move to a uniformly random neighbor $v$ of the current vertex; otherwise stay put
$\pi P=\pi\Rightarrow \pi\cdot\frac{1}{2}(I+P)=\pi$
$\frac{1}{2}(I+P)(u,v)=\begin{cases}\frac{1}{2} & u=v\\ \frac{1}{2\deg(u)} & uv\in E\\ 0 & uv\not\in E\end{cases}$
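To check $\pi(u)=\frac{\deg(u)}{2|E|}$ on a concrete (made-up) graph, one can build $P$ and verify $\pi P=\pi$ directly:

```python
# Made-up undirected graph: a path 0-1-2-3 with an extra edge 1-3.
edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
n = 4
adj = [[] for _ in range(n)]
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)
deg = [len(adj[u]) for u in range(n)]

# Uniform random walk: P(u,v) = 1/deg(u) for uv in E, else 0.
P = [[(1 / deg[u] if v in adj[u] else 0.0) for v in range(n)] for u in range(n)]

# Claimed stationary distribution: pi(u) = deg(u) / (2|E|).
pi = [deg[u] / (2 * len(edges)) for u in range(n)]
piP = [sum(pi[u] * P[u][v] for u in range(n)) for v in range(n)]
```

Since the graph is connected and non-bipartite (it has a triangle 1-2-3), the fundamental theorem applies and this $\pi$ is the unique limiting distribution.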
Rank $r(v)$: importance of a page
pointed to by more high-rank pages
high-rank pages have greater influence
pages pointing to few others have greater influence
$d_+(u)$: out-degree of $u$
$r(v)=\sum_{u:(u,v)\in E}\frac{r(u)}{d_+(u)}$
random walk: $P(u,v)=[(u,v)\in E]\cdot\frac{1}{d_+(u)}$
stationary distribution: $rP=r$
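Power iteration $r\leftarrow rP$ on a tiny made-up link graph (every page has an out-link, so $P$ is stochastic; the exact fixed point is $r=(0.4,0.2,0.4)$):

```python
# Made-up directed link graph: 0 -> {1,2}, 1 -> {2}, 2 -> {0}.
out = {0: [1, 2], 1: [2], 2: [0]}
n = 3

def pagerank_step(r):
    # one step of r <- rP: page u passes r(u)/d_+(u) to each page it links to
    new = [0.0] * n
    for u, targets in out.items():
        for v in targets:
            new[v] += r[u] / len(targets)
    return new

r = [1 / n] * n          # start from the uniform distribution
for _ in range(500):
    r = pagerank_step(r)
```

This sketch omits the damping/teleportation used in practice to handle dangling pages and guarantee irreducibility; here the cycle lengths 2 and 3 already make the chain aperiodic.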
Reversibility
Detailed Balance Equation (DBE): $\pi(x)P(x,y)=\pi(y)P(y,x)$
time-reversible Markov chain: $\exists\pi,\forall x,y\in\Omega,\pi(x)P(x,y)=\pi(y)P(y,x)$
time-reversible: when started from $\pi$, $(X_0,X_1,\cdots,X_n)\sim(X_n,X_{n-1},\cdots,X_0)$
$\forall x,y\in\Omega,\pi(x)P(x,y)=\pi(y)P(y,x)\Rightarrow\pi$ is a stationary distribution
Analyze $\pi$ for a given $P$ (random walk on a graph)
$u=v$ or $u\not\sim v$: DBE holds trivially
$u\sim v$: $\pi(u)\propto\frac{1}{P(u,v)}\propto\deg(u)$; let $\Delta=\max_v\deg(v)$
Design $P$ to make $\pi$ the uniform distribution
$P(u,v)=\begin{cases}1-\frac{\deg(u)}{2\Delta} & u=v\\ \frac{1}{2\Delta} & uv\in E\\ 0 & \text{o.w.}\end{cases}$
MCMC
Problem setting
Given a Markov chain with transition matrix Q Q Q
Goal: a new Markov chain P P P with stationary distribution π \pi π
Detailed Balance Equation with acceptance ratio: $\pi(i)Q(i,j)\alpha(i,j)=\pi(j)Q(j,i)\alpha(j,i)$
$P(i,j)=Q(i,j)\alpha(i,j)$
one valid choice: $\alpha(i,j)=\pi(j)Q(j,i)$ (both sides become $\pi(i)Q(i,j)\pi(j)Q(j,i)$)
Original MCMC
(proposal) propose $y\in\Omega$ with probability $Q(x,y)$, where $x$ is the current state
(accept-reject sample) accept the proposal and move to $y$ with probability $\alpha(x,y)=\pi(y)Q(y,x)$
run the above $T$ times and return
mixing time $T$: time needed to get close to the stationary distribution
Metropolis Algorithm
Metropolis-Hastings Algorithm (symmetric case: $Q(x,y)=Q(y,x)$)
(proposal) propose $y\in\Omega$ with probability $Q(x,y)$, where $x$ is the current state
(filter) accept the proposal and move to $y$ with probability $\min\{1,\frac{\pi(y)}{\pi(x)}\}$
New Transition Matrix (satisfies the Detailed Balance Equation)
$P(x,y)=\begin{cases}Q(x,y)\min\{1,\frac{\pi(y)}{\pi(x)}\} & x\neq y\\ 1-\sum_{y\neq x}P(x,y) & x=y\end{cases}$
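A sketch of the symmetric-case Metropolis-Hastings sampler on a small state space, with a made-up target $\pi$. The proposal picks a uniformly random other state, so $Q(x,y)=\frac{1}{n-1}=Q(y,x)$ is symmetric:

```python
import random

def metropolis(pi, T, rng):
    """Metropolis sampler on states 0..n-1 with a symmetric proposal;
    accepts a proposed move x -> y with probability min(1, pi(y)/pi(x))."""
    n = len(pi)
    x = 0
    counts = [0] * n
    for _ in range(T):
        y = rng.randrange(n - 1)
        if y >= x:
            y += 1                      # uniform over states != x
        if rng.random() < min(1.0, pi[y] / pi[x]):
            x = y                       # accept; otherwise stay at x
        counts[x] += 1
    return [c / T for c in counts]

target = [0.1, 0.2, 0.3, 0.4]           # made-up target distribution
freq = metropolis(target, T=200_000, rng=random.Random(0))
```

After many steps the empirical state frequencies approach the target, even though the sampler only ever evaluates ratios $\pi(y)/\pi(x)$, so $\pi$ need not be normalized.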
Metropolis Algorithm (M-H for sampling a uniformly random CSP solution)
Initially, start with an arbitrary CSP solution $\sigma=(\sigma_1,\cdots,\sigma_n)$
(proposal) pick a variable $i\in[n]$ and a value $c\in[q]$ uniformly at random
(filter) accept the proposal and change $\sigma_i$ to $c$ if doing so does not violate any constraint
Gibbs Sampling
For two states $A=(x_1,y_1)$ and $B=(x_1,y_2)$ that agree on the first coordinate, we have
$\pi(A)\pi(y_2\mid x_1)=\pi(x_1,y_1)\pi(y_2\mid x_1)=\pi(x_1)\pi(y_1\mid x_1)\pi(y_2\mid x_1)=\pi(x_1,y_2)\pi(y_1\mid x_1)=\pi(B)\pi(y_1\mid x_1)$
So if $P(A,B)$ is the conditional distribution of the differing coordinate given the shared coordinate, the DBE condition holds.
Goal: sample a random vector $X=(X_1,\cdots,X_n)$ according to a joint distribution $\pi(x_1,\cdots,x_n)$
Gibbs Sampling
Initially, start with an arbitrary feasible $X$
repeat the following for $T$ steps:
pick a variable $i\in[n]$ uniformly at random
resample $X_i$ according to the marginal distribution of $X_i$ conditioned on the current values of $(X_1,\cdots,X_{i-1},X_{i+1},\cdots,X_n)$
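Gibbs sampling from a made-up joint distribution over two binary variables; each step resamples one coordinate from its conditional given the other:

```python
import random

# Made-up joint distribution pi(x1, x2) over {0,1} x {0,1}.
pi = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def gibbs(T, rng):
    x = [0, 0]                          # arbitrary feasible start
    counts = {k: 0 for k in pi}
    for _ in range(T):
        i = rng.randrange(2)            # pick a coordinate uniformly
        # conditional of X_i given the other coordinate's current value:
        x[i] = 0
        p0 = pi[tuple(x)]
        x[i] = 1
        p1 = pi[tuple(x)]
        x[i] = 0 if rng.random() < p0 / (p0 + p1) else 1
        counts[tuple(x)] += 1
    return {k: v / T for k, v in counts.items()}

freq = gibbs(T=200_000, rng=random.Random(0))
```

Note the conditional only needs ratios of joint probabilities with one coordinate flipped, which is what makes Gibbs sampling practical for high-dimensional $\pi$.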
Glauber Dynamics
Glauber Dynamics for CSP
Initially, start with an arbitrary CSP solution; at each step, let the current CSP solution be $\sigma$
pick a variable $i\in[n]$ uniformly at random
change the value of $\sigma_i$ to a uniformly random value $c$ among all available values, namely those $c$ for which changing $\sigma_i$ to $c$ won't violate any constraint
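The steps above can be sketched for a concrete CSP, proper graph coloring (the graph and $q$ here are made up; the starting solution comes from a greedy coloring, which works whenever $q>\Delta$):

```python
import random

def glauber_coloring(adj, q, T, rng):
    """Glauber dynamics for proper q-colorings: pick a uniform vertex,
    recolor it uniformly among colors not used by its neighbors."""
    n = len(adj)
    # greedy proper coloring as an arbitrary starting CSP solution
    color = [0] * n
    for v in range(n):
        used = {color[u] for u in adj[v] if u < v}
        color[v] = min(c for c in range(q) if c not in used)
    for _ in range(T):
        v = rng.randrange(n)                      # uniform variable
        available = [c for c in range(q)
                     if all(color[u] != c for u in adj[v])]
        color[v] = rng.choice(available)          # uniform available value
    return color

# Made-up graph: a 4-cycle, with q = 4 colors (max degree 2, so moves never get stuck).
adj = [[1, 3], [0, 2], [1, 3], [0, 2]]
final = glauber_coloring(adj, q=4, T=10_000, rng=random.Random(0))
```

Every step preserves properness by construction, so the chain walks within the solution space; when it is irreducible and aperiodic, its stationary distribution is uniform over proper colorings.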