Metric Embedding
Metric Space: $(X,d)$, where $X$ is a set and $d$ is a distance on $X$
Embedding: a map $\phi:X\rightarrow Y$ between two metric spaces $(X,d_X)$ and $(Y,d_Y)$
Distortion $\alpha\geq 1$: $\forall x,y\in X,\ \frac{1}{\alpha}d_X(x,y)\leq d_Y(\phi(x),\phi(y))\leq\alpha\, d_X(x,y)$
Dimension Reduction: given $x_1,\cdots,x_n\in\mathbb{R}^d$, find $y_1,\cdots,y_n\in\mathbb{R}^k$ with $k\ll d$
JLT Embedding
Johnson-Lindenstrauss Theorem (JLT, 1984): it is always possible to embed $n$ points in arbitrary dimension into $O(\log n)$ dimensions with constant distortion in Euclidean space
$\forall\, 0<\epsilon<1$, $\forall S\subset\mathbb{R}^d$ with $|S|=n$, $\exists\, k=O(\epsilon^{-2}\log n)$ and $\phi:\mathbb{R}^d\rightarrow\mathbb{R}^k$ such that $\forall x,y\in S,\ (1-\epsilon)\|x-y\|^2\leq\|\phi(x)-\phi(y)\|^2\leq(1+\epsilon)\|x-y\|^2$
(linear embedding): $\phi(x)=Ax$
(2016) Lower Bound: $k=\Omega(\epsilon^{-2}\log n)$
Construction
projection onto a uniformly random $k$-dimensional subspace of $\mathbb{R}^d$ (1999)
random matrix with i.i.d. $\pm 1$ entries (2003)
random matrix with i.i.d. Gaussian entries (1998): $A\in\mathbb{R}^{k\times d}$ with entries drawn i.i.d. from $\mathcal{N}(0,\frac{1}{k})$
To prove: for each fixed unit vector $u$ (take $u=\frac{x-y}{\|x-y\|}$ for a pair $x,y\in S$), $P(|\|Au\|_2^2-1|>\epsilon)<\frac{1}{n^3}$; a union bound over the $\binom{n}{2}$ pairs then gives the theorem
$\|Au\|^2=\sum_{i=1}^k Y_i^2$ where $Y_i\sim\mathcal{N}(0,\frac{1}{k})$
Chernoff bound for the $\chi^2$-distribution
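The Gaussian construction is easy to check numerically. A minimal sketch (the constant 24 in $k$ is illustrative; the theorem only promises $k=O(\epsilon^{-2}\log n)$):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

n, d, eps = 100, 1000, 0.5
k = int(np.ceil(24 * np.log(n) / eps**2))  # k = O(eps^-2 log n); constant is illustrative

X = rng.normal(size=(n, d))                         # n points in R^d
A = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, d))  # i.i.d. N(0, 1/k) entries
Y = X @ A.T                                         # embedded points in R^k

# ratio of squared distances after/before embedding, over all pairs
ratios = []
for i, j in combinations(range(n), 2):
    orig = np.sum((X[i] - X[j]) ** 2)
    emb = np.sum((Y[i] - Y[j]) ** 2)
    ratios.append(emb / orig)
ratios = np.array(ratios)
```

By the theorem, with high probability every ratio lies in $[1-\epsilon,1+\epsilon]$; the empirical mean concentrates near 1 because $E\|Au\|^2=\|u\|^2$.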
Nearest Neighbor Search (NNS)
Problem
NNS
Data: $y_1,\cdots,y_n\in X$
Query: $x\in X$
Answer: the $y_i$ closest to $x$
$c$-ANN: Approximate Nearest Neighbor
Answer: find a $y_i$ such that $\text{dist}(x,y_i)\leq c\min_{1\leq j\leq n}\text{dist}(x,y_j)$
$(c,r)$-ANN: Approximate Near Neighbor:
$$\text{output}=\begin{cases}y_{i^*}\in S\text{ with }\text{dist}(x,y_{i^*})\leq cr & \exists y_i\in S,\ \text{dist}(x,y_i)\leq r\\ \perp & \forall y_i\in S,\ \text{dist}(x,y_i)>r\\ \text{arbitrary} & \text{o.w.}\end{cases}$$
From $(c,r)$-ANN to $c$-ANN
Definition
$D_{\min}=\min_{i\neq j}\text{dist}(y_i,y_j)$
$D_{\max}=\max_{i,j}\text{dist}(y_i,y_j)$
$R=\frac{D_{\max}}{D_{\min}}$
$\forall r,\exists$ data structure for $(c,r)$-ANN with
space $s$
answer time $t$
success probability $1-\delta$
$\Rightarrow\exists$ data structure for $c$-ANN (binary search over the $O(\log_c R)$ radii $D_{\min},cD_{\min},c^2D_{\min},\cdots$) with
space $O(s\log_c R)$
answer time $O(t\log\log_c R)$
success probability $1-O(\delta\log\log_c R)$
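The binary search behind this reduction can be sketched as follows; `cr_ann_oracle` is an assumed black box answering $(c,r)$-ANN queries:

```python
def c_ann(x, radii, cr_ann_oracle):
    """Binary search over geometrically spaced radii r0, c*r0, c^2*r0, ...

    cr_ann_oracle(x, r) is an assumed (c, r)-ANN black box: it returns a
    point within distance c*r of x whenever some data point is within r,
    and None when every data point is farther than r.
    Uses O(log len(radii)) = O(log log_c R) oracle calls.
    """
    lo, hi = 0, len(radii) - 1
    ans = None
    while lo <= hi:
        mid = (lo + hi) // 2
        res = cr_ann_oracle(x, radii[mid])
        if res is None:          # every point is farther than radii[mid]
            lo = mid + 1
        else:                    # found a near point; try a smaller radius
            ans = res
            hi = mid - 1
    return ans
```

Plugging in an exact nearest-neighbor check as the oracle (valid, since exactness only strengthens the guarantee) shows the search homes in on the smallest feasible radius.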
Deterministic
Dictionary data structure
Curse of dimensionality
conjecture: NNS in high dimensions requires either super-polynomial space or super-polynomial query time
Dimension Reduction
Solve $(c,r)$-ANN in Hamming space, $x\in\{0,1\}^d$, $d\gg\log n$, w.h.p.
Data Structure
random Boolean matrix $A\in\{0,1\}^{k\times d}$ with i.i.d. entries drawn from Bernoulli($p$)
$z_i=Ay_i\in\{0,1\}^k$, computed over the finite field GF(2)
$s$-balls: precompute a lookup table of $B_s(u)=\{y_i\mid\text{dist}(u,z_i)\leq s\}$ for all $u\in\{0,1\}^k$
space: $O(n2^k)$
Answer: any $y_i\in B_s(Ax)$ ("no" if none)
query time: $O(kd)+O(1)$
Decide $k,p,s$:
$k=\frac{\ln n}{(\frac{1}{8}-2^{-(c+2)})^2}=O(\log n)$
$p=\frac{1}{2}-2^{-1-1/r}$
$s=(\frac{3}{8}-2^{-(c+2)})k$
space: $n^{O(1)}$
time: $O(d\log n)$
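A toy sketch of this table-based structure; $k$ and $s$ here are small demo values rather than the $\Theta(\log n)$ settings of the analysis, so the $2^k$-entry table stays tiny:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)

n, d, c, r = 30, 64, 2, 4
Y = rng.integers(0, 2, size=(n, d))       # data points in {0,1}^d

k, s = 10, 3                              # toy values; the analysis sets k = Theta(log n)
p = 1 / 2 - 2 ** (-1 - 1 / r)             # Bernoulli parameter from the analysis

A = (rng.random((k, d)) < p).astype(np.int64)   # i.i.d. Bernoulli(p) Boolean matrix
Z = (Y @ A.T) % 2                               # z_i = A y_i over GF(2)

# precompute the s-balls B_s(u) for every u in {0,1}^k: space O(n 2^k)
table = {}
for u in product((0, 1), repeat=k):
    dists = np.sum(np.array(u) != Z, axis=1)    # Hamming distance from u to each z_i
    table[u] = np.flatnonzero(dists <= s)

def query(x):
    """Return the index of some y_i in B_s(Ax), or None; time O(kd) + O(1)."""
    u = tuple(int(v) for v in (x @ A.T) % 2)
    cand = table[u]
    return int(cand[0]) if len(cand) > 0 else None
```

Every data point lands in its own $s$-ball (its projection is at Hamming distance 0 from itself), so querying a stored point always returns a candidate.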
Locality Sensitive Hashing
$(r,cr,p,q)$-LSH: a random hash function $h:X\rightarrow U$ satisfying $\forall x,y\in X$
$\text{dist}(x,y)\leq r\Rightarrow P(h(x)=h(y))\geq p$
$\text{dist}(x,y)>cr\Rightarrow P(h(x)=h(y))\leq q$
$\exists(r,cr,p,q)$-LSH $\Rightarrow\exists(r,cr,p^k,q^k)$-LSH by concatenating $k$ independent copies:
$g(x)=(h_1(x),\cdots,h_k(x))\in U^k$
take $k=\log_{1/q}n$, so that $q^k=\frac{1}{n}$ and $p^k=n^{-\frac{\log p}{\log q}}=:p^*$
this yields a $(r,cr,p^*,\frac{1}{n})$-LSH $g$
Algorithm 1:
space: $O(n)$
store $y_1,\cdots,y_n$ sorted in nondecreasing order of $g(y_i)$
time: $O(\log n)+O(1)$ in expectation
binary search for all $y_i$ with $g(x)=g(y_i)$ and check $\text{dist}(x,y_i)$ for each
real answer "no": always correct
real answer not "no": success probability $\geq p^*$
Algorithm 2:
space: $O(\frac{n}{p^*})$
draw independent $g_1,\cdots,g_{1/p^*}$
store $y_1,\cdots,y_n$ in table $j$ in nondecreasing order of $g_j(y_i)$
time: $O(\frac{\log n}{p^*})$
examine $\leq\frac{10}{p^*}$ of the $y_i$ with $\exists j,\ g_j(x)=g_j(y_i)$ and check $\text{dist}(x,y_i)$ for each
real answer "no": always correct
real answer not "no": success probability $>\frac{1}{2}$
Overall: solves $(c,r)$-ANN with space $O(n^{1+\frac{\log p}{\log q}})$, query time $O(n^{\frac{\log p}{\log q}}\log n)$, and one-sided error $<0.5$
Hamming space: $h(x)=x_i$ for uniformly random $i\in[d]$ is a $(r,cr,1-\frac{r}{d},1-\frac{cr}{d})$-LSH
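A sketch of this bit-sampling LSH with the amplification $g(x)=(h_1(x),\cdots,h_k(x))$ and the multiple-table scheme of Algorithm 2; $k$ and $L$ are toy values rather than the $k=\log_{1/q}n$ and $\approx 1/p^*$ tables of the analysis:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)

n, d, r, c = 100, 128, 8, 3
Y = rng.integers(0, 2, size=(n, d))       # data points in {0,1}^d

k, L = 12, 20                             # toy values; analysis: k = log_{1/q} n, L ~ 1/p*

# g_j(x) = (x_{i_1}, ..., x_{i_k}): concatenation of k sampled coordinates
gs = [rng.integers(0, d, size=k) for _ in range(L)]

tables = []
for g in gs:
    T = defaultdict(list)                 # bucket points by their g_j value
    for idx in range(n):
        T[tuple(int(b) for b in Y[idx][g])].append(idx)
    tables.append(T)

def query(x):
    """Return some index i with dist(x, y_i) <= c*r, or None."""
    for g, T in zip(gs, tables):
        for idx in T.get(tuple(int(b) for b in x[g]), []):
            if int(np.sum(x != Y[idx])) <= c * r:   # filter false positives
                return idx
    return None
```

Querying a point near a stored $y_i$ succeeds whenever some $g_j$ avoids all coordinates where the two differ, which is why many independent tables boost the success probability.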