Problems
- Checking Identity
- (bit-string )
- Communication cost
- checking distinctness
- Input:
- Determine whether (multiset equivalence)
- counting distinct elements
- Data: Stream Model, is unknown and elements is presented one at a time
- Input: a sequence
- Output: an estimation of
- Set Membership
- Data: a set of elements
- Input:
- Output:
- Frequency Estimation
- Data: a sequence of elements
- Input:
- Output: |
Fingerprint
- use Fingerprint to design One-sided-error Monte Carlo algorithm
- : function easy to compute and compare
- is small
Communication Protocol
- Theorem (Yao 1979): Any deterministic communication protocol computing EQ on two -bit strings costs bits of communication in the worst-case.
- Fingerprinting by PIT
- under , chosen uniformly at random
-
- communication cost:
- Fingerprinting by check sum
- Proof technique
- number of prime divisors of
- number of primes in :
- communication cost:
Checking distinctness
-
- Theorem (Lipton 1989):
- Proof technique
- space cost:
Hashing
Distinct Elements
- deterministic: space
- -estimator :
- : approximation error
- : confidence error
- UHA(uniform hash assumption)
- takes bits according to information theory
- SUHA(Simple UHA): A uniform random function is available and the computation of is efficient.
- Hashing Estimator
- Compute : independent values in
-
- length of a subinterval
- drawback: is too large
- Flajolet-Martin algorithm (1985)
- Space cost: bits
- 2010:
Sketch
- sketch: lossy representation of data and tolerate a bounded error in answering queries
Set Membership
- Dictionary data structure
- balanced tree: space, time
- perfect hashing: space, time
- Bloom filter (1970)
- are uniform and independent random hash function
- Data Structure of bits
- initially: all
- for each for
- Query: yes if
- time cost:
-
- : always correct
- :
Frequency Estimation
- Additive error:
- heavy hitters: elements with
- Count-min sketch (Cormode and Muthukrishnan, 2003)
- Data Structure: integer array
- initially:
- for each , ++
- Query:
- Time cost:
- Space cost:
- Proof
- Data Structure: integer array