Finding Percentiles

Suppose you are given a multiset of real numbers A=\{a_1,a_2,\cdots,a_n\}. Given a number b, one can ask the question : Find the fraction of points in A that are less than or equal to b. Here is a simple randomized algorithm to solve this.

1. Choose K elements of A uniformly at random. Sort the elements. Let B be the sorted multiset.
2. Find out the fraction g of points in the new multiset B that are less than or equal b.
3. Output g

How good is this algorithm? Here is an attempt at an analysis. For a number x denote by F(x,A) the fraction of elements in A less than or equal to x. Suppose F(b,A) = f. Suppose F(b,B) = g. Then g|B| = K of the elements were chosen from f|A| elements and (1-g)|B| = (|B| - K) of the elements were chosen from (1-f)|A|. Hence the probability is \binom{|B|}{K} \cdot K^f \cdot (|B|-K)^(1-f). Therefore K has a binomial distribution B(|B|,f). The expected value of K is f|B| and the standard deviation is \sqrt{|B|f(1-f)}. By the The probability that we become wrong by \delta is Pr[|g - f| > \delta f]. By using a modified form of the Hoeffding’s bound the probability is upper bounded by 2e^{-\frac{\delta^2 |B| f}{2} }. If f > 0 then how large should |B| be so that we are at most off by 1% of f with probability 0.99? A simple calculation shows that if |B| > 1060/f then we are good. If The nice part about use of this bound is that it does not depend on the size of the underlying multiset A. However there is a dependance on f that is unknown in the first place! However if we assume f \geq 0.1 then we can get away with sampling a few thousand elements. Are there any better known algorithms? In randomized algorithms that produce 0-1 answers it is possible to increase the probability of success by repeating algorithm multiple times and then taking the majority of the output as the true answer. Is there a similar way of improving the overall accuracy of algorithms that produce one of many possible values such as the percentile algorithm?


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: