Suppose you are given a multiset of real numbers . Given a number
, one can ask the question : Find the fraction of points in
that are less than or equal to
. Here is a simple randomized algorithm to solve this.
| 1. Choose 2. Find out the fraction 3. Output |
How good is this algorithm? Here is an attempt at an analysis. For a number denote by
the fraction of elements in
less than or equal to
. Suppose
. Suppose
. Then
of the elements were chosen from
elements and
of the elements were chosen from
. Hence the probability is
. Therefore
has a binomial distribution
. The expected value of
is
and the standard deviation is
. By the The probability that we become wrong by
is
. By using a modified form of the Hoeffding’s bound the probability is upper bounded by
. If
then how large should
be so that we are at most off by 1% of
with probability
? A simple calculation shows that if
then we are good. If The nice part about use of this bound is that it does not depend on the size of the underlying multiset
. However there is a dependance on
that is unknown in the first place! However if we assume
then we can get away with sampling a few thousand elements. Are there any better known algorithms? In randomized algorithms that produce
answers it is possible to increase the probability of success by repeating algorithm multiple times and then taking the majority of the output as the true answer. Is there a similar way of improving the overall accuracy of algorithms that produce one of many possible values such as the percentile algorithm?