Suppose you are given a multiset of real numbers . Given a number , one can ask the question : Find the fraction of points in that are less than or equal to . Here is a simple randomized algorithm to solve this.
|1. Choose elements of uniformly at random. Sort the elements. Let be the sorted multiset.
2. Find out the fraction of points in the new multiset that are less than or equal .
How good is this algorithm? Here is an attempt at an analysis. For a number denote by the fraction of elements in less than or equal to . Suppose . Suppose . Then of the elements were chosen from elements and of the elements were chosen from . Hence the probability is . Therefore has a binomial distribution . The expected value of is and the standard deviation is . By the The probability that we become wrong by is . By using a modified form of the Hoeffding’s bound the probability is upper bounded by . If then how large should be so that we are at most off by 1% of with probability ? A simple calculation shows that if then we are good. If The nice part about use of this bound is that it does not depend on the size of the underlying multiset . However there is a dependance on that is unknown in the first place! However if we assume then we can get away with sampling a few thousand elements. Are there any better known algorithms? In randomized algorithms that produce answers it is possible to increase the probability of success by repeating algorithm multiple times and then taking the majority of the output as the true answer. Is there a similar way of improving the overall accuracy of algorithms that produce one of many possible values such as the percentile algorithm?