Found this blog through Terry’s blog. This is just what I was looking for. Real TCS stuff from a professional.
Finding Percentiles
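March 19, 2008
Suppose you are given a multiset $S$ of real numbers. Given a number $x$, one can ask the question: find the fraction of points in $S$ that are less than or equal to $x$. Here is a simple randomized algorithm to solve this.
1. Choose $k$ elements of $S$ independently and uniformly at random.
2. Find out the fraction $\hat{p}$ of the chosen elements that are less than or equal to $x$.
3. Output $\hat{p}$.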
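The three steps can be sketched in Python as follows. This is a minimal sketch, assuming the multiset fits in memory as a list and that sampling is done with replacement; the names `estimate_fraction`, `data`, `x` and `k` are illustrative and not from the post.

```python
import random

def estimate_fraction(data, x, k, rng=random):
    """Estimate the fraction of elements of `data` that are <= x
    from k independent uniform samples (with replacement)."""
    if k <= 0 or not data:
        raise ValueError("need a non-empty multiset and k >= 1")
    sample = rng.choices(data, k=k)          # step 1: choose k elements
    hits = sum(1 for v in sample if v <= x)  # step 2: count those <= x
    return hits / k                          # step 3: output the fraction

if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.random() for _ in range(100_000)]
    print(estimate_fraction(data, 0.3, 5_000, rng))
```

Note that the running time depends only on the sample size `k`, not on the size of `data`.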
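How good is this algorithm? Here is an attempt at an analysis. For a number $x$ denote by $p(x)$ the fraction of elements in $S$ less than or equal to $x$. Suppose $|S| = n$ and $p = p(x)$. Suppose exactly $j$ of the $k$ sampled elements are less than or equal to $x$. Then $j$ of the elements were chosen from the $pn$ elements that are at most $x$ and $k - j$ of the elements were chosen from the remaining $(1-p)n$. Hence the probability is $\binom{k}{j} p^j (1-p)^{k-j}$. Therefore $j$ has a binomial distribution $B(k, p)$. The expected value of $j$ is $kp$ and the standard deviation is $\sqrt{kp(1-p)}$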
. The probability that we become wrong by
is
. By using a modified form of Hoeffding’s bound, the probability is upper bounded by
. If
then how large should
be so that we are at most off by 1% of
with probability
? A simple calculation shows that if
then we are good. The nice part about using this bound is that it does not depend on the size of the underlying multiset $S$. However, there is a dependence on $p$, the very fraction we are trying to estimate, which is unknown in the first place! However if we assume
then we can get away with sampling a few thousand elements. Are there any better known algorithms? In randomized algorithms that produce
answers it is possible to increase the probability of success by repeating the algorithm multiple times and then taking the majority of the outputs as the true answer. Is there a similar way of improving the overall accuracy of algorithms that produce one of many possible values, such as the percentile algorithm?
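On the closing question: one standard technique for real-valued estimators, which plays the role of majority voting and is not taken from this post, is to run the estimator several times independently and report the median of the runs. A hedged sketch, with `median_of_estimates`, `estimator` and `runs` as illustrative names:

```python
import random
import statistics

def median_of_estimates(estimator, runs):
    """Run a zero-argument randomized estimator `runs` times and return the median."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    return statistics.median(estimator() for _ in range(runs))

if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.random() for _ in range(50_000)]
    x, k = 0.25, 1_000

    def one_run():
        # one run of the sampling estimator: fraction of k samples that are <= x
        return sum(v <= x for v in rng.choices(data, k=k)) / k

    print(median_of_estimates(one_run, 15))
```

If a single run lands within the target error with probability above 1/2, the median is off only when at least half of the runs are off, and a Chernoff bound makes that exponentially unlikely in the number of runs.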
Superconcentrators From Expanders
March 7, 2008
Today I will be blogging about the construction of superconcentrators from expanders. This is directly from the expanders survey. I am just reviewing it here for my own learning. First some definitions.
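Definition 1. A left-regular bipartite graph $G = (L, R, E)$ of left degree $d$, with $|L| = n$ and $|R| = m$, is called an $(n, m, d)$-magical graph if it satisfies the properties below. For a given set $S$ of vertices, we denote the set of vertices to which some vertex in $S$ is connected as $\Gamma(S)$.
1. For every $S \subseteq L$ with $|S| \le \frac{n}{10d}$, $|\Gamma(S)| \ge \frac{5d}{8}|S|$.
2. For every $S \subseteq L$ with $\frac{n}{10d} < |S| \le \frac{n}{2}$, $|\Gamma(S)| \ge |S|$.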
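Definition 2. A superconcentrator is a graph $G = (V, E)$ with two given subsets $I, O \subseteq V$ with $|I| = |O| = n$, such that for every $S \subseteq I$ and $T \subseteq O$ with $|S| = |T| = k$, the number of disjoint paths from $S$ to $T$ is at least $k$.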
Superconcentrators with $O(n)$ edges are interesting for various reasons which we do not go into here.
But we do give a construction of a superconcentrator with $O(n)$ edges from the magical graphs above. A simple probabilistic argument can show the following result.
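Theorem. There exists a constant $n_0$ such that for every $d \ge 32$ and every $n$, $m$ such that $n \ge n_0$ and $m \ge 3n/4$, there is a magical graph with $|L| = n$, $|R| = m$ and left degree $d$.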
Here is the construction of a superconcentrator from magical graphs. Assume that we can construct a superconcentrator with $O(m)$ edges for every $m < n$. The construction is recursive. First take two copies of the magical graphs, $G_1 = (L_1, R_1, E_1)$ and $G_2 = (L_2, R_2, E_2)$, with $|L_1| = |L_2| = n$ and $|R_1| = |R_2| = 3n/4$. Connect every vertex of $L_1$ to the corresponding vertex of $L_2$ and add edges between $R_1$ and $R_2$ so that the graph becomes a superconcentrator with the size of the input vertex set as $3n/4$. We claim that the resulting graph is a superconcentrator with input vertex set of size $n$.
Identify the input vertex set as $L_1$ and the output vertex set as $L_2$. Take $S \subseteq L_1$ and $T \subseteq L_2$ with $|S| = |T| = k$, and first suppose $k \le n/2$. For every $S' \subseteq S$ it is true that $|\Gamma(S')| \ge |S'|$. Therefore by Hall's Marriage Theorem there exists a perfect matching between vertices of $S$ and a subset of $R_1$. Similarly there exists a perfect matching between vertices of $T$ and a subset of $R_2$. Together with the edges of the superconcentrator between input set $R_1$ and output set $R_2$ there are at least $k$ disjoint paths between $S$ and $T$. It remains to handle the case of $k > n/2$. For $k > n/2$, there is a subset $U \subseteq S$ of vertices with $|U| \ge k - n/2$ such that the vertices corresponding to $U$ in $L_2$ are in $T$. Edges between such vertices contribute to $k - n/2$ disjoint paths between $S$ and $T$. The remaining $n/2$ disjoint paths exist as proved earlier. Hence we have a superconcentrator with input, output vertex sets of size $n$.
How many edges are there in this graph? For the base case of this recursion, we let a superconcentrator with input, output sets of size $n \le n_0$ (the constant from the theorem above) be the complete bipartite graph with $n^2$ edges. The following recursion counts the edges. Let $e(n)$ denote the number of edges in the superconcentrator as per this construction with input, output sets of size $n$. Then $e(n) \le n^2$ for $n \le n_0$ and $e(n) \le 2nd + n + e(3n/4)$ for $n > n_0$, where $d$ is the left degree of the magical graphs. It can be easily seen that $e(n) \le Cn$ for $C \ge \max(n_0, 4(2d+1))$.
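To see the linear bound numerically, the recursion can be evaluated directly. This sketch assumes the recurrence $e(n) = 2nd + n + e(\lfloor 3n/4 \rfloor)$ with base case $e(n) = n^2$ for $n \le n_0$. The values `D = 32` and `N0 = 100` are placeholders chosen for illustration (the actual $n_0$ is not given here), and rounding $3n/4$ down is a simplification.

```python
from functools import lru_cache

D = 32     # assumed left degree of the magical graphs (illustrative)
N0 = 100   # hypothetical base-case threshold n_0 (not a value from the post)

@lru_cache(maxsize=None)
def edges(n):
    """Edge count e(n) of the recursive construction under the assumed recurrence."""
    if n <= N0:
        return n * n                 # base case: complete bipartite graph
    # 2*n*D edges from the two magical graphs, n direct input-output edges,
    # plus a recursive superconcentrator on floor(3n/4) vertices
    return 2 * n * D + n + edges(3 * n // 4)

C = max(N0, 4 * (2 * D + 1))         # constant for the linear bound e(n) <= C*n

if __name__ == "__main__":
    for n in (10, 1_000, 100_000, 10_000_000):
        print(n, edges(n), edges(n) / n)
```

The ratio `edges(n) / n` stays below `C`, which matches the induction: $(2d+1)n + \frac{3}{4}Cn \le Cn$ exactly when $C \ge 4(2d+1)$.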
Computing Minimum Spanning Trees With Uncertainty
February 28, 2008
In this post we will partially discuss this paper by Erlebach et al. The classical MST problem is studied for graphs with edge and vertex uncertainties. In general a problem with uncertainties can be specified as a triple $(C, A, \phi)$ where $C$, the configuration for the problem, consists of a set of data points $\{c_1, \ldots, c_n\}$ and $A$ is a set of areas $\{A_1, \ldots, A_n\}$ where $c_i \in A_i$. The function $\phi$ defines the solution for a particular configuration $C$. The input of the problem is just the set of areas $A$. An algorithm can update any of the $A_i$ and this determines the value of the data point $c_i$. The objective is to minimize the number of such updates and still be able to compute $\phi(C)$. An algorithm is $k$-update competitive if it can compute $\phi(C)$ with at most $k \cdot OPT$ updates where $OPT$ is the minimum number of updates required by any algorithm.
Example: Given a weighted graph $G$, the areas $A_e$ can be open intervals such that the edge weight $w_e \in A_e$. The function $\phi$ can then be a minimum spanning tree of $G$.
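To make the model concrete, here is a small Python sketch of the MST example: each edge carries an open interval (its area) and a hidden true weight that is revealed only by an update. The class and function names are hypothetical, and `naive_mst` is just the trivial baseline that updates every edge before running Kruskal's algorithm, not the algorithm from the paper.

```python
class UncertainEdge:
    def __init__(self, u, v, low, high, true_weight):
        assert low < true_weight < high  # the weight lies in its open interval
        self.u, self.v = u, v
        self.area = (low, high)
        self._true_weight = true_weight
        self.updated = False

    def update(self):
        """Reveal the exact weight; the area shrinks to a single point."""
        self.updated = True
        self.area = (self._true_weight, self._true_weight)
        return self._true_weight

def naive_mst(num_vertices, edges):
    """Update every edge, then run Kruskal. Returns (tree_edges, num_updates)."""
    parent = list(range(num_vertices))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    weights = [(e.update(), i) for i, e in enumerate(edges)]
    tree = []
    for w, i in sorted(weights):
        ru, rv = find(edges[i].u), find(edges[i].v)
        if ru != rv:                        # edge joins two components
            parent[ru] = rv
            tree.append(edges[i])
    return tree, len(weights)

if __name__ == "__main__":
    es = [UncertainEdge(0, 1, 1, 3, 2), UncertainEdge(1, 2, 2, 6, 5),
          UncertainEdge(0, 2, 4, 8, 4.5)]
    tree, updates = naive_mst(3, es)
    print([(e.u, e.v) for e in tree], updates)
```

The baseline always spends one update per edge; an update competitive algorithm aims to certify the same tree while updating far fewer areas whenever that is possible.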
The algorithm in this paper gives a general strategy to solve problems with uncertainties. The algorithm is $k$-update competitive where $k$ is an upper bound on the size of a witness set $W$. A witness set $W$ is a set of areas such that in order to verify the solution at least one element of $W$ needs to be updated. The witness algorithm is a kind of template for problems with uncertainties as it does not specify how a witness set is to be computed.
Question : In general is it always possible to compute a witness set in polynomial time?
The problem discussed in this paper is an extension of the concept of MST to graphs with uncertain edge weights. An MST of such a graph is a spanning tree that is an MST in all possible realizations of the edge weights, that is, in every assignment where each edge weight belongs to its assigned area. The paper provides an algorithm that is 2-update competitive when the areas are open intervals or singletons. Furthermore it is proved that there is no $(2 - \epsilon)$-update competitive algorithm in this case. When there are no restrictions on the areas at all, it is proved that there is no constant update competitive algorithm for the MST problem. The paper also considers graphs with vertex position uncertainties [where vertices are points in a Euclidean space].
The algorithm looks a lot like Kruskal's MST algorithm. The main crux is to find witness sets of size 2 as edges are considered in their special sorted order. For details please refer to the paper.
Some comments and questions:
1) How about approximation algorithms for problems with uncertainties? The goal of an approximation algorithm would be to compute an approximate solution to problems with uncertainties with minimum updates.
2) In this model, there seems to be no restriction on the running time of the algorithm but the only importance seems to be given to the number of updates. Is there a model which combines the two meaningfully?
3) The notion of a MST that applies to all possible realizations of a graph with edge uncertainties seems too restrictive.
Posted by cstheory