## Bounding the Volume of Hamming Balls

August 13, 2010

In this post I will derive an oft used inequality in Theoretical computer science, using the Chernoff technique. The following function turns up in a few places in combinatorial geometry and other places.

$\displaystyle \begin{array}{rcl} \Phi_d(m) = \sum_{i=0}^d {m \choose i}. \end{array}$

Here we assume ${m \geq d}$ since otherwise the function is trivially seen to be equal to ${2^m}$. Here we will show that,

$\displaystyle \begin{array}{rcl} \Phi_d(m) \leq \left ( \frac{me}{d} \right )^d \end{array}$

Two easy interpretations of ${\Phi_d(m)}$ are as follows. First, it counts the number of points in the hamming cube ${{\bf H}^m = \{0,1\}^m}$ whose hamming distance from the origin is at most ${d}$. This interpretation also allows us to derive the nice identity,

$\displaystyle \begin{array}{rcl} \Phi_d(m) = \sum_{h=0}^d \sum_{l = 0}^h {s \choose l}{m - s \choose h - l}. \end{array}$

for any ${0 \leq s \leq m}$. In the above expression we obey the convention ${{p \choose q} = 0}$ when ${p < q}$. To see this, interpret the above sum as the distance from a fixed point ${p \in {\bf H}^m}$ whose distance from the origin is ${s}$. The ${h}$ is the distance from ${p}$ and the inner summation counts the number of points at distance ${h}$. Then noting that the hamming cube looks the same from every point so the number of points of distance at most ${d}$ is the same from the origin or from ${p}$. Hence the identity.

This definition has a nice geometric interpretation because ${\Phi_d(m)}$ is also the number of points from ${{\bf H}^n}$ inside any ball of radius ${d}$. The second interpretation is as follows. Consider the discrete probability space ${\Omega = \{ 0, 1 \}}$ with the measure ${p(\{0\}) = p(\{1\}) = 1/2}$. Let ${X_1, X_2, \dots, X_m}$ be ${m}$ independent random variables ${X_i : \Omega \rightarrow \{0,1\}}$ defined as ${X_i(y) = y}$ for ${y \in \Omega}$. Let ${X = \sum X_i}$. Then we have,

$\displaystyle \Pr\{ X_1 = a_1, X_2 = a_2, \dots, X_m = a_m \} = \frac{1}{2^m},$

for any ${a_i \in \{0,1\}}$. As such we have,

$\displaystyle \frac{\Phi_d(m)}{2^m} = \Pr\{ X \leq d\}$.

We will use this interpretation to derive our inequality.

Lemma: ${\Phi_d(m) \leq \left ( \frac{me}{d} \right )^d}$.

Proof: Our first observation is that we may take ${d \leq m/2}$. The function ${\left ( \frac{me}{d} \right )^d}$ is an increasing function of ${d}$. Denoting this by ${f(d)}$ and extending it to a function on ${[0,m]}$ we have by logarithmic differentiation,

$\displaystyle \begin{array}{rcl} \frac{f'(d)}{f(d)} = \ln \left ( \frac{m}{d} \right ) \geq 0 \end{array}$

Thus for ${d > m/2}$ we have ${f(d) \geq f(m/2) \geq (2e)^{m/2} > 2^m}$. On the other hand ${\Phi_d(m) \leq 2^m}$ trivially.

Thus assuming ${d}$ is ${0 \leq d \leq m/2}$. Now we have the following identity,

$\displaystyle \begin{array}{rcl} \sum_{i=0}^d {m \choose i} = \sum_{j=m-d}^m {m \choose j} = 2^m \Pr\{ X \geq m-d \}. \end{array}$

so it sufficies to upper bound ${\Pr \{ X \geq m - d \}}$. We now use the Chernoff technique. For any ${\lambda > 0}$,

$\displaystyle \begin{array}{rcl} \Phi_d(m) &=& 2^m \Pr\{ X \geq m - d \} \\\\ &=& 2^m \Pr \{ e^{\lambda X} \geq e^{\lambda(m - d)} \} \\\\ &\leq& 2^m \frac{{\bf E}(e^{\lambda X})}{e^{\lambda(m - d)}} ~~ \text{[By Markov's Inequality]} \\\\ &=& 2^m \frac{\prod_{i=1}^m {\bf E}(e^{\lambda X_i})}{e^{\lambda(m-d)}} ~~ \text{[By Independence of the} ~ X_i] \end{array}$

Now for any ${i}$ we have,

$\displaystyle {{\bf E}(e^{\lambda X_i}) = \frac{e^{\lambda} + 1}{2}}$

and so we have,

$\displaystyle \begin{array}{rcl} \Phi_d(m) \leq \frac{(e^{\lambda} + 1)^m}{e^{\lambda(m-d)}} \end{array}$

We try to optimize the above expression by finding the ${\lambda}$ which minimizes the expression above. Noting that ${m/2 \geq d}$ we have, by differentiation and observing the sign of the derivative, that the ${\lambda}$ minimizing the expression is given by,

$\displaystyle \begin{array}{rcl} e^{\lambda} = \frac{m}{d} - 1 \end{array}$

and substituing that in the estimate we have,

$\displaystyle \begin{array}{rcl} \Phi_d(m) &\leq& \frac{\left ( \frac{m}{d} \right )^m}{\left ( \frac{m-d}{d}\right )^{m-d}} \\ &=& \left ( 1 + \frac{d}{m-d} \right )^d \left ( \frac{m}{d} \right )^d \leq \left ( \frac{me}{d} \right )^d. \end{array}$

The last step follows from the simple fact that,

$\displaystyle \begin{array}{rcl} \left ( 1 + \frac{d}{m-d} \right )^{m - d} \leq \left ( e^{\frac{d}{m-d}} \right )^{m-d} = e^d. \end{array}$

$\Box$