## Projections onto a Convex Body

August 23, 2010

Given a point ${x} \in {{\bf R}^d}$ and an affine subspace ${H \neq {\bf R}^d}$, the projection of ${x}$ onto ${H}$ is the point ${P_H(x)}$ such that,

$\displaystyle \begin{array}{rcl} \vert \vert x - P_H(x) \vert \vert = \min \{ \vert \vert x y \vert \vert ~ \mid y ~ \in H \} \end{array} .$

One can obviously define other notions of projection but the above is probably the most commonly used in geometry. If ${x \notin H}$ then the projected point ${P_H(x)}$ is the unique point in ${H}$ such that the vector ${P_H(x) - x}$ is orthogonal to ${H}$. Also is well known that projecting any segment to an affine subspace can only shrink its length. The proofs of these facts are easy to see. But in fact these facts are just corollaries of the following two results.

 Given any nonempty closed convex set ${C \subseteq {\bf R}^d}$ and a point ${x \in {\bf R}^d}$, there is a unique point ${P_C(x) \in C}$ such that, $\displaystyle \begin{array}{rcl} \vert \vert x - P_C(x) \vert \vert = \min \{ \vert \vert x - y \vert \vert ~ \mid ~ y \in C\} \end{array} .$

and,

 Let ${C}$ be a nonempty closed convex set. The mapping ${P_C : {\bf R}^d \rightarrow C}$ is a contraction i.e. for any ${x,y \in {\bf R}^d}$ we have, $\displaystyle \begin{array}{rcl} \vert \vert P_C(x) - P_C(y) \vert \vert \leq \vert \vert x - y \vert \vert \end{array}$

Applying these results to affine spaces (which are nonempty closed convex sets) yields the results mentioned earlier. This projection that maps $x$ to $P_C(x)$ is known as the metric projection. The proofs of these facts are in order. Read the rest of this entry »

## Bounding the Volume of Hamming Balls

August 13, 2010

In this post I will derive an oft used inequality in Theoretical computer science, using the Chernoff technique. The following function turns up in a few places in combinatorial geometry and other places.

$\displaystyle \begin{array}{rcl} \Phi_d(m) = \sum_{i=0}^d {m \choose i}. \end{array}$

Here we assume ${m \geq d}$ since otherwise the function is trivially seen to be equal to ${2^m}$. Here we will show that,

$\displaystyle \begin{array}{rcl} \Phi_d(m) \leq \left ( \frac{me}{d} \right )^d \end{array}$

Two easy interpretations of ${\Phi_d(m)}$ are as follows. First, it counts the number of points in the hamming cube ${{\bf H}^m = \{0,1\}^m}$ whose hamming distance from the origin is at most ${d}$. This interpretation also allows us to derive the nice identity,

$\displaystyle \begin{array}{rcl} \Phi_d(m) = \sum_{h=0}^d \sum_{l = 0}^h {s \choose l}{m - s \choose h - l}. \end{array}$

for any ${0 \leq s \leq m}$. In the above expression we obey the convention ${{p \choose q} = 0}$ when ${p < q}$. To see this, interpret the above sum as the distance from a fixed point ${p \in {\bf H}^m}$ whose distance from the origin is ${s}$. The ${h}$ is the distance from ${p}$ and the inner summation counts the number of points at distance ${h}$. Then noting that the hamming cube looks the same from every point so the number of points of distance at most ${d}$ is the same from the origin or from ${p}$. Hence the identity.

This definition has a nice geometric interpretation because ${\Phi_d(m)}$ is also the number of points from ${{\bf H}^n}$ inside any ball of radius ${d}$. The second interpretation is as follows. Consider the discrete probability space ${\Omega = \{ 0, 1 \}}$ with the measure ${p(\{0\}) = p(\{1\}) = 1/2}$. Let ${X_1, X_2, \dots, X_m}$ be ${m}$ independent random variables ${X_i : \Omega \rightarrow \{0,1\}}$ defined as ${X_i(y) = y}$ for ${y \in \Omega}$. Let ${X = \sum X_i}$. Then we have,

$\displaystyle \Pr\{ X_1 = a_1, X_2 = a_2, \dots, X_m = a_m \} = \frac{1}{2^m},$

for any ${a_i \in \{0,1\}}$. As such we have,

$\displaystyle \frac{\Phi_d(m)}{2^m} = \Pr\{ X \leq d\}$.

We will use this interpretation to derive our inequality.

Lemma: ${\Phi_d(m) \leq \left ( \frac{me}{d} \right )^d}$.

Proof: Our first observation is that we may take ${d \leq m/2}$. The function ${\left ( \frac{me}{d} \right )^d}$ is an increasing function of ${d}$. Denoting this by ${f(d)}$ and extending it to a function on ${[0,m]}$ we have by logarithmic differentiation,

$\displaystyle \begin{array}{rcl} \frac{f'(d)}{f(d)} = \ln \left ( \frac{m}{d} \right ) \geq 0 \end{array}$

Thus for ${d > m/2}$ we have ${f(d) \geq f(m/2) \geq (2e)^{m/2} > 2^m}$. On the other hand ${\Phi_d(m) \leq 2^m}$ trivially.

Thus assuming ${d}$ is ${0 \leq d \leq m/2}$. Now we have the following identity,

$\displaystyle \begin{array}{rcl} \sum_{i=0}^d {m \choose i} = \sum_{j=m-d}^m {m \choose j} = 2^m \Pr\{ X \geq m-d \}. \end{array}$

so it sufficies to upper bound ${\Pr \{ X \geq m - d \}}$. We now use the Chernoff technique. For any ${\lambda > 0}$,

$\displaystyle \begin{array}{rcl} \Phi_d(m) &=& 2^m \Pr\{ X \geq m - d \} \\\\ &=& 2^m \Pr \{ e^{\lambda X} \geq e^{\lambda(m - d)} \} \\\\ &\leq& 2^m \frac{{\bf E}(e^{\lambda X})}{e^{\lambda(m - d)}} ~~ \text{[By Markov's Inequality]} \\\\ &=& 2^m \frac{\prod_{i=1}^m {\bf E}(e^{\lambda X_i})}{e^{\lambda(m-d)}} ~~ \text{[By Independence of the} ~ X_i] \end{array}$

Now for any ${i}$ we have,

$\displaystyle {{\bf E}(e^{\lambda X_i}) = \frac{e^{\lambda} + 1}{2}}$

and so we have,

$\displaystyle \begin{array}{rcl} \Phi_d(m) \leq \frac{(e^{\lambda} + 1)^m}{e^{\lambda(m-d)}} \end{array}$

We try to optimize the above expression by finding the ${\lambda}$ which minimizes the expression above. Noting that ${m/2 \geq d}$ we have, by differentiation and observing the sign of the derivative, that the ${\lambda}$ minimizing the expression is given by,

$\displaystyle \begin{array}{rcl} e^{\lambda} = \frac{m}{d} - 1 \end{array}$

and substituing that in the estimate we have,

$\displaystyle \begin{array}{rcl} \Phi_d(m) &\leq& \frac{\left ( \frac{m}{d} \right )^m}{\left ( \frac{m-d}{d}\right )^{m-d}} \\ &=& \left ( 1 + \frac{d}{m-d} \right )^d \left ( \frac{m}{d} \right )^d \leq \left ( \frac{me}{d} \right )^d. \end{array}$

The last step follows from the simple fact that,

$\displaystyle \begin{array}{rcl} \left ( 1 + \frac{d}{m-d} \right )^{m - d} \leq \left ( e^{\frac{d}{m-d}} \right )^{m-d} = e^d. \end{array}$

$\Box$

## Jung’s Theorem

August 7, 2010

Given a set of points of diameter ${D}$ in ${n}$ dimensional Euclidean space ${{\bf R}^n}$ it is trivial to see that it can be covered by a ball of radius ${D}$. But the following theorem by Jung improves the result by a factor of about ${\frac{1}{\sqrt{2}}}$, and is the best possible.
Theorem 1 [ Jung’s Theorem: ] Let ${S}$ be a set of points in ${{\bf R}^n}$ of diameter ${D}$. Then there is a ball of radius ${\sqrt{\frac{n}{2n + 2}} D}$ covering ${S}$.
Proof: We first prove this theorem for sets of points ${S}$ with ${\vert S \vert \leq n + 1}$ and then extend it to an arbitrary point set. If ${\vert S \vert \leq n + 1}$ then the smallest ball enclosing ${S}$ exists. We assume that its center is the origin. Denote its radius by ${r}$. Denote by ${S' \subseteq S}$ the subset of points such that ${||p|| = r}$ for ${p \in S'}$. It is easy to see that ${S'}$ is in fact non empty.

Observation: The origin must lie in the convex hull of ${S'}$. Assuming the contrary, there is a separating hyperplane ${H}$ such that ${S'}$ lies on one side and the origin lies on the other side of ${H}$ (strictly). By assumption, every point in ${S \setminus S'}$ is distance strictly less than ${r}$ from the origin. Move the center of the ball slightly from the origin, in a direction perpendicular to the hyperplane ${H}$ towards ${H}$ such that the distances from the origin to every point in ${S \setminus S'}$ remains less than ${r}$. However, now the distance to every point of ${S'}$ is decreased and so we will have a ball of radius strictly less than ${r}$ enclosing ${S}$ which is a contradiction to the minimality of ${r}$.

Let ${S' = \{ p_1, p_2,\dots, p_m\}}$ where ${m \leq n \leq d+1}$ and because the origin is in the convex hull of ${S'}$ so we have nonnegative ${\lambda_i}$ such that,

$\displaystyle \sum \lambda_i p_i = 0, \sum \lambda_i = 1$

Fix a ${k, 1 \leq k \leq m}$. Then we have,
$\displaystyle \begin{array}{rcl} 1 - \lambda_k &=& \sum_{i \neq k} \lambda_i \\ &\geq& \frac{1}{D^2}\sum_{i = 1}^m \lambda_i || p_i -p_k ||^2 \\ &=& \frac{1}{D^2}\left ( \sum_{i=1}^m \lambda_i (2r^2 - 2 \langle p_i, p_k \rangle) \right ) \\ &=& \frac{1}{D^2} \left ( 2r^2 - 2 \left \langle \sum_{i=1}^m \lambda_i p_i, p_k \right \rangle \right ) \\ &=& \frac{2r^2}{D^2} \end{array}$

Adding up the above inequalities for all values of ${k}$, we get
$\displaystyle \begin{array}{rcl} m - 1 \geq 2m r^2 / D^2 \end{array}$

Thus we get ${\frac{r^2}{D^2} \leq m - 1 / 2m \leq \frac{n}{2n + 2}}$ since ${m \leq n + 1}$ and the function $(x-1)/2x$ is monotonic. So we have immediately ${r \leq \sqrt{\frac{n}{2n+2}} D}$.
The remainder of the proof uses the beautiful theorem of Helly. So assume ${S}$ is any set of points of diameter ${D}$. With each point as center draw a ball of radius ${r = \sqrt{\frac{n}{2n+2}}D}$. Clearly any ${n + 1}$ of these balls intersect. This is true because the center of the smallest ball enclosing ${n+1}$ of the points is at most ${r}$ away from each of those points. So we have a collection of compact convex sets, any ${n+1}$ of which have a nonempty intersection. By Helly’s theorem all of them have an intersection. Any point of this intersection can be chosen to be the center of a ball of radius ${r}$ that will enclose all of ${S}$. $\Box$

## An IMO problem

July 20, 2010

I spent this weekend working on and off a very interesting IMO problem – problem 6 from IMO 2006. You can find that here . It was a real hard nut and only led to a crushed ego :(. Let me first describe the problem and a result I managed to prove in this direction, which was however much weaker than the problem statement.

IMO 2006 Problem 6 : Assign to each side $b$ of a convex polygon $P$ the maximum area of a triangle that has $b$ as a side and is contained in $P$. Show that the sum of the areas assigned to the sides of $P$ is at least twice the area of $P$.

I managed to prove the following result.

Theorem: Assign to each side $b$ of a convex polygon $P$ the maximum area of a triangle that has $b$ as a side and is contained in $P$. Then the sum of the areas assigned to the sides of $P$ is at least $\frac{3}{2}$ the area of $P$.

The proof is as follows. We prove first a small lemma.

Lemma 1 : Let $P$ be a convex polygon with $n \geq 4$ sides. Then there is a quadrilateral or a triangle with vertices among the vertices of $P$ that has area at least $\frac{1}{2}$ the area of $P$.

Proof: Let the polygon have vertices $A_1, A_2, \dots, A_n$. Without loss of generality assume that $A_1 A_k$ is the longest diagonal. The angles $\angle A_k A_1 A_2, \angle A_k A_1 A_n, \angle A_1 A_k A_{k-1}, \angle A_1 A_k A_{k+1}$ are all less than $\pi/2$ since $A_1 A_k$ is the longest diagonal. See the figure below for an example. For any vertex $A_i$ denote by $h(A_i)$ the distance from the line $A_1 A_k$. We will call the number $h(A_i)$ the height of $A_i$.

If $k = 2$ or $k = n$ then let $A_q$ be the vertex with maximum height. By our earlier observation regarding the angles at vertices $A_1, A_k$ with the line $A_1 A_k$, $P$ is enclosed inside the rectangle with base $A_1 A_k$ and height $h(A_q)$. As such the area of the triangle $A_1 A_k A_q$ is at least $\frac{1}{2}$ the area of $P$.

If $2 < k < n$ consider the vertex $A_p$ that has the maximum height among all vertices $A_i, 1 < i < k$ and the vertex $A_q$ that has maximum height among all vertices $A_i, k < i \leq n$. Clearly as before $P$ is enclosed inside the rectangle $R$ with a side parallel to $A_1 A_k$ through $A_p, A_q$ and the other sides through $A_1$ and $A_k$. Now

$\text{area}(A_1 A_p A_k A_q) = \text{area}(A_1 A_p A_k) + \text{area}(A_1 A_k A_q) \geq \frac{1}{2} \text{area}(R)$

We can now prove the theorem as follows.

Proof of Theorem: Let $A_1 A_p A_k A_q$ be a maximum area quadrilateral whose vertices are also vertices of $P$. We will call this quadrilateral $Q$. Clearly $\text{area}(Q) \geq \frac{1}{2} \text{area}(P)$ as we saw in the
previous lemma. The lemma actually showed that in few cases there
is actually a triangle $T$ with vertices among those of $P$ with area at least $\frac{1}{2}$ of $P$. We will not deal with the case of a triangle here since it is similar but we remark that if there is such a triangle, the theorem actually gets stronger and we can in fact prove what was required in the IMO problem.

Consider the triangles in the following four sets

$\left \{ A_1 A_2 A_q, A_2 A_3 A_q, \dots, A_{p-1} A_p A_q \right \}$

$\left \{ A_p A_{p+1} A_q, \dots, A_{k-1} A_k A_q \right \}$

$\left \{ A_k A_{k+1} A_p, \dots, A_{q-1} A_q A_p \right \}$

$\left \{ A_q A_{q+1} A_p, \dots, A_n A_1 A_p \right \}$

The sum of the areas of the triangles is at most the sum of the numbers on the edges $b$ of $P$. However, the sum of the areas is easily seen to be at least

$\text{area}(P) + \text{area}(Q) \geq \frac{3}{2} \text{area}(P)$

and we are done.

I found the solution to the IMO problem here and it is a real gem. It was a relief to see the solution!

## Some pointers for computer troublebombarding …

March 6, 2010

1) QTParted – free software for resizing partitions (it can handle NTFS partitions) found in Knoppix Free CD.
3) Broken exe associations – sometimes you may find all your shortcuts are behaving weird. You double click them and it asks you to indicate the program with which to open it. Well sometimes running them as Administrator works but this is a temporary solution. It may happen that even regedit or msconfig is not found when you type it at start! But dont panic – go directly to their location – search in C:\WINDOWS and right click on them and click “Run as Administrator”. So you can open the cmd prompt this way (CMD.EXE) and run a command like regedit or msconfig. Then to fix the broken exe associations you need to slightly tweak the registry. Read about it here – http://filext.com/faq/broken_exe_association.php.

It is good to have these handy – keep a copy of the softwares on a CD and a printout of the document. They can save hours of your time (and lot of sanity) sometimes.

## Illinois tollways

February 23, 2010

So you missed paying toll – you did not have enough cash or you just zoomed in through the IPass lane. If that happened on the Golden Gate Bridge , you have a toll violation and you will end up paying a hefty fine. I remember back in 2008 I missed paying the toll because I was like 10c short and I had to pay 25$. No checks, credit cards, invitation cards, greeting cards or poker face appeals worked. That really sucked! However Illinois has a great tollway system. Even if you speed past the toll booth for any number of reasons (including being in a hurry) you have upto 7 days to pay it online at http://www.illinoistollway.com/ – you just need to remember the approximate time, date and location of missed toll. There is a map on the website that one can use to figure out the missed booth you may need to spend 10 minutes on this but this is way more fair than paying 25$ fines! Go Illinois!

## Hello again!

February 6, 2010

It has been one and a half years since I made a post. The last post was on the tensor Calculus. While I have still not dropped plans to complete the series at some point of time, I think it is simply unsustainable for me right now. I have pondered once in a while at the reason why I stopped blogging and here are some possible reasons.

1. My aim was to make very expository posts. This is hard work and requires lot of motivation to keep up amidst all going on in life. Thus having this razor sharp focus on expository posts was not a good idea for my type of person. The moments of inspiration are few when I want to make such posts while a blog is something that should keep rolling.

2. I tend to plan quiet a lot before writing. This perfectionist attitude did not work well for me.

3. For sure I lack time management skills. But thats ok, I guess I am getting better.

I am not going to make any promises this time. Just try to write something whenever I feel like it … So once again – Hello everybody!

## Tensor Calculus II

July 10, 2008

In the last post I described a contravariant tensor of rank 1 and a covariant tensor of rank 1. In this post we will consider generalizations of these. We will introduce tensors of arbitrary rank $(k,l)$ where $k$ is the number of contravariant indices and $l$ is the number of covariant indices. How many numbers does such a tensor represent? It is easy to see that if the tensor is defined in $n$ dimensional space, it defines $n^{(k+l)}$ real numbers for each point of space, and each coordinate system. The notation for such a tensor is

$A^{i_1 i_2 \dots i_k}_{j_1 j_2 \dots j_l}$

Now before we go ahead, I would like to clarify that the above represents just 1 single component of the tensor , out of the $n^{(k+l)}$ components. The reason why I clarified this is that with tensors and the Einstein summation convention introduced in the last post, indices can sometimes be used as a summand. Specifically we said that if an index repeats in a tensor expression in a contravariant postion and a covariant position it stands for summation. You need to sum over that index for all possible values (1 to $n$). Also, multiple indices can repeat in a tensor expression. In that case one needs to sum over all such indices varying from 1 to $n$. Therefore a tensor expression in which 1 index is to be summed upon expands into $n$ quantities and a tensor equation in which 2 indicies are to be summed upon expands into a sum of $n^2$ quantities. Guess what? armed with this understanding we are already ready to understand the general transformation law of tensors. Despair not my friends looking at this expression. If you do not understand this – dont worry. We will probably never see such complicated mixed tensors of general rank $(k,l)$. However, it is necessary to understand how to compute with tensors and their transformation law. So after reading the general law of transformation and understanding what it means, I woul like you to forget it temporarily instead of getting bogged down by the weird equation! I will follow the definition with an example. This will be restricted to 3 dimensions and to our familiar spherical polar and cartesian coordinate systems. If you can understand the stuff in there, I would like you to remember the general computational method there. The general law of tensor transformation for a tensor of rank $(k,l)$ is

$\bar{A}^{i_1 i_2 \dots i_k}_{j_1 j_2 \dots j_l} = A^{u_1 u_2 \dots u_k}_{v_1 v_2 \dots v_l}\frac{\partial{\bar{x}^{i_1}}}{\partial{x^{u_1}}} \frac{\partial{\bar{x}^{i_2}}}{\partial{x^{u_2}}} \dots \frac{\partial{\bar{x}^{i_k}}}{\partial{x^{u_k}}} \cdot \frac{\partial{x^{v_1}}}{\partial{\bar{x}^{j_1}}} \frac{\partial{x^{v_2}}}{\partial{\bar{x}^{j_2}}} \dots \frac{\partial{x^{v_l}}}{\partial{\bar{x}^{j_l}}}$

In this equation, the left hand side represents the value of 1 component out of the $n^{(k+l)}$ components in the “bar” coordinate system [which is why we denote the $A$ with a bar as the tensor]. The right hand side on the other hand is a huge sum of $n^{(k+l)}$ quantities because of the repeating indices $u_1, u_2, \dots, u_k, v_1, v_2, \dots, v_l$. Each term of the summation is a product of a component of the tensor with the appropriate partial derivatives – that is, each term is a particular instantiation of the $u_1, u_2, \dots, u_k, v_1, v_2, \dots, v_l$. The transformation law for the general rank tensor is a direct generalization of the transformation law for rank 1 tensors. The tensors value is needed for the $i_1,i_2,\dots,i_k,j_1,j_2,\dots,j_l$ index and the corresponding terms for the primed coordinate system occur in the numerator or denominator according to its for the $i$‘s – the contraviant indices or the $j$‘s – the covariant indices. The rest of the new introduced $u_s, v_t$ have been summed upon since they are repeating indices.

Wow! We have stated the general transformation law and now we will proceed to an example. Before we do that, I would like to state our general roadmap in the future posts. Remember that our goal is to understand the equations of General Relativity. We are studying tensors just because we want to understand the weird symbols and equations in the profound equations laid down by Einstein. So in the next post, we will study general operations on tensors like – addition, subtraction, inner products etc. Along with this we will state some rules to recognize tensors. We will state a general rule by which we can ascertain that a set of number form the components of a tensor. We will use this general rule to prove that some collection of numbers are tensors in later posts. Here we will not however talk of the derivative of a tensor because that involves some more machinery – but hey .. without derivatives there is no calculus so get back we will! The next few posts will talk about the fundamental tensor which is related to measuring distance in space between two points. That will define define distance for us. In the next few posts we will then discuss derivatives of tensors. In doing so we will introduce some tensors in terms of the fundamental tensor. These will be helpful for defining derivatives. Armed with all this machinery about distances and derivatives, we will then state equations for geodesics which are shortest paths between 2 points in space. In ordinary space this is the straight line. But in spaces, where the fundamental tensor is more complicated the geodesics are not “straight” lines [Well by then we will probably wonder what straight means anyway!]. Finally we will have the notions of distance, derivatives and shortest paths in our bag so we will talk about what we really need to understand the Einstein equations – curvature. Specifically we will introduce quantities which represent how curved a space is at a point. After this we will state the Einstein equations and if I can manage I will show some simple consequences of these equations. The Einstein equations somehow relate the curvature of our playground [which is the “space” of 4 dimensional spacetime], to the distribution of mass and energy.

So much for the future posts – all that is probably going to take several posts! But for the time being lets get back to an example using familiar 3 space. In familiar 3 space, we are going to work with 2 coordinate systems – one : cartesian $x^1=x, x^2=y, x^3=z$ and the other spherical polar $\bar{x}^1=r, \bar{x}^2=\theta, \bar{x}^3 = \phi$. Recall that for a point $P$, $r$ is the distance from the origin $O$, $0 \leq \phi \leq \pi$ is the angle made by $OP$ with the $z$ axis and $0 \leq \theta < 2\pi$ is the angle made by the projection of $OP$ onto the $xy$-plane with the $x$ axis. Then it is easy to see that

$x^1 = \bar{x}^1 \sin{\bar{x}^3} \cos{\bar{x}^2}$

$x^2=\bar{x}^1 \sin{\bar{x}^3} \sin{\bar{x}^2}$

$x^3=\bar{x}^1 \cos{\bar{x}^3}$.

The inverse transformations equations are

$\bar{x}^1 = \sqrt{ {x^1}^2 + {x^2}^2 + {x^3}^2}$

$\bar{x}^2=\tan^{-1}\left( \frac{x^2}{x^1}\right)$

$\bar{x}^3=\cos^{-1}\left( \frac{x^3}{\sqrt{ {x^1}^2 + {x^2}^2 + {x^3}^2}}\right)$

See here for an illustration. Assume we are working in a small region of space where all the $x^1,x^2,x^3$ are nonzero so that $\bar{x}^2,\bar{x}^3$ are well defined. We also need to verify that the Jacobian matrix of the partial derivatives is non-singular but that is left as an exercise!

We will take a simple case here.

Example

Suppose there is a contravariant tensor that has components $\bar{A}^1, \bar{A}^2, \bar{A}^3$. What are the components in the first coordinate system? For illustration we only show $A^2$. Now by the tensor transformation law

$A^2 = \bar{A}^i \frac{\partial{x^2}}{\partial{\bar{x}^i}}$

and so evaluating the partial derivatives and summing them up $A^2$ evaluates to

$A^2=\bar{A}^1\sin{\bar{x}^2}\sin{\bar{x}^3} + \bar{A}^2\bar{x}^1\cos{\bar{x}^2}\sin{\bar{x}^3}+\bar{A}^3\sin{\bar{x}^2}\cos{\bar{x}^3}$

Of course the above represents the value at a specific point in space. A contravariant tensor of rank 2 will have 9 such terms to sum up. We encourage the reader to evaluate the components of such a tensor.

Before the next post do think of some of the things you can do with tensors. Specifically think about

1. When can two tensors be added/subtracted to produce new tensors?
2. Can two tensors be multiplied to produce a new tensor?
3. Consider a function $f$ defined for each point of space. Suppose for each coordinate system $(x^1,x^2,\dots, x^n)$ we define $n$ components at a point by $A_i= \frac{\partial{f}}{\partial{x^i}}$. Prove that these numbers define a covariant tensor of rank 1.

## Tensor Calculus – Part 1

May 12, 2008

Firstly, I am sorry for the outage in posts. Could not find time to write up anything worth a really good post. I have received some verbal requests for a new post. I appreciate all such requests! So without further ado, let me get down to business.

Today’s post will be about the Tensor Calculus . In fact I intend to write much more about them in subsequent posts but we will see how that goes. But before I go to give an introduction, let me state my motivation in studying tensors. While some familiarity with Physics is needed to understand the next paragraph, the tensor calculus is a purely mathematical topic.

The motivation comes from my long standing wish to be able to understand the General Theory of Relativity . The GTR is a geometrical theory of gravity. It explains gravity as a result of the deformation of SpaceTime . Mass and Energy cause SpaceTime to be distorted and curved. This distortion in turn acts back on mass to determine its motion. The perceived effects of gravitation on a particle are a direct result of the motion in the distorted SpaceTime without any mysterious force acting on the particle. That is quiet as much as I know about the GTR for now. But the basic mathematical tool to study the GTR is Reimannian Geometry . Reimannian Geometry is the appropriate generalization, for a general $n-$dimensional space, to study things like curves, surfaces, curvature, so common from everyday experience in two and three dimensional spaces. Many basic entities in Reimannian Geometry are best described by Tensors. Tensors are also used in many other places and in order to understand current physical theories and their mathematical formulations, one needs to know about tensors and basic differential geometry.

So, now lets get down to an introduction. Consider an $n-$ dimensional space $V_n$. A coordinate system is an assignment of an $n$ tuple of numbers $x^i, i = 1 \cdots n$ to each point of $V_n$. In a different coordinate system the same point will have a different set of coordinates $\bar{x}^i, i = 1 \cdots n$. We also assume that the $\bar{x}^i$ are given by $n$ functions $\varphi_i(x^j)$ of the $x^i$. The functions are assumed to be such that the Jacobian matrix $\frac{\partial{\bar{x}^i}}{\partial{x^j}}$ is invertible.

Example. In 3 dimensional space, a standard cartesian coordinate system can be the first coordinate system, while the polar coordinates $(r,\theta,\phi)$ can be the primed coordinate system. Or the primed coordinate system can be one with the 3 axes not necessarily at right angles to each other.

Definition. A contravariant tensor of rank 1 is a set of $n$ functions $A^i$ of the coordinates such that they transform according to the following law

$\bar{A}^i = A^j \frac{\partial{\bar{x}^i}}{\partial{x^j}}$

Now in the above expression you might notice that that the index $j$ is repeated in the right hand side of the equation. That implies a summation . Therefore the expression actually means the following sum

$\sum^n_{j=1} A^j \frac{\partial{\bar{x}^i}}{\partial{x^j}}$.

In general, in writing tensorial expressions, the above so called summation convention is widely used. Whenever an index is used as a superscript and the same occurs as a subscript, or if it occurs in the numerator and the denominator it is to be summed on over all the coordinates.

The above definition was for a contravariant tensor of rank 1. In fact a contravariant tensor of rank 1 is just a vector. The tensorial transformation law just generalizes the transformation of vectors. To see this consider 3 dimensional space and basis vectors $e_1, e_2, e_3$. Suppose a vector $v = a_1 e_1 + a_2 e_2 + a_3 e_3$. Consider a different coordinate system with basis vectors $e'_i = \sum_j a_ij e_j$. Now in this coordinate system let $v = a'_1 e'_1 + a'_2 e'_2 + a'_3 e'_3$. Now what are the $a'_i$ as functions of the $a_i$ ? A quick calculation shows that $a' = B a$ where $a'$ is the row vector of coordinates $a'_i$, $a$ is the row vector of coordinates $a_i$ and $B$ is the inverse of matrix $A = (a_{ij})$. Also as one can verify $b_{ij} = \frac{\partial{\bar{x}^i}}{\partial{x^j}}$.

Example. Consider the differentials $dx^i$. Clearly the differentials in the primed coordinate system are given by

$d\bar{x}^i= dx^j \frac{\partial{\bar{x}^i}}{\partial{x^j}}$

Next, we define a covariant tensor of rank 1 .

Definition. A covariant tensor of rank 1 is a set of $n$ functions $A_i$ that transform according to

$\bar{A}_i = A_j \frac{\partial{x^j}}{\partial{\bar{x}^i}}$.

In the next post we will go to the definition of more general tensors called mixed tensors. Such tensors are contravariant or covariant in more than one index or have mix of such indices. We will see the transformation law of such tensors. We will also start with some simple operations on tensors.

## Math blogs vs CS blogs

March 31, 2008

I am very impressed by many professional mathematicians posting expository articles on their blog. One of the great advantages it has over just a survey article is that the comments section allows direct interaction with the author. Also, in many cases the presentation is more relaxed than a research paper and more fast paced than a book. I have examples of at least three blogs : Terry Tao’s blog , Rigorous Trivialities and The Unapologetic Mathematician , with regular detailed postings on various topics. I am wondering why there arent so many CS theory blogs? Luca’s blog is a great one that stands out in this regard. There has a been a lot of discussion about selling CS theory, see for example this post by Lance Fortnow. I think that one of the ways to do this is to write good expository articles and popularize various ideas that arise again and again in theoretical CS.