Monday, August 1, 2016

Cayley-Hamilton Theorem Part III

In order to establish the Cayley-Hamilton theorem we first show the following result.
Proposition The set of diagonalizable linear transformations on finite dimensional complex space is dense in the set of all linear operators on that space.
Before we are able to demonstrate the above lemma, we first define a quotient space. Let \(V\) be a finite dimensional vector space, and let \(W\subset V\) be a subspace. We define an equivalence relation \(\equiv\) with \(v_1\equiv v_2\) if there is some \(w\in W\) such that \(v_1=v_2+w\). We then define \(V/W\) to be the set of equivalent classes given by this relation, and call it a quotient space.
We define addition and scalar multiplication as \([v_1]+[v_2]=[v_1+v_2],\, a[v]=[av]\), and it can be easily verified that these operations are well defined and will make \(V/W\) a vector space. By taking a basis of \(W\) and \(V/W\) it is also easily verified that \(\dim V/W+\dim W=\dim V\). Note that this means that if extend a basis of \(W\) to be a basis of \(V\) that the equivalence classes of the added on vectors will form a basis of \(V/W\)
Now let \(T\colon V\to V\) be a linear operator on \(V\) such that \(W\) is mapped to itself. We define an operator \(\tilde T\colon V/W\to V/W\) given by \(\tilde T[v]=[Tv]\). It is easily verified that \(\tilde T\) is linear. Furthermore, take a basis of \(V\) to be such that we first list a basis of \(W\) and then vectors not in \(W\). The matrix representation of \(T\) is then $$\begin{pmatrix}A&B\\0&C\end{pmatrix}$$ where \(A,C\) are square matrices of size \(\dim W\times \dim W\) and \(\dim V/W\times \dim V/W\) respectively. It is easily seen then that the matrix representation of \(\tilde T\) will be \(C\) as above.
Lemma Let \(T\colon \mathbb{C}^n\to \mathbb{C}^n\) be a linear operator. Then there exists some basis in which its matrix representation is upper-triangular.
Proof: We proceed by induction on dimension. The base case, where \(n=1\), is trivial. We now proceed to the inductive step.
Inductive step: Assume that any operator on \(n-1\) dimensional complex space may be represented as an upper-triangular matrix. Now, consider any linear operator \(T\colon \mathbb{C}^n\to \mathbb{C}^n\). By the fundamental theorem of algebra its characteristic polynomial must have at least one root, which implies the existence of at least one eigenvector with eigenvalue \(\lambda\). The span of this eigenvector will be a fixed space by the operator. Calling this space \(W\), we may look at our corresponding linear operator \(\tilde T\colon V/W\to V/W\). As we have just shown we may choose an appropriate basis of \(V\) and \(V/W\) such that $$T=\begin{pmatrix}\lambda&B\\0&C\end{pmatrix},\, \tilde T = C.$$ Since by our inductive assumption we may choose \(C\) to be an upper triangular matrix, this completes our proof.\(\blacksquare\)
We are now nearly done with this section. For any (upper) triangular matrix, the eigenvalues are given by the diagonal entries. This means that any upper triangular matrix with distinct diagonals are diagonalizable. Given any upper triangular matrix, we may then find a diagonalizable arbitrarily close by equating them on the off-diagonal entries and having the diagonal entries of this new matrix be close enough to the original matrix, yet distinct. We are able to do this as a matrix has finitely many diagonal components, and we have infinitely many choice for entries in our approximating matrix. Thus, given any linear operator on \(\mathbb{C}^n\) we may find an arbitrarily close diagonalizable one (in fact, one with distinct eigenvalues) by representing our operator as an upper triangular one, and finding a nearby diagonalizable matrix as above (which represents some linear transformation). Since the operator norm and matrix norms are equivalent as was shown in part I, our first proposition is proven. \(\blacksquare\)

Tuesday, January 19, 2016

Cayley-Hamilton Theorem II: The Arzela-Ascoli Theorem

In the previous post, we gave ourselves an incredibly useful measuring stick for vectors and linear operators. We now look at some general results regarding the convergence of sequences of functions, in order to build towards the Cayley-Hamilton Theorem.
Let \(\{f_i\}\) be a sequence of functions which map one metric space into another, say \(\mathbb{R}\to \mathbb{R}\) (the following definitions can be easily extended). Suppose that there is some function \(f\colon \mathbb{R}\to \mathbb{R}\) such that for each \(x\in \mathbb{R}\) we have \(f_i(x)\to f(x)\) then we say that \(\{f_i\}\) converges to \(f\) pointwise.
Now if given any \(\varepsilon>0\) there exists some integer \(N\) such that for all \(n\geq N\) we have \(|f(x)-f_n(x)|\) regardless of \(x\) then we say that \(\{f_i\}\) converges to \(f\) uniformly.

If the range of the functions is a complete metric space, then there is a useful criterion for determining if a sequence converges uniformly. Suppose that given any \(\varepsilon\gt 0\) there is some \(N\in \mathbb{N}\) for which if \(n\gt m\gt N\) we have \(|f_m(x)-f_n(x)|\lt \varepsilon\) for all \(x\). Then this sequence converges uniformly. We see this as follows. At each \(x\) we have a Cauchy sequence, which must converge to some value which we define as \(f(x)\). Now, given an \(\varepsilon\gt0\), we choose \(N\) large enough that \(|f_n(x)-f_m(x)|\lt \varepsilon/2\) for all \(n,m\gt N\). We then take \(N'\gt N\) such that we have \(|f(x)-f_{N'}|\lt \varepsilon/2\). Then for all \(n\gt N\) we have $$|f(x)-f_n(x)|\lt |f(x)-f_{N'}(x)|+|f_{N'}(x)-f_n(x)|\lt \varepsilon.$$ The distinction between types of convergence is important, as there are several properties of uniformly convergent functions which which do not necessarily hold for merely pointwise convergent ones. One such example is that a sequence of continuous, uniformly convergent functions will converge to a continuous function, while pointwise convergent functions may not.
Fix \(\varepsilon\gt0\) and take \(N\) large enough that \(|f(x)-f_N(x)|\lt\varepsilon/3\) for all \(x\). Take \(\delta\gt 0\) such that if \(|x-a|\lt\delta\) then \(|f_N(a)-f_N(x)|\lt\varepsilon/3\). It follows that $$|f(a)-f(x)|\leq |f(a)-f_N(a)|+|f_N(a)-f_N(x)|+|f(x)-f_N(x)|\lt\varepsilon,$$ so \(f\) is continuous at \(a\).
The classic counterexample for pointwise convergent functions is to consider the sequence \(f_n(x)=x^n\) on \([0,1]\)which will converge to $$f(x)=\begin{cases}0 \text{ for }x\in [0,1)\\ 1\text{ for } x=1\end{cases}.$$
Because of this property, it can be useful to obtain conditions as to when a collection of functions will have a subset which converges uniformly. The Arzela-Ascoli theorem gives us such conditions. To begin, we introduce a few more definitions.
Let \(\mathcal{F}\) be a collection of functions. We say that it is an equicontinuous family if the following holds: given any \(\varepsilon\gt 0\) there exists some \(\delta\gt 0\) such that for any \(f\in \mathcal{F}\) we have that \(|f(x)-f(y)|\lt \varepsilon\) whenever \(|x-y|\lt \delta\)
We say that \(\mathcal{F}\) is pointwise bounded if there is some function \(g\) such that for all \(f\in \mathcal{F}\) we have \(|f(x)|\lt g(x)\). We say that \(\mathcal{F}\) is uniformly bounded if there is some real number \(M\) such that \(|f(x)|\lt M\) for all \(f\in \mathcal{F}\)
Proposition: Let \(\{f_i\}\) be a sequence of uniformly bounded functions on some countable set \(X\). Then there is some subsequence \(\{f_{i_k}\}\) which converges (pointwise) on all \(x\in X\).
Proof: We order the elements of \(X\) as \(x_1,x_2,\cdots\). We note that \(\{f_i(x_1)\) is a bounded sequence and so there is some subsequence which we denote \(\{f_{1,n}\) for which \(f_{1,n}(x_1)\). Now inductively repeat a process where we take a subsequence of \(f_{k-1,n}\) which we denote \(f_{k,n}\), such that \(f_{k,n}(x_k)\) converges. We now write out the functions as follows $$\begin{matrix}f_{1,1}&f_{1,2}&f_{1,3}&\cdots\\f_{2,1}&f_{2,2}&f_{2,3}&\cdots\\f_{3,1}&f_{3,2}&f_{3,3}&\cdots\\\vdots&\vdots&\vdots&\ddots\end{matrix}$$ Since each following row is a subsequence of the previous rows, every function in preceeding rows appears in the row above, either directly above or to the right. It then follows that by taking the diagonal elements we see that \(\{f_{i,i}\}\) is a subsequence of each row after a finite number of terms, and therefore converges for all \(x_n\).∎
We have one more simple topology result, and then we are ready for the Ascoli-Azela theorem.
Lemma: Let \(K\) be a compact set. Then there exists an at most countable set which is dense in \(K\).
Proof: For each \(n\in \mathbb{N}\) we know that the set of all open balls of radius \(1/n\) must cover \(K\), and so there is a finite subset which will cover \(K\). Let \(A_n\) be the collection of the centers of such open balls. Then if we define $$A=\bigcup A_n$$ we have that \(A\) is at most countable as it is the countable union of finite sets, and will be dense in \(K\) as given any \(x\in K\) we can find a point in \(A\) within any distance \(x\).∎
Theorem [Arzela-Ascoli]: Let \(\mathcal{F}\) be a family of equicontinuous functions on a compact set \(K\). Furthermore, suppose that \(\mathcal{F}\) is pointwise bounded. Then \(\mathcal{F}\) is uniformly bounded, and has a subsequence \(\{f_{i_k}\}\) which is uniformly bounded.
Proof: Take \(\varepsilon\gt 0\) and select \(\delta\) such that if \(|x-y|\lt \delta\) then \(|f_j(x)-f_j(y)|\lt \varepsilon\) for all \(f_j\in \mathcal{F}\). Consider the set of all open balls of radius \(\delta\), and consider the finite subcover of \(K\). Let \(x_1,x_2,\cdots,x_n\) be the centers of these balls. Then we know that there are numbers \(M_1,M_2,\cdots,M_n\) such that \(|fjn(x_i)|\lt M_i\). Now for any \(x\in K\) we know that there is some \(x_i\) with \(|x_i-x|\lt \delta\). Thus we know that \(|f_j(x)|\lt M_i+\varepsilon\) for all \(f_j\in \mathcal{F}\). We then see that \(\mathcal{F}\) is uniformly bounded by \(\max M_i +\varepsilon\).
We now let \(A\subset K\) be a countable dense subset of \(K\), and take \(\{f_{i_k}\}\) to be a subsequence of \(\mathcal{F}\) which converges pointwise over all of \(A\). Now fix \(\varepsilon\gt 0\) and consider the same \(\delta\) as before, but such that \(|f_{i_k}(x)-f_{i_k}(y)|\lt \varepsilon/3\). If we consider the set of balls of radius \(\delta\) about centers which are elements of \(A\), we see that this covers \(K\). We now let \(\{x_1,x_2,\cdots,x_n\}=B\subset A\) be the centers of the finite subcover.
For any \(x\in K\) there is some \(x_i\in B\) with \(|x_i-x|\lt \delta\). Now take \(N\) large enough that for all \(x_i\in B\) and \(m,n\gt N\) we have \(|f_{i_n}(x_i)-f_{i_m}(x_i)|\lt\varepsilon/3\). It follows that $$|f_{i_n}(x)-f_{i_m}(x)|\leq |f_{i_n}(x)-f_{i_n}(x_i)|+|f_{i_n}(x_i)-f_{i_m}(x_i)|+|f_{i_m}(x_i)-f_{i_m}(x)|\lt \varepsilon$$ and so \(\{f_{i_k}\}\) converges uniformly.∎
We conclude this section by noting simply that on a compact set uniform convergence implies equicontinuity, and so equicontinuity is a necessary condition to show uniform convergence.

Monday, January 4, 2016

Cayley-Hamilton Theorem Part I: Norms

In this and a following blog post, I will explain and prove the Cayley-Hamilton theorem, a main algebraic result of linear operators. There are of course multiple proofs which can be found online, the vast majority of which approach it via algebraic means. My goal is to create an analytic proof to approach this problem.

In this post we deal exclusively with some facts about norms. These are common results which can be found in many textbooks.

Equivalence of norms

We first introduce the concept of a norm.
Let \(V\) be a vector space, and \(v_1,\,v_2\) be arbitrary elements of the space. A norm is a function \(f\colon V\to \mathbb{R}\) which satisfies the following three properites:
  1. \(f(v_1)\geq 0,\, f(v_1)=0\) if and only if \(v_1=\mathbf{0}\)
  2. \(af(v_1)=f(av_1)\)
  3. \(f(v_1+v_2)\leq f(v_1)+f(v_2)\).
The standard notation for the norm of a vector \(v\in V\) is \(\|v\|\).
We should note that a norm induces a metric given by \(d(v,u)=\|v-u\|\), and thus induces a topology on the vector space \(V\).
There are of course infinitely many norms we may use for any given vector space. It may be convenient to use different ones for different proofs. The following theorem will allow use any two interchangeably for most purposes.
Theorem: Let \(V\) be a finite-dimensional vector space and \(\|\cdot\|_1\) and \(\|\cdot\|_2\) be two norms on \(V\). Then there exists constants \(C_1,C_2\) with \(0\lt C_1\leq C_2\lt \infty\) such that for all \(\mathbf{v}\in V\) we have
$$C_1\|\mathbf{v}\|_2\leq \|\mathbf{v}\|_1\leq C_2\|\mathbf{v}\|_2.$$ We refer to this result as the equivalence of norms.
Proof: We first note that by the definition of two norms being equivalent, we merely need to show that one norm is equivalent to all others. We will define this norm as the following: Fix a basis of \(V\) given by \(\mathbf{e}_1,\cdots,\mathbf{e}_n\) and let \(v_1,v_2,\cdots,v_n\) be the coordinates of a vector \(\mathbf{v}\). Then we define the norm as $$\|\mathbf{v}\|_0=\max_{1\leq i\leq n}|v_i|.$$ This should be easily verified to fit the three properties above. We also note by property two of norms that if the result holds for a vector \(\mathbf{v}\) then it will hold for the vector \(a\mathbf{v}\). We then only consider the vectors in the set \(S_n=\{v\in V\mid\|v\|_0=1\}\). Here we use the subscript to denote that this is a set in an \(n\) dimensional vector space.
We first show that this set is compact under the topology given. We see that the set \(C_n=\{v\in V\mid \|v\|_0\leq 1\}=[-1,1]^n\supset S_n\) must be compact, as it is the product of finitely many compact sets. Now since it is clear that \(S_n\) must be closed it follows that it is compact.
Now let \(\|\cdot\|_1\) be any norm on \(V\). We will show that it is a continuous function on the topology induced by \(\|\cdot\|_0\).
Fix \(\varepsilon\gt 0\) and consider any \(x,y\in V\) such that $$\|x-y\|_0\lt \frac{\varepsilon}{n\max\limits_{1\leq i\leq n}\|\mathbf{e}_i\|_1}.$$ We have that $$\|x\|_1-\|y\|_1=\|(x-y)+y\|_1-\|y\|_1\leq \|x-y\|_1$$ and so \(|\|x\|_1-\|y\|_1|\leq \|x-y\|_1\). Now, f \(x_i\) and \(y_i\) are the \(i\)th components of \(x\) and \(y\) respectively then \begin{align}|\|x\|_1-\|y\|_1|\leq& \left\|\sum (x_i-y_i)\mathbf{e}_i\right\|_1\\ \leq &\sum \|(x_i-y_i)\mathbf{e}_i\|_1\\ \leq &n\left(\max_i|x_i-y_i|\right)\left(\max_i\|\mathbf{e}_i\|_1\right)\\ \leq &n\|x-y\|_0\max_i\|\mathbf{e}_i\|_1\\ \lt &\varepsilon.\end{align} From the extreme value theorem, this norm restricted to \(S_n\) will take on a minimum and maximum values \(C_1\) and \(C_2\) respectively. That is, for all \(v\in V\) we have that $$C_1\leq \|v\|_1\leq C_2.$$ Now we also know that $\|v\|_0=1$ and so $$C_1\|v\|_0\leq \|v\|\leq C_2\|v\|_0.$$ This completes our proof. ∎

We should note here that this result only holds for finite dimensional vector spaces. Consider for example the vector space \(\mathcal{C}[1,\infty)\) and the norms given by \begin{align}\|f\|_1&=\int_1^\infty |f|\\ \|f\|_2&=\left(\int_1^\infty f^2\right)^{1/2}.\end{align} We see that \(\|1/x\|_2=1\), while \(\|1/x\|_1=\infty\). We have one result left to show, which deals with operator norms.
If \(A\colon V\to V\) is a linear operator on a finite dimensional space, then the operator norm is given by $$\|A\|=\sup_{x\neq \mathbf{0}} \frac{|A(x)|}{|x|}$$ where \(|x|\) refers to a given norm of a vector \(x\in V\). For our purpose, we will use \(|x|=\|x\|_0\) as defined above.

Note that we may also write $$\|A\|=\sup_{x\in S_n} |A(x)|$$ where \(S_n\) is defined above.
We now turn to the following theorem which will help relate the norms of operators with those of their representative matrices.
Theorem: Let \(L\colon V\to V\) be a linear operator on the \(n\)-dimensional vector space \(V\), and let \(M\in \mathbb{R}^{n^2}\) be its matrix representation. Then there are constants \(C_1,C_2\) such that \(C_1\|M\|_0\leq \|L\|\leq C_2\|M\|_0\).
Proof: The \(i\)th column of \(M\) is given by \(L(\mathbf{e}_i)\). It follows then that $$\|A\|_0=\max_i \|L(\mathbf{e}_i)\|_0.$$ This shows that \(\|A\|_0\leq \|L\|\). Now if we take some \(x\) with \(\|x\|=1\) we see that \begin{align}\|L(x)\|_0=&\left\|\sum x_i L(\mathbf{e_i})\right\|_0\\ \leq &\|x\|_0\sum \|L(\mathbf{e_i})\|\\ \leq &\|x\|_0\max_{1\leq i\leq n}\left\{\sum_{j=1}^n |a_{ij}|\right\}\\ \leq &n\|M\|_0\|x\|_0\end{align} which implies that \(\|L\|\leq n\|M\|_0\). The proof is complete. ∎