Linear Algebra Written ExaminationsStudy GuideEduardo CoronaOther Authors as they Join InNovember 2, 2008Contents1 Vector Spaces and Matrix Operations22 Linear Operators23 Diagonalizable Operators3.1 The Rayleigh Quotient and the Min-Max Theorem . . . . . . . .3.2 Gershgorin’s Discs Theorem . . . . . . . . . . . . . . . . . . . . .3444 Hilbert Space Theory: Interior Product, Orthogonal Projectionand Adjoint Operators54.1 Orthogonal Projection . . . . . . . . . . . . . . . . . . . . . . . .74.2 The Gram-Schmidt Process and QR Factorization . . . . . . . .84.3 Riesz Representation Theorem and The Adjoint Operator . . . .95 Normal and Self-Adjoint Operators: Spectral Theorems andRelated Results115.1 Unitary Operators . . . . . . . . . . . . . . . . . . . . . . . . . . 145.2 Positive Operators and Square Roots . . . . . . . . . . . . . . . . 156 Singular Value Decomposition and the Moore-Penrose Generalized Inverse6.1 Singular Value Decomposition . . . . . . . . . . . . . . . . . . . .6.2 The Moore-Penrose Generalized Inverse . . . . . . . . . . . . . .6.3 The Polar Decomposition . . . . . . . . . . . . . . . . . . . . . .161618197 Matrix Norms and Low Rank Approximation197.1 The Frobenius Norm . . . . . . . . . . . . . . . . . . . . . . . . 197.2 Operator Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . 207.3 Low Rank Matrix Approximation: . . . . . . . . . . . . . . . . . 211

8 Generalized Eigenvalues, the Jordan Canonical Form and eA 228.1 The Generalized Eigenspace K . . . . . . . . . . . . . . . . . . . 238.2 A method to compute the Jordan Form: The points diagram . . 248.3 Applications: Matrix Powers and Power Series . . . . . . . . . . . 249 Nilpotent Operators2410 Other Important Matrix Factorizations2411 Other Topics (which appear in past exams)2412 Yet more Topics I can think of251Vector Spaces and Matrix Operations2Linear OperatorsDe nition 1 Let U ,V be vector spaces F (Usually F R or C). ThenL(U; V ) fT : U ! V j T is linearg: In particular, L(U; U ) L(U ) is thespace of linear operators of U , and L(U; F ) U is its algebraic dual.De nition 2 Important Subspaces: Given WV (subspace), TU: In particular, we are interested in T 1 (f0g) Ker(T ): Also, if ST (S) U: We are most interested in T (U ) Ran(T ):1(W )U; thenTheorem 3 U; V ev F , dim(U ) n; dim(V ) m: Given B fu1 ; :::; un gbasis of U and B 0 fv1 ; :::; vm g basis of V; to each T 2 L(U; V ) we can associate0a matrix [T ]BB such that:T ui0[T ]BB a1i v1 ::: ami vm 8i 2 f1; ::; mg (aij ) in MmBn (F )T!UlFnVlFm!0[T ]BBConversely, given a matrix A 2 Mm0that A [T ]BB :n (F );B0there is a unique TA 2 L(U; V ) suchProposition 4 Given T 2 L(U ); there exist basis B and B 0 of U such that:0[T ]BB I000B is constructed as an extension for a basis for Ker(T ); and B 0 as an extensionfor fT (u)gu2B :2

Theorem 5 (Rank and Nullity) dim(U ) dim(Ker(T )) dim(Ran(T )): (T ) dim(Ker(T )) is known as the nullity of T; and r(T ) dim(Ran(T )) as the rankof T:basis of U; B 0 ,Change of Basis: U; V ev F; B andexists P invertible such that:00[T ]BB P [T ] P0basis of V; there1P is a matrix that performs a change of coordinates. This means that, if twomatrices are similar, they represent the same linear operator using a di erentbasis. This further justi es that key properties of matrices are preserved undersimilarity.3Diagonalizable OperatorsIf U V (T;is a linear operator), it is natural to impose that both basis B andB 0 are also the same. In this case, it is no longer generally true that we can nd a basis B such that the corresponding matrix is diagonal. However, if there0exists a basis B such that [T ]Bdiagonal matrix, we say T is diagonalizable.B De nition 6 V ev F; T 2 L(V ). 2 F is an eigenvalue of T if 9 v 2 V anonzero vector such that T v v: All nonzero vectors such that this holds areknown as eigenvectors of T:We can immediately derive, from this de nition, that the existence of theeigenpair ( ; v) (eigenvalue and corresponding eigenvector v) is equivalent tothe existence of a nonzero solution v to(TI)v 0This in turn tells us that the eigenvalues of T are those such that the operatorTI is not invertible. After selecting a basis B for V; this also means: udet([T ]BBI) 0Which is called the characteristic equation of T . We notice this equationdoes not depend on the choice of basis B; since it is invariant under similarity:det(P AP1I) det(P (AI)P1) det(AI)This equation nally is equivalent to nding the complex roots of a polynomial in : We know this to be a really hard problem for n 5, and a numericallyill-posed problem at that.3

De nition 7 V ev F; T 2 L(V ),T v vg is the eigenspace for :an eigenvector of T: Then E fv 2 V jTheorem 8 V ev F of nite dimension; T 2 L(V ): The following are equivalent:i) T is diagonalizableii) V has a basis of eigenvectors of Tiii) ThereLn exist subspaces W1 ; :::; Wn such that dim(Wi ) 1; T (Wi ) Wi andV i 1 WiLkiv) V i 1 E k with f 1 ; :::; k g eigenvalues of TPkv) i 1 dim(E i ) dim(V )Proposition 9 V ev C; T 2 L(V ) then T has at least one eigenvalue (this is acorollary of the Fundamental Theorem of Algebra, applied to the characteristicequation).Theorem 10 (Schur’s factorization) V ev C; T 2 L(V ): There always existsa basis B such that [T ]B is upper triangular.3.1The Rayleigh Quotient and the Min-Max Theorem3.2Gershgorin’s Discs TheoremAlthough calculating eigenvalues of a big matrix is a very di cult problem(computationally and analytically), it is very easy to come up with regions onthe complex plane where all the eigenvalues of a particular operator T mustlie. This technique was rst devised by the russian mathematician SemyonAranovich Gershgorin (1901 1933):Theorem 11 (Gershgorin, 1931) Let A (aPij ) 2 Mn (C): For each i 2 f1; ::; ng;we de ne the ith "radius of A" as ri (A) j6 i jaij j and the ith Gershgorindisc asDi (A) fz 2 C j jz aii j ri (A)Then, if we de ne(A) f jis an eigenvalue of Ag; it follows that:(A)n[Di (A)i 1That is, all eigenvalues of A must lie inside one or more Gershgorin discs.4

Proof. Let be an eigenvalue of A; v an associated eigenvector. We x i as theith coordinate of v with maximum modulus, that is, jvi j jvk j 8k. Necessarily,jvi j 6 0. Then,Avvi(aii )vi v )X aij vjX aij vjj6 ij(aii )j jvi jXj6 i2jaij j jvi jDi (A)Now, we know that A represents a linear operator T 2 L(Rn ), and thattherefore its eigenvalues are invariant under transposition of A and under similarity. Therefore:Corollary 12 Let A (aij ) 2 Mn (C): Then,is invertibleg(A)n\ [Di (P APfi 11)jPOf course, if T is diagonalizable, one of this P 0 s is the one such that P AP 1is diagonal, and therefore the Gershgorin discs degenerate to the n points we arelooking for. However, if we don’t want to compute the eigenvalues, we can stilluse this to come up with a ne heuristic to reduce the region given by the unionof the gershgorin discs: We can use permutation matrices or diagonal matricesas our P 0 s to get a "reasonable region". This result also hints at the fact that,if we perturb a matrix A, the eigenvalues change continuously.The Gershgorin disc theorem is also a quick way to prove A is invertible if itis diagonal dominant, and it also provides us with results when the eigenvaluesof A are all distinct (namely, that there must be at least one eigenvalue perGershgorin disc).4Hilbert Space Theory: Interior Product, Orthogonal Projection and Adjoint OperatorsDe nition 13 Let V ev F: An interior product on V is a function ; : VV ! F such that:1) hu v; wi hu; wi hv; wi 8u; v; w 2 V2) h u; wi hu; vi 8u; v 2 V , 2 F3) hu; vi hv; ui4) hu; ui 0 and hu; ui 0 ) u 05

By de nition,every interior product induces a natural norm for V , given bypkvkV hv; viDe nition 14 We say u and v are orthogonal, or u?v, if hu; vi 0Some important identities:2221. Pythagoras Theorem: u?v () ku vk kuk kvk2. Cauchy-Bunyakowski-Schwarz: jhu; vijequality () u v23. Parallelogram: ku vk ku2kuk kvk 8u; v 2 V with22vk 2(kuk kvk ) 8u; v 2 V4. Polarization:hu; vi hu; vi 12fku vkku441X ki u ik v4k 12vk g 8u; v 2 V if F R28u; v 2 V if F CIn fact, identities 3 and 4 (Parallelogram and Polarization) give us bothnecessary and su cient conditions for a norm to be induced by someinterior product. In this fashion, we can prove kk1 and kk1 are not inducedby an interior product by showing paralellogram fails.De nition 15 v 2 V is said to be of unit norm if kvk 1De nition 16 A subset S V is said to be orthogonal if the elements in S aremutually orthogonal (perpendicular)De nition 17 S is orthonormal if it is orthogonal and its elements are of unitnormIf S is orthogonal, then it is automatically LI (Linearly Independent). Intuitively, we can think of orthogonal vectors as vectors which do not "cast a shade"on each other, and therefore point to completel exclusive directions. We havethe following property: if S fv1 ; :::; vn g is orthogonal, then for all v 2 span(S):v i 1 v1 ::: hv; vi i8ihvi ; vi in vnThus, we can obtain the coe cient for each element of S independently, bycomputing the interior product with the corresponding vi : Furthermore, if S isorthonormal:i hv; vi i 8iAnd these coe cients are also called the abstract Fourier coe cients.6

Theorem 18 (Bessel’s Inequality) Let fv1 ; :::; vm g orthonormal set, v 2 V .Then:mX2jhv; vi ijkvki 1With equality () v 2 span(fvi gmi 1 ).4.1Orthogonal ProjectionThis last result suggests that, for an orthogonal set S; in order to retrieve thehv;vi ivi : Thiscomponent going "in the ith direction", we only need to compute hvi ;vi iis in fact a projection of our vector v in the direction of vector vi ; or the "shadow"that v casts on the direction of vector vi : We shall de ne this more generally,and see that we can de ne projection operators which give us the component ofa vector in a given subspace of V :De nition 19 SV: We de ne the orthogonal complement S ? fv 2 V jhv; si 0 8s 2 Sg: If S V (closed subspace), then S S ? V and (S ? )? S(always true for nite dimension)De nition 20 Let WV: Then we de ne PW 2 L(V ) such that, if v vW vW ? ; then PW (v) vW : We can also de ne this operator by it’s action ona suitable basis of V : If we take BW fw1 ; ::; wp g and BW ? fu1 ; :::; un p gbasis of W and W ? ; then B BW [ BW ? :PW (wi ) wi 8i 2 f1; :::; pgPW (uj ) 0 8j 2 f1; :::; n pg[PW ]B Ip p000From this, a myriad of properties for P can be deduced:21. PWPW : This follows easily from the fact that PW w w 8w 2 W2. Ran(PW ) W and Ker(PW ) W ?3. v PW v 2 W ? 8v 2 V : we can deduce this directly from the de nition, or compute the interior product with any member of W: Also, thisfollows from the diagram one can draw in R2 or R3 : If we remove the"shadow" cast by a vector, all that is left is the orthogonal component.This additonally tells us that:PW ? I7PW

4. kv PW vk kv wk 8w 2 W : This is a very strong result: it tells us theorthogonal projection is the best approximation to v through vectors in W:This is a key result which justi es the use of the projection in applicationssuch as least squares, polynomial interpolation and approximation, fourierseries, etc. In fact, this result can be extended to projection on convexsets in Hilbert spaces.5. kPW vkkvk 8v 2 W : This tells us the projection is a contraction. Inparticular, we know that kP k 1; since there are vectors (namely, thosein W ) for which equality holds.6. hPW u; vi hu; PW vi 8u; v 2 V (PW is "self-adjoint"). This can beproved explicitly using the unique decomposition of u and v as the sum ofcomponents in W and W ? . In particular, this also tells us that the matrixwhich represents PW is symmetric / self-adjoint as well if we choose a basisof orthonormal vectors.7. It can be shown that properties (1) and (4), (1) and (5) or (1) and (6)completely characterize the orthogonal projection. That is, from theseproperties alone we can deduce the rest, and that the operator P has tobe a projection onto its range.4.2The Gram-Schmidt Process and QR FactorizationWe can ask ourselves if, for any basis in V there exists a procedure to turn itinto an orthonormal basis. The Gram-Schmidt Process does exactly this, andas a by-product it also gives us a very useful matrix factorization, QR:Theorem 21 (Gram-Schmidt) If fwi gmi 1 is a linearly independent set, theremexists an orthonormal set fui gmsuchthatspan(fui gmi 1i 1 ) span(fwi gi 1 ): Itcan be constructed through the following process:89k X1 hwk ; vj ivj Pspan(fvi gk 1 )? (wk )v1 w1 , vk wki 1:;hvj ; vj ij 1u1 v1kv1 k,uk vkkvk kFurthermore, by completing fwi gmi 1 to a full basis of V (if malways obtain an orthonormal basis of V following this process.n), we canTheorem 22 (QR Factorization) Let A be a matrix of full column rank,with columns fwi gmi 1 : Then , by applying Gram-Schmidt to the columns of A(augmenting them to obtain a full basis if m n); we obtain the following:wk kvk k uk kX1j 18hwk ; uj i uj 8k

If we write this in matrix form, where Q is a matrix with columns fui gni 1 (byde nition, this is an orthogonal / unitary matrix) and Rkk kvk k ; Rjk hwk ; uj i if k j (upper triangular matrix), this last expression provides thefollowing matrix factorization for A:0kv1 k hw1 ; u2 i0101B 0kv2 kjjjjjjjjjj BB@A@Awm QR u1um um 1un B 0A w1 w20Bjjjjjjjjjj @ 0000This is, A (Q1 j Q2 )A.R10 Q1 R1 , where Q1 has the same column space asThis factorization is very useful both to solve linear systems of equations(there are e cient ways to compute QR; namely the Householder algorithm andother sparse or incomplete QR routines) because, once computed, the systemAx b is equivalent to solving:Rx Q bWhich can be rapidly solved through backward or forward substitution (since Ris upper triangular). Also, QR factorization is extensively used to obtain easierformulas for certain matrix products that appear in applications such as OLSand smooth splines.A relevant result regarding this matrix factorization is that, although it isnot unique in general, if we have that A Q1 R1 Q2 R2 , then it can be shownthat D R2 R1 1 is a diagonal, unitary matrix.4.3Riesz Representation Theorem and The Adjoint OperatorFor any linear operator T 2 L(V; W ); we can obtain a related operator T 2L(W; V ) called the adjoint operator, which has very interesting properties. Thisoperator becomes even more relevant for applications to vector spaces of in nitedimension. This operator is de ned as follows:De nition 23 Let T 2 L(V; W ). Then, the adjoint operator T 2 L(W; V ) isde ned by the following functional relation:hT v; wiW hv; T wiV 8v 2 V , w 2 WIf we choose orthonormal basis B and B 0 for V and W , then the matrix thatrepresents the adjoint is the conjugate transpose of the matrix that representsA: We get:hAx; yiRm hx; A yiRn 8x 2 Rn ; y 2 Rm00BBWhere A [T ]BB and A [T ]B 0 ([T ]B ) .9.00.1hwm ; u1 ihwm ; u2 iCCC.C.Ckvm k A0

The existence and uniqueness of this operator is given by the Riesz Representation Theorem for Hilbert Spaces:Theorem 24 (Riesz Representation) Let V vs F a Hilbert space, and T 2L(V; F ) a continuous, linear functional (element of the topological dual). Then,9!v 2 V such that:T u hu; vi 8u 2 VTherefore, the Adjoint operator is always well de ned by the functionalrelation we have outlined, parting from the linear functional Lw v hT v; wiW xing each w 2 W:Remark 25 Here is a quick application of the adjoint operator and the orthogonal projection operator: Let A 2 Mm n (F ); and Ax b a system of linearequations. Then the least squares solution to this system is given by the solutionto:rrrrrrrrrrrrAx0 PRan(A) (b)Since we are projecting b onto the column space of A; and we know this is thebest approximation we can have using linear combinations of the columns of A:Using properties of the projection operator, we now know that:Ax; bhAx; bPRan(A) (b) 0 8xAx0 i 0Now, using the adjoint of A; we nd:hx; A bA Ax0 i 0 8xSo, this means A b A Ax0 ; and therefore, if A A is invertible, this meansx0 (A A) 1 A byb Ax0 A(A A)1A bIncidentally, this also tells us that the projection matrix of a vector onto thecolumn space of a matrix A is given by PRan(A) A(A A) 1 A .Properties of the Adjoint: (T 2 L(V; W ))1. (T S) T S2. ( T ) T3. (T ) T4. IV IV (the identity is self-adjoint)5. (ST ) T S10

0B6. B and B 0 orthonormal basis of V and W () then [T ]BB 0 ([T ]B )(be careful, this is an () statement)The most important property of the adjoint, however, provides us with anexplicit relation between the kernel and the image of T and T . These relationscan be deduced directly from the de nition, and provide us with comprehensivetools to study spaces V and W:Theorem 26 ("Fundamental Theorem of Linear Algebra II") Let V; W nite dimentional Hilbert spaces, T 2 L(V; W ): Then:Ker(T ) Ran(T )?Ran(T ) Ker(T )?Thus, we can always write V Ker(T ) Ran(T ) and W Ker(T ) Ran(T ).Proof. (Ker(T ) Ran(T )? ): Let v 2 Ker(T ); and any T u 2 Ran(T ); thenv 2 Ran(T )? () hT u; vi 0 8u () hu; T vi 0 8u () v 2 Ker(T ).The proof of the second statement can be found replacing T and T above.A couple of results that follow from this one are:1. T is injective () T is onto2. Ker(T T ) Ker(T ), and thus, r(T T ) r(T ) r(T ) (rank).5Normal and Self-Adjoint Operators: SpectralTheorems and Related ResultsDepending on the eld F we are working with, we can obtain " eld sensitive"theorems that characterize diagonalizable operators. In particular, we are interested in the cases where the eld is either R or C: This discussion will alsoyield important results on isometric, unitary and positive operators.De nition 27 T 2 L(V ) is said to be a self-adjoint operator if TT . IfF R, this is equivalent to saying [T ]B is symmetric, and if F C, that [T ]Bis Hermitian (equal to its conjugate transpose).De nition 28 T 2 L(V ) is said to be normal if it commutes with its adjoint,that is, if T T T T:Remark 29 If F R, then an operator T is normal and it is not self-adjoint() 8B orthogonal basis of V , [T ]B is a block diagonal matrix, with blocks ofsize 1 and blocks of size 2 which are multiples of rotation matrices.First, we introduce a couple of interesting results on self-adjoint and normaloperators:11

Proposition 30 T 2 L(V ); F C then there exist unique self-adjoint operators T1 and T2 such that T T1 iT2 : T is then self-adjoint () T2 0, andis normal () T1 T2 T2 T1 : These operators are given by:T1 T T2, T2 TT2iProposition 31 If T 2 L(V ) is normal, then Ker(T ) Ker(T ) and Ran(T ) Ran(T ):The most important properties of these families of operators, however, haveto do with the spectral information we can retrieve:Proposition 32 T 2 L(V ) and self-adjoint and F C: Ifof T , then 2 Ris an eigenvalueProof. For u eigenvector, we have:hu; ui hu; T ui hT u; ui hu; uiProposition 33 T 2 L(V ) and F C: T is self-adjoint () hT v; vi 2 R8v 2 VProof. Using both self-adjoint and properties of the interior product:hT v; vi hv; T vi hT v; vi 8v 2 VThis, in particular tells us the Rayleigh quotient for such an operator isalways real, and we can also rederive last proposition.Proposition 34 If T 2 L(V ) is a normal operator,i) kT vk kT vk 8v 2 Vii) TI is normal 8 2 Ciii) v is an eigenvector of T () v is an eigenvector of TProof. (i): hT v; T vi hv; T T vi hv; T T vi hT v; T vi(ii): (TI) (TI) T TI(T T ) 2 I T TI(T T ) 2 I (TI)(TI)(iii): (TI)v 0 ) k(TI) vk 0 (by i and ii) () (TI) v 012

Theorem 35 (Spectral Theorem, F C Version) V vs C nite dimentional Hilbert space. T 2 L(V ). V has an orthonormal basis of eigenvectors ofT () T is normal.Proof. (( ) By Schur’s factorization, there exists a basis of V such that [T ]B isupper triangular. By Gram-Schmidt, I can turn this basis into an orthonormalone Q, and by studying the QR factorization, we realize the resulting matrixis still upper triangular. However, since the basis is now orthonormal, andT is normal, it follows that [T ]Q is a normal, upper triangular matrix. Thisnecessarily implies [T ]S is diagonal (We can see this by computing the productsof the o -diagonal elements, and concluding they have to be zero in order forthis to hold).( )) If this is the case, then we have Q an orthonormal basis and diagonalsuch that [T ]Q : Since a diagonal matrix is always normal, it follows that Tis a normal operator.Theorem 36 (Spectral Theorem, F R Version) V vs R nite dimentional Hilbert space. T 2 L(V ). V has an orthonormal basis of eigenvectors ofT () T is self-adjoint.Proof. We follow the proof for the complex case, noting that, since F R; bothSchur’s factorization and Gram-Schmidt will yield matrices with real entries.Finally a diagonal matrix with real entries is always self-adjoint (since this onlymeans that it is symmetric). We can also apply the theorem for the complexcase and use the properties for self-adjoint operators.In any case, we then have the following powerful properties:1. V kMi 1Eiand (E i )? ME8ijj6 i2. If we denote Pj PE j , then Pi Pj ij3. (Spectral Resolution of Identity)IV kXPii 14. (Spectral Resolution of T )T kXi Pii 1These properties characterize all diagonalizable operators on nite dimensional Hilbert spaces. Some important results that follow from this are:13

Theorem 37 (Cayley-Hamilton) T 2 L(V ); V nite dimensional Hilbert space.If p is the characteristic polynomial of T; then p(T ) 0.Theorem 38 V vs C and T 2 L(V ) normal, then 9p 2 C[x] polynomial suchthat p(T ) T . This polynomial can be found by the Lagrange Interpolationproblem fp( i ) i gki 1We also have the following properties for T normal, which we can now deduceusing the spectral decomposition of T . These properties basically tell us that, ifT is normal, we can operate it almost as if it were a number through its spectralrepresentation.Pk1. If q is a polynomial, then q(T ) i 1 q( i )Pi2. If T p0 for some p, then T03. An operator commutes with T () it commutes with each Pi4. T has a normal "square root" (S such that S 2 T )5. T is a projection () all its eigenvalues are 00 s or 10 s:6. T 5.1T (anti-adjoint) () all eigenvalues are pure imaginary numbersUnitary OperatorsDe nition 39 An operator T 2 L(V ) is said to be orthogonal (R) / unitary(C) if kT vk kvk 8v 2 V (this means T is a linear isometry, it is a "rigidtransformation"). A unitary operator can also be characterized by T normaland T T T T IV :Theorem 40 (Mazur-Ulam) If f is an isometry such that f (0) 0 and f isonto, then f is a linear isometry (unitary operator)Theorem 41 The following statements are equivalent for T 2 L(V ) :i) T is an isometryii) T T T T IViii) hT u; T vi hu; vi 8u; v 2 Viv) If B is an orthonormal basis, T (B) is an orthonormal basisv) There exists an orthonormal basis of V such that T (B) is an orthonormalbasis.Theorem 42 If is an eigenvalue of an isometry, then j j 1. T is then anisometry () T is an isometry as well.14

5.2Positive Operators and Square RootsDe nition 43 V a nite dimensional Hilbert Space, T 2 L(V ). We say T isa positive operator if T is self-adjoint and hT v; vi 0 8v 2 VRemark 44 A matrix A is said to be a positive operator (positive semide nitematrix) if hAx; xi x Ax 0 8x 2 F nRemark 45 If F C, we can remove the assumption that T is self-adjoint.Remark 46 The operators T T and T T are always positive. In fact, it canbe shown any positive operator T is of the form SS : This is a general versionof the famous Cholesky factorization for symmetric positive de nite matrices.Proposition 47 T is a positive operator () T is self-adjoint and all itseigenvalues are real and non-negative.Some properties of positive operators:1. T; U 2 L(V ) positive operators, then T U is positive2. T 2 L(V ) is positive ) cT is positive 8c3. T 2 L(V ) is positive and invertible ) T01is positive4. T 2 L(V ) positive ) T 2 is positive (the converse is false in general)5. T; U 2 L(V ) positive operators, then T U U T implies T U is positive.Here we use heavily that T U U T implies there is a basis of vectorswhich are simultaneously eigenvectors of T and U:De nition 48 T 2 L(V ): Then we say S is a square root of T if S 2 T .We note that, in general, the square root is not unique, For example, theidentity has an in nite number of square roots: permutations, re‡ections androtations by 180 : However, we can show that, if an operator is positive, thenit has a unique positive square root.Proposition 49 T 2 L(V ) is positive () T has a unique positive squareroot.15

66.1Singular Value Decomposition and the MoorePenrose Generalized InverseSingular Value DecompositionThe Singular Value Decomposition for T 2 L(V; W ) (and the correspondingfactorization for matrices) is, without a doubt, one of the most useful resultsin linear algebra. It is used in applications such as Least Squares Regression,Smooth Spline and Ridge Regression, Principal Component Analysis, MatrixNorms, Noise Filtering, Low Rank approximation of matrices and operators,etc. This decomposition also enables us to de ne a generalized inverse (alsoknown as the pseudoinverse), and to compute other decompositions, such as thePolar decomposition for positive operators and computing positive square rootsexplicitly.Theorem 50 (Singular Value Decomposition, or SVD) Let V; W vs F Hilbertspaces, T 2 L(V; W ) with rank r. Then, there exist orthonormal basis of V andW fv1 ; ::; vn g and fu1 ; ::; um g, as well as positive scalars 1:::2r 0such that:T viT vi i ui0; i r; i rThese scalars are called the "singular values" of T . Conversely, if basis andscalars like these exist, then fv1 ; ::; vn g is an orthonormal basis of eigenvectorsof T T such that the rst r are associated to the eigenvalues 21 ; :::; 2r , and therest are associated to 0.Using what we know about positive operators, we can see how the statement of this theorem must always be true. Regardless of what T is, T T isa positive operator, and therefore diagonalizable, with nonnegative eigenvalues f 21 ; :::; 2r g and possibly also 0: Then, we obtain the set of fu1 ; ::; um g bycomputing T vi i ui for the rst r vectors, and then completing it to anorthonormal basis of W .Also, this theorem immediately has a geometric interpretation: by choosingthe "right" basis, we know exactly what is the action of T on the unit sphere.Basically, T sends the unit sphere to the boundary of a r dimensional ellipsoid(since it squashes fvr 1 ; ::; vn g to zero), with axis on the rst r u0i s: Also, thebiggest axis of this ellipsoid is the one in the direction of u1 ; and the smallestis the one in the direction of ur .Finally, we note that this theorem applied to a matrix A 2 Mm n (F ) yieldsthe following matrix factorization: If V; U are unitary matrices with fv1 ; ::; vn gand fu1 ; ::; um g as columns, and if is the matrix in Mm n (F ) with all zeros16

except forii ifor i0jA U V @u 1jr, then:10j BB0un A BB.j @ .0jj01.0CC0 CBC@0 A0.r0100nrv1.vm1CAThis factorization is known as the Singular Value Decomposition, or SVD factorization of A:We know that, for the system of equations Ax b, the best approximationis given by the solution of A Ax A b. By using the SVD, we can alwayscompute the solution with minimum norm:Given an SVD for A; A U V , we have the following:kAxbk kU V x bk k V x Q bkSince U is a unitary matrix. Therefore, all we need is to minimize k yand then solve for x V y, c U b: However, it is clear that:k y2ck rXi 1j i yi2ci j nXi r 1ck ;2jci jWhich is minimized precisely when yi cii for ir and its minimum valuePn2is i r 1 jci j . If we want the y with minimum norm, all we have to do is tomake the rest of its coordinates zero.Now, solving for x; if we de ne:011001BCB 0 . 00 CyBC B .C.1@ .0 Ar00 0n rThen the solution for this problem is given by:x Vy Vyc (VyU )bFrom the properties of Least Squares, and this last formula, we already knowthat the matrix (V y U ) does the following:1. If b 2 Ran(A), (the system is consistent) then it gives us the solutionto Ax b with minimum norm. For any x 2 Rn ; we know we can writex PKer(A) x PKer(T )? x: Since A(PKer(A) x) 0; then (V y U )b is theunique solution in Ker(A)? .17

2. If b 2 Ran(A); (the system is inconsistent) then it projects b in CS(A);and then gives us the unique solution to Ax PRan(A) b in Ker(A)? :3. We can also deduce this from the fact that, from the construction ofthe SVD, the Fundamental Theorem of Linear Algebra II, and A V U ; that fv1 ; :::; vr g is a basis for Ker(A)? ; fvr 1 ; :::; vn g for Ker(A),fu1 ; :::; ur g for Ran(A) and fur 1 ; :::; um g for Ran(T )? .6.2The Moore-Penrose Generalized InverseTheorem 51 (Moore-Penrose Generalized Inverse) Given V; W vs F Hilbertspaces, T 2 L(V; W ) with rank r: There exists a unique linear operator, whichwe call the Moore-Penrose Generalized Inverse (or pseudoinverse, for short)T y : W ! V , such that, if S is the restriction of T to Ker(T )? , then:T y jRan(T ) S1As an extension of this inverse, it has the following properties:T yTTTPKer(T )?yPRan(T )Finally, if we have an SV D decomposition of T; the pseudoinverse T y can becomputed as:vj; j rT y uj jyT ujIn matrix form, if A [T ]UV, yA 0;[T ]VU ,then:yA Vyj rUThe following properties can be obtained for the SVD and the Pseudoinverse:1. Let A 2 Mm n (C). Then, A; A and A have the same singular values.Also, (Ay ) (A )y and (Ay ) (A )y2. (Moore-Penrose Conditions) Let T 2 L(V; W ). If an operator Uis such that: (a) T U TT , (b) U T UU and (c) U T and T U areself-adjoint then UT y . These conditions are a characterization of thepseudoinverse of T as a linear operator.3. We can check that the general formula for the projection to Ran(A) whichwe calculated with the adjoint matrix is:PRan(A) A(A A)y A AAy Uy0yUWhereis a diagonal matrix with 1 s in the rst r diagonal entriesand 00 s i

corollary of the Fundamental Theorem of Algebra, applied to the characteristic equation). Theorem 10 (Schur s factorization) V ev C; T 2 L(V): There always exists a basis B such that [T] B is upper triangular. 3.1 The Rayleigh Quotient and the Min-Max Theorem 3.2 Gershgorin s Discs Theorem