WikiJournal Preprints/Flawed proofs of the unambiguity of the determinant as defined by cofactor expansions


Article information

Author: Gavin R. Putland

Abstract

Some textbooks use the cofactor expansion of the determinant as the initial definition of it, and then try to prove that the definition is unambiguous in that the result of the expansion is independent of the choice of row(s) or column(s). This paper gives a model proof and compares it with some previously published proofs, which are shown to be incomplete in that they fail to demonstrate the equality of the row expansions with the column expansions, because they consider a general term in the sum but fail to allow for different modes of summation.


Introduction

Given that the expansion of a determinant in cofactors is unambiguous—meaning that the result does not depend on the choice of row or column—one can construct short and simple proofs of the elementary row/column properties of the determinant, and thence the relationship between the adjugate[a] and the inverse (see e.g. Kreyszig, 2011, pp. 295–7, 304–5). So it is not surprising that some undergraduate textbooks (including the one just cited) begin the systematic treatment of determinants by defining the determinant as a recursive cofactor expansion and then immediately asserting its independence of the choice of row or column; they omit the classical definition in terms of permutations.

The recursive definition is economical in that it avoids the need for a digression on permutations and their signs. It is intuitive in that it arises out of simultaneous linear equations in two and three variables, or cross products and scalar triple products (Kreyszig, 2011, pp. 291–3, 368–74), whereas the permutation-based definition looks arbitrary—unless it is derived from a list of properties that we would like the determinant to have, making the digression longer.

The recursive definition, however, requires a digression of its own: a self-contained proof that the definition is unambiguous. Where does one put this proof? Kreyszig, in the original edition of his Advanced Engineering Mathematics (1962, pp. 401–3), gives the proof immediately; but by the 6th Edition he has relegated it to the end of the chapter (1988, pp. 448–50), and by the 9th Edition, to the end of the book (2006, pp. A79–81). Kreyszig's competitors make him look persistent. Wylie, in his book of the same title, gives an immediate proof in the 2nd, 3rd, and 4th Editions; but the 5th and 6th Editions (1982, pp. 165–6; 1995, p. 230) merely state the result, with a footnote referring back to the 4th Edition for the proof. O'Neil, up to his 3rd Edition, delays the proof until the end of the section; but in the 4th Edition (1995, p. 381) he omits the proof, and by the 7th Edition he has changed tack, defining the parity of a permutation, defining the determinant in terms of permutations, and proving its elementary properties, all in five pages (2012, pp. 247–51). Noble, in the original edition of Applied Linear Algebra (1969, pp. 201–3), gives an immediate proof; but the 2nd Edition, co-written with Daniel, reduces the proof to a guided exercise while preserving its outline (1977, pp. 200, 205), and the 3rd Edition says "The proof is more complicated than instructive and is omitted" (1980, p. 160). Grossman, whose proof has notable similarities with Noble's (but also differences), delays the proof to the next section, and persists with this pattern through all four editions of his Elementary Linear Algebra (1980, pp. 77, 91–3, to 1991, pp. 119, 133–5).

As these deferrals and omissions show, proofs of the unambiguity of the cofactor expansion are considered hard to understand. Perhaps this is partly because all the proofs that I have seen are flawed: while they show that all row expansions yield a common result and that all column expansions yield a common result, they do not fully explain why the two common results must be the same. Here I offer a proof that fills the gap while remaining shorter than some of its predecessors.

Corrected proof

Definition 1:  The determinant D of the n × n matrix A, of which the element in the i th row and the k th column is \(a_{ik}\), is defined recursively as follows:

If n = 1, then \(D = a_{11}\).

If n ≥ 2, let \(M_{ik}\) denote the determinant of the submatrix obtained by deleting the i th row and the k th column of A. Then

\[ D = \sum_{k=1}^{n} (-1)^{i+k}\, a_{ik}\, M_{ik} \tag{E1} \]

for any i, or

\[ D = \sum_{i=1}^{n} (-1)^{i+k}\, a_{ik}\, M_{ik} \tag{E2} \]

for any k.  ◼

\(M_{ik}\) is called the ikth minor of A; this minor together with the sign given by \((-1)^{i+k}\) is called the ikth cofactor, whence (E1) and (E2) are called cofactor expansions (but the following discussion is in terms of minors).

Clearly (E1) is the expansion along the i th row, while (E2) is the expansion down the k th column. Thus the definition gives 2n ways of expressing an n th-order determinant in terms of determinants of order n −1. Each determinant of order n −1 has 2(n −1) expressions in terms of determinants of order n −2, and so on. Eventually we get down to determinants of order 1, each of which is defined in only one way. Thus the number of ways of evaluating a determinant of order n is \(2n \cdot 2(n-1) \cdots 2(2) \cdot 1 = 2^{\,n-1}\, n!\). Our problem is to show that all these evaluations produce the same result.
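Definition 1 is easy to state in executable form. Below is a minimal Python sketch (illustrative only; the function names det_expand and minor are mine, not drawn from any of the texts discussed) that implements (E1) and (E2) with a free choice of row or column at the top level, and confirms numerically that all 2n top-level expansions of a random 4 × 4 matrix agree:

```python
# A minimal numeric illustration of Definition 1 (not part of the proof).
# det_expand computes the determinant of a square matrix A by a cofactor
# expansion along a chosen row or column; sub-determinants are expanded
# along the first row.
from fractions import Fraction
import random

def minor(A, i, k):
    """Submatrix of A with row i and column k deleted (0-based indices)."""
    return [row[:k] + row[k+1:] for r, row in enumerate(A) if r != i]

def det_expand(A, axis="row", index=0):
    """Cofactor expansion along the given row or column of A, as in (E1)/(E2)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    if axis == "row":
        i = index
        return sum((-1) ** (i + k) * A[i][k] * det_expand(minor(A, i, k))
                   for k in range(n))
    else:  # column expansion (E2)
        k = index
        return sum((-1) ** (i + k) * A[i][k] * det_expand(minor(A, i, k))
                   for i in range(n))

# All 2n top-level expansions of a random 4x4 matrix agree:
A = [[Fraction(random.randint(-9, 9)) for _ in range(4)] for _ in range(4)]
results = ([det_expand(A, "row", i) for i in range(4)] +
           [det_expand(A, "col", k) for k in range(4)])
assert len(set(results)) == 1
print("All 8 expansions give", results[0])
```

Exact rational arithmetic (Fraction) is used so that agreement can be tested by equality rather than by a floating-point tolerance.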

Theorem 1:  Definition 1 is unambiguous.

Proof:  Denoting the proposition to be proven by P(n), we proceed by induction on n.

For n = 1, the definition reduces to \(D = a_{11}\), which is unambiguous; so P(1) is true.

For n = 2, equations (E1) and (E2) together give four expressions for the determinant, and it is trivial to verify that they are all the same; so P(2) is also true.
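For concreteness, the four expansions of a 2 × 2 determinant, written out in full with \(M_{11} = a_{22}\), \(M_{12} = a_{21}\), \(M_{21} = a_{12}\) and \(M_{22} = a_{11}\), are

\[
\begin{aligned}
\text{row 1:}\quad & a_{11}M_{11} - a_{12}M_{12} = a_{11}a_{22} - a_{12}a_{21},\\
\text{row 2:}\quad & -a_{21}M_{21} + a_{22}M_{22} = a_{11}a_{22} - a_{12}a_{21},\\
\text{column 1:}\quad & a_{11}M_{11} - a_{21}M_{21} = a_{11}a_{22} - a_{12}a_{21},\\
\text{column 2:}\quad & -a_{12}M_{12} + a_{22}M_{22} = a_{11}a_{22} - a_{12}a_{21},
\end{aligned}
\]

all of which agree with the familiar value \(a_{11}a_{22} - a_{12}a_{21}\).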

For the inductive step, we take n greater than 2 and suppose that P(n −1) is true, and try to establish P(n): that the results of (E1) and (E2) are independent of i and k respectively, and are equal to each other.

Consider the expansion along the i th row, given by (E1). By P(n −1) we can expand \(M_{ik}\) along any row (or column) and get the same result. So let us expand \(M_{ik}\) along what was the j th row of A, and consider the term in the expansion containing the element \(a_{j\ell}\). Note that j ≠ i and ℓ ≠ k, since the i th row and k th column of A are not included in \(M_{ik}\). If j < i and ℓ < k, the row and column numbers of \(a_{j\ell}\) in the minor array (i.e., the submatrix whose determinant is \(M_{ik}\)) will be just j and ℓ, as in the original matrix A, so that the term containing \(a_{j\ell}\) in the expansion of \(M_{ik}\) will be simply

\[ (-1)^{j+\ell}\, a_{j\ell}\, M_{ikj\ell}\,, \]

where \(M_{ikj\ell}\) (which we shall call a secondary minor) is the determinant of the submatrix obtained by deleting the i th and j th rows and the k th and ℓ th columns of A. If j > i, the row number of \(a_{j\ell}\) in the minor array will be j −1, which is one less than its row number in the original matrix, so that the sign of the above expression will change. Likewise there will be a sign inversion if ℓ > k. So in general, the term containing \(a_{j\ell}\) in the expansion of \(M_{ik}\) will be

\[ (-1)^{j+\ell}\, \operatorname{sgn}(i-j)\, \operatorname{sgn}(k-\ell)\, a_{j\ell}\, M_{ikj\ell}\,, \]

where \(\operatorname{sgn}\) (known as the signum function) returns the value +1 if its argument is positive, and −1 if its argument is negative. To complete the expansion of \(M_{ik}\) along what was the j th row of A, we sum the above expression over ℓ, allowing ℓ to assume all values from 1 to n, excluding k. That is,

\[ M_{ik} = \sum_{\ell \neq k} (-1)^{j+\ell}\, \operatorname{sgn}(i-j)\, \operatorname{sgn}(k-\ell)\, a_{j\ell}\, M_{ikj\ell}\,. \]

To obtain D, we substitute this result into (E1). The factor \((-1)^{i+k}\, a_{ik}\), being independent of ℓ, may be taken inside the summation over ℓ. The resulting double summation is over all n(n −1) ordered pairs of values of k and ℓ. Hence

\[ D = \sum_{k} \sum_{\ell \neq k} (-1)^{i+j+k+\ell}\, \operatorname{sgn}(i-j)\, \operatorname{sgn}(k-\ell)\, a_{ik}\, a_{j\ell}\, M_{ikj\ell} \tag{E3} \]

is the expansion of D along the i th row.
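As a numerical sanity check (an illustration, not a substitute for the proof), the right-hand side of (E3) can be evaluated directly. In the following Python sketch the helper names det, sgn, secondary_minor and E3 are mine, and the indices are 1-based to match the text; for a random 4 × 4 matrix, (E3) reproduces the determinant for every ordered pair of rows i ≠ j:

```python
from fractions import Fraction
import random

def det(A):
    """Determinant by first-row cofactor expansion."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** k * A[0][k]
               * det([row[:k] + row[k+1:] for row in A[1:]])
               for k in range(len(A)))

def sgn(x):
    return 1 if x > 0 else -1

n = 4
A = [[Fraction(random.randint(-9, 9)) for _ in range(n)] for _ in range(n)]

def secondary_minor(i, j, k, l):
    """M_{ikjl}: delete rows i and j and columns k and l of A (1-based)."""
    return det([[A[r][c] for c in range(n) if c + 1 not in (k, l)]
                for r in range(n) if r + 1 not in (i, j)])

def E3(i, j):
    """Right-hand side of (E3): expansion by row i, minors expanded by row j."""
    return sum((-1) ** (i + j + k + l) * sgn(i - j) * sgn(k - l)
               * A[i-1][k-1] * A[j-1][l-1] * secondary_minor(i, j, k, l)
               for k in range(1, n+1) for l in range(1, n+1) if l != k)

D = det(A)
assert all(E3(i, j) == D
           for i in range(1, n+1) for j in range(1, n+1) if j != i)
print("(E3) reproduces D =", D, "for every ordered pair (i, j)")
```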

Now consider the expansion along the j th row of A. By P(n −1), we can expand the minors along any row. Equation (E3) has been found by expanding D along the i th row and the minors along the j th row of A. So when we expand D along the j th row, it is convenient to expand the minors along the i th row of A; the result will be the same as (E3) except that i and j will be interchanged. We can also interchange k and ℓ, because the sum is over all ordered pairs of unequal values of k and ℓ; to interchange the values of k and ℓ in each term is merely to interchange two terms in the sum. After both interchanges, we obtain

\[ D = \sum_{\ell} \sum_{k \neq \ell} (-1)^{i+j+k+\ell}\, \operatorname{sgn}(j-i)\, \operatorname{sgn}(\ell-k)\, a_{j\ell}\, a_{ik}\, M_{j\ell ik} \tag{E4} \]

as the expansion of D along the j th row.

Comparing the right-hand sides of (E3) and (E4), we see that they are equal: the two sign inversions in the sgn functions cancel out, and the secondary minors \(M_{ikj\ell}\) and \(M_{j\ell ik}\) are the same. This proves that the row expansion of D is independent of the row number, i.e. that the result of (E3) is independent of i. By P(n −1), that result is also independent of j. Since (E3) gives the same result for all n(n −1) ordered pairs of values of i and j, we may find the row expansion of the determinant by averaging the right-hand side of (E3) over all the pairs; hence

\[ D = \frac{1}{n(n-1)} \sum (-1)^{i+j+k+\ell}\, \operatorname{sgn}(i-j)\, \operatorname{sgn}(k-\ell)\, a_{ik}\, a_{j\ell}\, M_{ikj\ell} \tag{E5} \]

where the summation is over all \(n^2(n-1)^2\) lists of values of i, j, k, ℓ for which i ≠ j and k ≠ ℓ. This is the common value of all the row expansions.

Applying this result to the transpose of A, we conclude that all the column expansions of D have a common value, which may be found by interchanging the row numbers with the column numbers in (E5)—that is, interchanging i with k, and j with ℓ. But these interchanges merely commute pairs of terms on the right of (E5), leaving the sum unaltered, so that the common value of the column expansions is the same as that of the row expansions. This establishes P(n) and completes the inductive step.  ◼
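The closing argument can be exercised numerically in the same spirit. In the sketch below (the helpers det, sgn and E5 are mine), the average (E5) equals the determinant, and applying the same function to the transpose, which interchanges the row numbers with the column numbers, returns the identical value:

```python
from fractions import Fraction
import random

def det(A):
    """Determinant by first-row cofactor expansion."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** k * A[0][k]
               * det([row[:k] + row[k+1:] for row in A[1:]])
               for k in range(len(A)))

def sgn(x):
    return 1 if x > 0 else -1

def E5(A):
    """Right-hand side of (E5): the average of all two-row expansions."""
    n = len(A)
    def sec_minor(i, j, k, l):  # M_{ikjl}: delete rows i, j and columns k, l (1-based)
        return det([[A[r][c] for c in range(n) if c + 1 not in (k, l)]
                    for r in range(n) if r + 1 not in (i, j)])
    total = sum((-1) ** (i + j + k + l) * sgn(i - j) * sgn(k - l)
                * A[i-1][k-1] * A[j-1][l-1] * sec_minor(i, j, k, l)
                for i in range(1, n+1) for j in range(1, n+1) if j != i
                for k in range(1, n+1) for l in range(1, n+1) if l != k)
    return total / (n * (n - 1))

n = 4
A = [[Fraction(random.randint(-9, 9)) for _ in range(n)] for _ in range(n)]
At = [list(col) for col in zip(*A)]   # transpose: rows and columns interchanged
assert E5(A) == det(A) == E5(At)
print("E5(A) = E5(A^T) = D =", det(A))
```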

Comparison with earlier proofs

In the last paragraph of the above proof, interchanging the row numbers with the column numbers does not change D. If we did not first average D over all pairs of row numbers, we would perform the interchanges on (E3) instead of (E5), with the result that the indices of summation would change, and a consequent change in D would not have been rigorously ruled out. This change in the mode of summation is the essential weakness of the proofs reviewed below.

For brevity in the following discussion, a "two-row expansion by rows i and j " means an expansion of D by row i in which the minors are expanded by row j of A, and a "two-column expansion by columns k and ℓ " is defined analogously. Either type of expansion will be called a "double expansion".

Wylie

The earliest direct proof of Theorem 1 that I have seen in a textbook is by C. Ray Wylie; it appears in the 2nd Edition of his Advanced Engineering Mathematics (1960, pp. 4–6),[b] and survives essentially intact up to the 4th Edition (1975, pp. 457–9). His notation differs slightly from mine in that he identifies a secondary minor by writing the two row subscripts first, followed by a comma and the two column subscripts.

Wylie uses no summation signs; he refers to a general term ("typical term") in the expansion, assumes that the mode of summation is understood, and consequently falls short when he tries to show that the row and column expansions are identical. If we find the two-row expansion of D by rows i and j, using k and ℓ (in that order) as column indices, then find the two-column expansion by the columns k and ℓ, using i and j as row indices, we find that the general term in each expansion is the same. Wylie immediately concludes that the two expansions are equal because the general term—which, as he says, is the one and only term containing the "typical product" \(a_{ik}\, a_{j\ell}\)—is the same in each expansion. But this argument overlooks the different modes of summation: the two-row expansion is summed over k and ℓ, whereas the two-column expansion is summed over i and j. (It is true that both summations, if fully expanded, are over the same set of permutations; but the full expansion gives (n −2)! terms containing that "typical product", and we are not supposed to be using the permutation-based definition of the determinant!)

I deal with the index-of-summation problem by averaging the two-row expansion over all ordered pairs of values of i and j, and averaging the two-column expansion over all ordered pairs of values of k and ℓ, so that both averages involve a sum over the same values of the same indices. This step cannot be added to Wylie's proof without modification. In the two-row expansion, Wylie assumes without loss of generality that i < j (which avoids introducing a signum function). If he averages over all pairs of values of i and j for which i < j, the resulting quadruple summation will not possess any useful symmetry, because there will be no corresponding requirement that k < ℓ; if he then swaps row indices with column indices to find the average two-column expansion, the indices will swap roles in the summation and it will not be clear that the sum is unchanged. However, he could harmonize the modes of summation by noting that k and ℓ can be interchanged, so that the sum over all pairs of values of k and ℓ is just twice the sum over the pairs for which k < ℓ. He would then obtain a quadruple sum over all values of the indices for which i < j and k < ℓ, so that interchanging row indices with column indices would leave the sum unaltered.

Another disadvantage of assuming that i < j is that both two-row expansions (the expansion by rows i and j, and the expansion by rows j and i) must be separately evaluated; we cannot obtain one from the other by simply switching i and j, as I have done. Furthermore, in determining the signs to be attached to the secondary minors, Wylie considers two cases: one for ℓ < k and one for ℓ > k. He could combine these into one by introducing a signum function; and if we do this for the column indices, it is natural to do the same for the row indices instead of assuming that i < j.

The inductive hypothesis, which I call P(n −1), means that a determinant of order n −1 may be expanded by any row or column. The last two words become critical for Wylie when he comes to the two-column expansion, in which the minors are expanded by an arbitrary common column. But unfortunately he gives the impression that the inductive hypothesis concerns rows only. He begins his proof by saying that we shall first prove that we get the same result "no matter which row is chosen." Only then does he announce that the proof is by induction. The false impression is reinforced when he shows that "the theorem is true when  n = 2" by expanding a  2 × 2 determinant by each row, ignoring the columns.

There is an irony in Wylie's strategy. If we wish to show that all row expansions and all column expansions yield the same result, it suffices to show that the expansion by any row equals the expansion by any column; this implies that all row expansions are equal to each other (because they are equal to the same column expansion) and that all column expansions are equal to each other (because they are equal to the same row expansion). But Wylie proceeds in the reverse order: he first shows that all row expansions are equal, then tries to show that the general column expansion equals the general row expansion. If the latter step were valid, the former would be redundant! In my proof, however, it is necessary to begin by showing that all row expansions are equal, because only by averaging all the two-row expansions can we obtain an expression that is conveniently compared with the corresponding expression obtained from columns. That the latter expression can be obtained from the former by simply interchanging indices is a bonus.

Kreyszig; O'Neil

These authors, like Wylie, wrote under the title Advanced Engineering Mathematics.

Kreyszig's proof appears in his original edition (1962) and survives without substantive change in the 10th Edition (2011, pp. A81–3). Kreyszig considers the sign attached to the secondary minor, resolving the two cases for ℓ < k and ℓ > k, earlier in the argument than Wylie does. Unlike Wylie, he does not emphasize the uniqueness of the term in \(a_{ik}\, a_{j\ell}\), and his notation for the corresponding minor matches mine. Like Wylie, he announces that we shall first prove that the expansion is the same for any row and that the proof is by induction, and then asserts P(2) for rows; he does not explicitly state that the inductive hypothesis applies to columns as well as rows, or explain why it should.[c] And like Wylie, he considers the "typical term" but not the change in the mode of summation.

O'Neil's proof, which he retains up to the 3rd Edition (1991, pp. 697–8), is in the tradition of Wylie and Kreyszig, with some differences; in particular, O'Neil uses t and s instead of k and ℓ for column indices (but uses Kreyszig's order of subscripts for secondary minors),[d] takes the inductive step from n to n +1 instead of from n −1 to n, establishes the 2 × 2 case for both rows and columns, states the inductive hypothesis as including both row expansions and column expansions, and is generally more terse. And unlike Wylie and Kreyszig, he uses the summation sign—but only for the initial row expansions: as soon as he considers expanding the minors in terms of secondary minors, he reverts to the "typical term" approach.

Noble; Grossman

These authors wrote under "linear algebra" titles.

Noble gives a detailed proof only in his first edition (1969, pp. 201–3). Like O'Neil (who wrote later), Noble establishes the 2 × 2 case for both rows and columns, states the inductive hypothesis as applying to both, and uses the summation sign only at the top level; but he uses Wylie's order of subscripts for secondary minors. Unlike other authors, he introduces the inductive step by saying:

We do this in three stages:

(a)  Expansion by first row = expansion by first column.
(b)  Expansion by first row = expansion by any row.
(c)  Expansion by first column = expansion by any column.

All this looks promising. But then his alleged proof of (a) merely shows that the coefficient of the typical product is the same in each sum; he does not confront the different indices of summation. Little wonder that the proof subsequently disappears from the book. For (b), he shows that the coefficient of the typical product in one expansion matches that of the same product in the other, which is enough as both sums are over j and k. For (c), he simply applies (b) to the transpose—and I have taken his hint in my own proof.

Grossman (e.g., 1991, pp. 133–5) follows Noble in expanding by the first row rather than a general row, and in identifying a secondary minor by the two row subscripts first. Unlike Noble he implies that P(n −1) applies to rows only: he expands a  2 × 2 determinant by each row, ignoring the columns, and assumes as his inductive hypothesis that the expansion by the i th row of a determinant of order n −1 is equal to that by the first row. In the inductive step, Grossman tries to show that

(1)  expansion by the first row = expansion by the i th row, and
(2)  expansion by the k th column = expansion by the first row, and expansion by the ℓ th column = expansion by the i th row.

For (1), he finds the two-row expansion by the first and i th rows, using k and ℓ respectively as column indices, and then reverses the procedure, finding the two-row expansion by the i th and first rows, using ℓ and k respectively as column indices.[e] Thus he shows that the only term containing the product \(a_{1k}\, a_{i\ell}\) is the same in both two-row expansions, which is enough as both sums are over k and ℓ. For (2) he reports, very briefly, that if we perform a two-column expansion by the k th and ℓ th columns, we find that the only term in \(a_{1k}\, a_{i\ell}\) is the same as in the two-row expansions. But again the sum is over different indices. Moreover, the first row index cannot be used as an index of summation because it is a constant; the term in \(a_{1k}\, a_{i\ell}\) is not a general term in the column expansion.

Conclusion

The earlier proofs examined above are flawed because, in trying to show that the general row expansion equals the general column expansion, they consider only a general term in the summation and do not allow for the different indices over which the summation is performed. However, if we take the additional step of averaging all the two-row expansions after they have been shown to be equal, we obtain an expression whose mode of summation is the same as that of the corresponding expression obtained from columns, so that the row and column results can be validly compared. If we also use signum functions to adjust the signs attached to secondary minors (according as the row or column index in the minor is equal to, or one less than, that of the same element in the complete determinant), we eliminate the need to consider multiple cases. A single two-row expansion then yields enough information to complete the proof: by simply swapping indices, we can find the second two-row expansion from the first, and the averaged two-column expansion from the averaged two-row expansion.

The five published proofs reviewed here are variations on the same theme. Three of them are found in well-known textbooks sharing the title Advanced Engineering Mathematics, of which the most popular (Kreyszig's) retains the offending proof in the current edition. So it seems that this manner of proof has come to be accepted without question, especially among engineering students. That the published proofs are not quite valid is therefore a matter of concern.

Additional information

Acknowledgments

This paper was started in 1993—yes, three decades ago—while I was a postgraduate student in electrical engineering. As it was not part of my project nor especially important to my subsequent employment or volunteering, it was crowded out by other work. But my latest career move has refocused my attention on it, and the latest edition of Kreyszig (2011, pp. A81–3) shows that it has not entirely lost its currency.

Competing interests

None.

Ethics statement

This article does not concern research on human or animal subjects.

Notes

  a. Also known as the adjunct or the classical adjoint.
  b. The original (1951) edition reduces "Determinants and Matrices" to a 12-page appendix, which begins with the permutation-based definition of the determinant and omits proofs of subsequent theorems.
  c. In the 6th Edition (1988, p. 449), Kreyszig refers to an earlier example which shows that "The statement is true for a second-order determinant", and this example expands by rows and columns; but in the context, the "statement" refers to rows only.
  d. In all three editions of O'Neil's proof, two instances of "\(M_{ij}\)" should be "\(M_{it}\)".
  e. Grossman's reversing the order of k and ℓ in the second expansion has the same purpose as my interchanging the column indices between (E3) and (E4); Wylie, Kreyszig and O'Neil likewise associate the same row index with the same column index.

Bibliography