WikiJournal Preprints/Cut the coordinates! (or Vector Analysis Done Fast)


WikiJournal Preprints
Open access • Publication charge free • Public peer review


Article information

Author: Gavin R. Putland[i] 


Abstract

(Partial draft—under construction.)


Introduction

Sheldon Axler, in his essay "Down with determinants!" (1995) and his subsequent book (4th Ed., 2023), does not entirely eliminate determinants, but introduces them as late as possible and then exploits them for their "main reasonable use in undergraduate mathematics",[1] namely the change-of-variables formula for multiple integrals.[a] Here I submit that for theoretical purposes, coordinates are to vector analysis as determinants are to linear algebra: I use coordinates for only three ends—to establish the unambiguity of coordinate-free definitions, to derive vector-analytic identities from vector-algebraic identities, and to deprecate certain notations—and for none of these ends do I go so far as to resolve a vector field explicitly into components in a coordinate system. The words "for theoretical purposes" are crucial; while one may well avoid determinants in computational linear algebra, one can hardly avoid coordinates in computational vector analysis! But if I concentrate on theory, as Axler does, the analogy seems apt.

The trouble with coordinates

Mathematicians define a "vector" as a member of a vector space, which is a set whose members satisfy certain basic rules of algebra (called the vector-space axioms) with respect to another set (called a field), which has its own basic rules (the field axioms), and whose members are called "scalars". Physicists are more fussy. They typically want a "vector" to be not only a member of a vector space, but also a first-order tensor: a "tensor", meaning that it has an existence independent of any coordinate system with which it might be specified; and "first order" (or "first-degree", or "first-rank"), meaning that it is specified by a one-dimensional array of numbers. Similarly, a 2nd-order tensor is specified by a 2-dimensional array (a matrix), and a 3rd-order by a 3-dimensional array, and so on; and a "scalar", being specified by a single number, i.e. by a zero-dimensional array, is a zero-order tensor. In "vector analysis", we are greatly interested in applications to physical situations, and accordingly take the physicists' view on what constitutes a vector.

So, for our purposes, if we define a quantity by three components in (say) a Cartesian coordinate system, we have not yet shown that it is a vector. To do the latter, we must then show that the quantity has an independent existence, by verifying that its coordinate representation behaves appropriately when the coordinate system is changed.[b] I avoid this complication in the obvious manner—by initially defining the quantity independently of any coordinates.[c] If, having defined it without coordinates, we then want to represent it with coordinates, we can choose the coordinate system for convenience.

In vector analysis (as distinct from vector algebra), the familiarity of Cartesian coordinates has perpetuated two notations, namely "∇ ⸱" and "∇ ×" (explained below), which are misleading in other coordinate systems. Here I avoid these notations except to show how they can beguile and befuddle.

The trouble with limits

In the branch of pure mathematics known as analysis, there is a thing called a limit, whereby for every positive ϵ there is a positive δ such that if some increment is less than δ, some error is less than ϵ. In the branch of applied mathematics known as continuum mechanics, there is a thing called reality, whereby if the increment is less than some positive δ, the assumption of a continuum becomes ridiculous, so that the error cannot be made less than an arbitrary ϵ. Yet vector "analysis" is typically studied with the intention of applying it to some form of "continuum" mechanics, such as the modeling of elasticity, plasticity, fluid flow, or (widening the net) electrodynamics of ordinary matter; in short, it is studied with the intention of conveniently forgetting that, on a sufficiently small scale, matter is lumpy.[d] One might therefore submit that to express the principles of vector analysis in the language of limits is to strain at a gnat and swallow a camel. I avoid the camel by referring to elements of length or area or volume, each of which is small enough to allow some quantity or quantities to be considered uniform within it, but, for the same reason, large enough to allow such local averaging of the said quantity or quantities as is necessary to tune out the lumpiness.

Moreover, there are bigger camels in the brew…

Other notes on rigor

[Readers who are new to vector analysis should skip this section on a first reading.]

Erwin Kreyszig, in the 6th edition of his bestselling Advanced Engineering Mathematics (1988, p. 486), defines the gradient of a scalar field f  in Cartesian coordinates, and then introduces the del operator as

∇  =  ∂i/∂x + ∂j/∂y + ∂k/∂z  [sic! ],

which of course is not an operator at all, but a self-contained expression whose value is the zero vector—because it is a sum of derivatives of constant vectors. Nevertheless he rewrites the gradient of f  as ∇ f, apparently imagining that the differentiation operators look through the constant vectors rather than at them. Six pages later, he defines the divergence and then immediately informs us that "Another common notation for the divergence of v is ∇ ⸱ v," where ∇ is defined as before, and the resulting ∇ ⸱ v is allegedly not identically zero.[2] Presumably  ∂i/∂x  was meant to be  i ∂/∂x, etc. These errors persist in the 10th edition (2011, pp. 396, 402–3). Similar howlers have been found in mathematics texts by Wilfred Kaplan, Ladis D. Kovach, and Merle C. Potter, and in electromagnetics texts by William H. Hayt and Martin A. Plonus.[3]  Knudsen and Katz, in Fluid Dynamics and Heat Transfer (1958), avoid the misdefinition of ∇, but misdefine the divergence of V as V ⸱ ∇ (which is actually an operator), and then somehow reduce it to the correct expression for ∇ ⸱ V. [4]

While the foregoing errors may be dismissed as mathematical spoonerisms and easily corrected, one cannot so easily repair every attempted proof in which the operator

∇  =  i ∂/∂x + j ∂/∂y + k ∂/∂z

is treated as an ordinary vector, so that a divergence written with "∇ ⸱" and a curl written with "∇ ×" become what Edwin B. Wilson called a "(formal) scalar product" and a "(formal) vector product".[5] The "formal product" concept has been attacked at length by Chen-To Tai (1994, 1995). In its defense we should note that the operators ∂/∂x, ∂/∂y, and ∂/∂z are linear, so that they are distributive over addition and may be permuted with multiplication by a constant, as if the operators themselves were multipliers. They may also be permuted with other like operators—explaining why the "formal product" interpretation correctly predicts that  ∇ × ∇ f  is zero (as if it were a cross-product of two parallel vectors), and that  ∇ ⸱ ∇ × v  is zero (as if it were a scalar triple product with a repeated factor), and that  ∇ × (∇ × v) = ∇(∇ ⸱ v) − ∇²v  (as if we were expanding a vector triple product). But they cannot be permuted with multiplication by a variable, because then the product rule of differentiation applies, yielding an extra term. The formal-product system tries to address this difficulty by generalizing the product rule, to the effect that if a first-order differential operator (such as ∇) is applied to a product of two factors, we take the sum of the terms obtained by varying one factor at a time. As Borisenko & Tarapov put it,
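
Readers who want to see these predictions checked mechanically can do so with a computer-algebra system. Here is a minimal sketch assuming SymPy's sympy.vector module (an illustrative choice of tool, not one used by any of the cited authors), with arbitrary test fields f and v:

```python
# A minimal symbolic spot-check (assuming SymPy is installed) of two predictions
# that the "formal product" reading gets right: curl grad f = 0, div curl v = 0.
from sympy import sin, cos, exp
from sympy.vector import CoordSys3D, gradient, divergence, curl

N = CoordSys3D('N')                      # Cartesian axes; base vectors N.i, N.j, N.k
x, y, z = N.x, N.y, N.z
f = sin(x*y) + exp(z)                    # an arbitrary scalar field
v = x*y*N.i + cos(y*z)*N.j + z*x*N.k     # an arbitrary vector field

print(curl(gradient(f)).simplify())      # -> 0  (curl of a gradient)
print(divergence(curl(v)).simplify())    # -> 0  (divergence of a curl)
```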

the operator ∇ acts on each factor separately with the other held fixed. Thus ∇ should be written after any factor regarded as a constant in a given term and before any factor regarded as variable.[6]

To illustrate both the strength and the weakness of this approach, let us find the gradient of the dot-product of two vector fields A and B.[7] In this case the "generalized" product rule gives

∇(A ⸱ B)  =  ∇(Ac ⸱ B) + ∇(A ⸱ Bc) ,  (7.26)

where the subscript c marks the factor held constant during the differentiation.[e] Now we take two sideways steps: by expansion of a formal vector triple product, we have the identity

Ac × (∇ × B)  =  ∇(Ac ⸱ B) − (Ac ⸱ ∇)B ,

which may be rearranged as

∇(Ac ⸱ B)  =  (Ac ⸱ ∇)B + Ac × (∇ × B) .  (7.28)

Similarly,

∇(Bc ⸱ A)  =  (Bc ⸱ ∇)A + Bc × (∇ × A) .  (7.29)

Hence, substituting (7.28) and (7.29) into (7.26), in which the order of the dot-products is immaterial, and dropping the c subscripts (because they are now outside the differentiations), we get the correct result

∇(A ⸱ B)  =  (A ⸱ ∇)B + (B ⸱ ∇)A + A × (∇ × B) + B × (∇ × A) .  (7.30)

Tai (1995, p. 47) is unimpressed, asking why we cannot apply (7.28), or its algebraic counterpart

b (a ⸱ c)  =  (a ⸱ b) c + a × (b × c) ,

directly to the left side of (7.26). The answer to that is obvious: on the left side, the operator ∇ is applied to a product of two variables, and the variations of both must be taken into account. By way of analogy, let us derive the product rule of elementary calculus following Borisenko & Tarapov. Treating the operator d/dx as a formal multiplier which can be permuted with multiplication by a constant but not with multiplication by a function of x, we have

d/dx (uv)  =  d/dx (uc v) + d/dx (u vc)
           =  d/dx (uc v) + d/dx (vc u)
           =  uc d/dx v + vc d/dx u
           =  u d/dx v + v d/dx u ,

where the top line follows from the "generalized" product rule (compare eq. 7.26), and the bottom line is correct. But there is a harder question which Tai does not ask: in (7.28) or its predecessor, why can't we have ∇ ⸱ Ac instead of Ac ⸱ ∇ ?  Because that would make the term vanish? Yes, it would; but, as there is only one variable factor on the left side, why do we need two terms on the right? Because the rule says ∇ should be written after the constant but before the variable? Yes, but now we're making the rule look arbitrary! We cannot settle the question even by appealing to symmetry. Obviously the right side of (7.30), like the left, must be unchanged if we switch A and B; and indeed it is. But if the first term on the right of (7.28) and of (7.29) were to vanish, the necessary symmetry of (7.30) would be maintained.
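
However arbitrary the rule may look, the end result (7.30) is right, and a computer-algebra system will confirm it. Here is a sketch in the same assumed SymPy setting as before; the helper a_dot_del and the test fields A and B are hypothetical choices of mine, not anything from Borisenko & Tarapov or Tai:

```python
# SymPy sketch (assumed available) of identity (7.30):
# grad(A.B) = (A.del)B + (B.del)A + A x curl B + B x curl A.
from sympy.vector import CoordSys3D, Vector, gradient, curl
from sympy import sin, cos

N = CoordSys3D('N')
x, y, z = N.x, N.y, N.z

def a_dot_del(A, B):
    """(A . del)B, computed componentwise in Cartesian coordinates."""
    return sum(((A & e) * B.diff(c) for e, c in
                zip((N.i, N.j, N.k), (x, y, z))), Vector.zero)

A = x*y*N.i + sin(z)*N.j + y*z*N.k       # arbitrary test fields
B = cos(x)*N.i + x*z*N.j + y**2*N.k

lhs = gradient(A & B)                    # & is the dot product, ^ the cross product
rhs = a_dot_del(A, B) + a_dot_del(B, A) + (A ^ curl(B)) + (B ^ curl(A))
print((lhs - rhs).simplify())            # -> 0 (zero vector)
```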

For another example of the same issue, consider the following two-liner offered by Panofsky & Phillips (1962, pp. 470–71) and rightly pilloried by Tai (1995, pp. 47–8):

∇ × (A × B)  =  A (∇ ⸱ B) − B (∇ ⸱ A)
             =  (B ⸱ ∇)A + A (∇ ⸱ Bc) − (A ⸱ ∇)B − B (∇ ⸱ Ac) .
If the first line were right, the authors would hardly bother to continue; but evidently it isn't, because it expands a formal vector triple product while forgetting that the first formal factor is actually a differential operator applied to a product of two variables. The second line does not follow from the first and includes divergences of constants, which ought to vanish but somehow apparently do not. Let's try again, this time sticking to the rules:

∇ × (A × B)  =  ∇ × (Ac × B) + ∇ × (A × Bc)
             =  Ac (∇ ⸱ B) − (Ac ⸱ ∇)B + (Bc ⸱ ∇)A − Bc (∇ ⸱ A)
             =  (B ⸱ ∇)A − (A ⸱ ∇)B + A (∇ ⸱ B) − B (∇ ⸱ A) .

Here the first line comes from the "generalized" product rule, and the third line—the correct result—is obtained from the second by rearranging terms and dropping subscripts. The interesting line is the second, which is obtained from the first by expanding the formal vector triple products. But again, why must we have Ac ⸱ ∇ and Bc ⸱ ∇, instead of ∇ ⸱ Ac and ∇ ⸱ Bc, which would make the middle two terms vanish? Again symmetry does not give an answer. The right-hand side, like the left, must change sign if we switch A and B. But the disappearance of the Ac ⸱ ∇ and Bc ⸱ ∇ terms would maintain the required (anti)symmetry. Funnily enough, the result would then be the same as the incorrect first line given by Panofsky & Phillips (above). But then how would we know that it is incorrect? The errors caused by the product of two variables on the left could not possibly cancel out… could they?
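
The third line, at least, survives mechanical checking. A sketch in the same assumed SymPy setting as above (helper and test fields again arbitrary):

```python
# SymPy sketch (assumed) of the correct result:
# curl(A x B) = (B.del)A - (A.del)B + A(div B) - B(div A).
from sympy.vector import CoordSys3D, Vector, divergence, curl
from sympy import sin, cos

N = CoordSys3D('N')
x, y, z = N.x, N.y, N.z

def a_dot_del(A, B):
    """(A . del)B, computed componentwise in Cartesian coordinates."""
    return sum(((A & e) * B.diff(c) for e, c in
                zip((N.i, N.j, N.k), (x, y, z))), Vector.zero)

A = x*y*N.i + sin(z)*N.j + y*z*N.k
B = cos(x)*N.i + x*z*N.j + y**2*N.k

lhs = curl(A ^ B)
rhs = (a_dot_del(B, A) - a_dot_del(A, B)
       + A*divergence(B) - B*divergence(A))
print((lhs - rhs).simplify())            # -> 0 (zero vector)
```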

These examples show that "formal product" arguments can be tenuous. One way to restore rigor is to drill down to the underlying partial derivatives w.r.t. x , y, and z , preferably using a compact notation that allows the partial derivatives w.r.t. different coordinates to be processed together, without hiding them behind the ∇ operator. We shall come to that in due course.

Volume-based definitions

Assuming that the reader is familiar with the algebra of vectors in 3D space, and with differentiation of a scalar or vector with respect to a scalar (such as time), I need to introduce four quantities which may be conceived as derivatives w.r.t. the position vector; these are the gradient, the divergence, the curl, and the Laplacian. I shall define these terms in a way that will seem unremarkable to all readers who aren't familiar with them, but strange to some who are. The gradient is commonly introduced in connection with a curve and its endpoints, the curl in connection with a surface segment and its enclosing curve, and the divergence and the Laplacian in connection with a volume and its enclosing surface. Here I introduce all four in connection with a volume and its enclosing surface. The volume-based definitions of the gradient and curl are usually thought to be more advanced, in spite of being conceptually simpler and more easily related to the divergence and the Laplacian; here I start with the simpler, more relatable definitions.

Instant integral theorems—with a caveat

The term field, mentioned above in connection with algebraic axioms, has an alternative meaning: if r is the position vector, a scalar field is a scalar function of r, and a vector field is a vector function of r; both may also depend on time. Let V be a volume (3D region) enclosed by a surface S. Let n̂ be the unit normal vector at S, pointing out of V. Let n be the distance from S in the direction of n̂ (positive outside V, negative inside), and let ∂n be an abbreviation for ∂/∂n, with a tacit acknowledgment that the derivative—commonly called the normal derivative—is simply assumed to exist.

In V, and on S, let p be a scalar field (e.g., pressure in a fluid), and let q be a vector field (e.g., flow velocity), and let ψ be either a scalar field or a vector field. Let a general element of the surface S have area dS, and let it be small enough to allow n̂, p, q, and ∂n ψ to be considered uniform over the element. Then, for every element, the following four products are well defined:

n̂ p dS ,   n̂ ⸱ q dS ,   n̂ × q dS ,   ∂n ψ dS .  (1)

If p is pressure in a non-viscous fluid, the first of these products is the force exerted by the fluid in V  through the area dS. The second product is the flux of q through the surface element; if q is flow velocity, the second product is the volumetric flow rate out of V  through dS. The third product does not have such an obvious physical interpretation; but if q is circulating clockwise about an axis directed through V, the cross-product n̂ × q will be exactly tangential to S and will tend to have a component in the direction of that axis. The fourth product, by analogy with the second, might be called the flux of the normal derivative of ψ through the surface element, but is equally well defined whether ψ is a scalar or a vector.

If we add up each of the four products over all the elements of the surface S, we obtain, respectively, the four surface integrals

∬_S n̂ p dS ,   ∬_S n̂ ⸱ q dS ,   ∬_S n̂ × q dS ,   ∬_S ∂n ψ dS ,  (2)

in which the double integral sign indicates that the range of integration is two-dimensional. The first integral takes a scalar field and yields a vector; the second takes a vector field and yields a scalar; the third takes a vector field and yields a vector; and the fourth takes a scalar field and yields a scalar, or takes a vector field and yields a vector. If p is pressure in a non-viscous fluid, the first integral is the force exerted by the fluid in V  on the fluid outside V. The second integral, commonly called simply the surface integral of q over S, is the total flux of q out of V. The third integral may be called the skew surface integral of q over S ,[8] or, for the reason hinted above, the circulation of q over S.  And the fourth integral is the surface integral of the outward normal derivative of ψ.

Let the volume V be divided into elements. Let a general element have the volume dV and be enclosed by the surface δS —not to be confused with the area dS of a surface element, which may be an element of S or of δS. Now consider what happens if, instead of evaluating each of the above surface integrals over S, we evaluate it over each δS and add up the results for all the volume elements. In the interior of V, each surface element of area dS is on the boundary between two volume elements, for which the unit normals n̂ at dS, and the respective values of ∂n ψ, are equal and opposite (provided that n̂ is piecewise well-defined, as we shall always suppose). Hence when we add up the integrals over the surfaces δS, the contributions from the elements dS cancel in pairs, except on the original surface S, so that we are left with the original integral over S. Thus, for the four surface integrals in (2), we have respectively

∬_S n̂ p dS = Σ ∬_δS n̂ p dS ,   ∬_S n̂ ⸱ q dS = Σ ∬_δS n̂ ⸱ q dS ,   ∬_S n̂ × q dS = Σ ∬_δS n̂ × q dS ,   ∬_S ∂n ψ dS = Σ ∬_δS ∂n ψ dS ,  (3)

where each sum Σ is taken over all the volume elements.

Now comes a big "if":  if  we define the gradient of p (pronounced "grad p") as

∇p  =  (1/dV) ∬_δS n̂ p dS ,  (4g)

and the divergence of q as

div q  =  (1/dV) ∬_δS n̂ ⸱ q dS ,  (4d)

and the curl of q as

curl q  =  (1/dV) ∬_δS n̂ × q dS ,  (4c)

and the Laplacian of ψ as [f]

△ψ  =  (1/dV) ∬_δS ∂n ψ dS  (4L)

(where the letters after the equation number stand for gradient, divergence, curl, and Laplacian, respectively), then equations (3) can be rewritten

∬_S n̂ p dS = Σ ∇p dV ,   ∬_S n̂ ⸱ q dS = Σ div q dV ,   ∬_S n̂ × q dS = Σ curl q dV ,   ∬_S ∂n ψ dS = Σ △ψ dV .

But because each term in each sum has a factor dV, we call the sum an integral; and because the range of integration is three-dimensional, we use a triple integral sign. Thus we obtain the following four theorems relating integrals over an enclosing surface to integrals over the enclosed volume:

∬_S n̂ p dS  =  ∭_V ∇p dV ,  (5g)

∬_S n̂ ⸱ q dS  =  ∭_V div q dV ,  (5d)

∬_S n̂ × q dS  =  ∭_V curl q dV ,  (5c)

∬_S ∂n ψ dS  =  ∭_V △ψ dV .  (5L)

Of the above four results, only the second (5d) seems to have a standard name; it is called the divergence theorem (or Gauss's theorem or, more properly, Ostrogradsky's theorem[9]), and is indeed the best known of the four—although the other three, having been derived in parallel with it, may be said to stand on similar foundations.

But remember the "if": These theorems depend on definitions (4g) to (4L) and are therefore only as definite as those definitions. That is the advertised "caveat", which must now be addressed.

Unambiguity of definitions

Equations (3) do not assume anything about the shapes and sizes of the closed surfaces δS (except, as always, that n̂ is piecewise well-defined). These equations indicate that the surface integrals are additive with respect to volume; but the additivity, by itself, does not guarantee that the surface integrals are shared among neighboring volume elements in proportion to their volumes, as envisaged by "definitions" (4g) to (4L). Each of these "definitions" is unambiguous if, and only if, the ratio of the surface integral to dV is insensitive to the shape and size of δS for a sufficiently small δS. This condition can be established by breaking it into three steps:

  1. If the volume element enclosed by δS is a rectangular block of fixed orientation, the ratio is insensitive to the (small) dimensions of the block;
  2. If the volume element enclosed by δS is a union of such rectangular blocks, the ratio in step 1 may be considered uniform throughout the element and accordingly applied to the whole element; and
  3. If the volume element enclosed by δS is not a union of such rectangular blocks, but is approximated by such a union, so that each element of the surface δS is approximated by an angular surface (the Lego-brick effect), then the integral over that surface element, and hence over the whole of δS, is insensitive to the angularity (for small elements), so that the ratio of the integral over δS to the volume dV is likewise insensitive to the angularity.

To justify step 1, we note that if the volume element is rectangular, the faces of δS are equicoordinate surfaces in a Cartesian coordinate system (indeed there are infinitely many systems with suitable orientations). Let the coordinates be xi where i ∊ {1, 2, 3}.  Let ei be the unit vector in the direction of the xi axis, and dxi the length or width or height of δS in that direction. And let ∂i be an abbreviation for ∂/∂xi ,  hence ∂i² for ∂²/∂xi².  Now let us find ∇p as defined by (4g). The contribution from the constant-x1 faces at x1 and  x1+ dx1  is

( e1 p|x1+dx1 − e1 p|x1 ) dx2 dx3 / dV  =  ∂1(e1 p) ,

where dV = dx1 dx2 dx3 and the difference quotient is replaced by a derivative. Adding in the contributions from the other pairs of opposite faces, we get

∇p  =  ∂1(e1 p) + ∂2(e2 p) + ∂3(e3 p) .

Comparing (4g) with (4d) and (4c), we see that when we apply the same procedure to the latter two, we must get

div q  =  ∂1(e1 ⸱ q) + ∂2(e2 ⸱ q) + ∂3(e3 ⸱ q)

and

curl q  =  ∂1(e1 × q) + ∂2(e2 × q) + ∂3(e3 × q) .

But because each ei (in a Cartesian system) is uniform in magnitude and direction, it can be taken outside the differentiations, so that the last three equations become

∇p  =  e1 ∂1p + e2 ∂2p + e3 ∂3p ,  (6g)

div q  =  e1 ⸱ ∂1q + e2 ⸱ ∂2q + e3 ⸱ ∂3q ,  (6d)

curl q  =  e1 × ∂1q + e2 × ∂2q + e3 × ∂3q .  (6c)

Meanwhile in (4L), the contribution to △ψ from the constant-x1 faces is

( ∂1ψ|x1+dx1 − ∂1ψ|x1 ) dx2 dx3 / dV  =  ∂1²ψ ,

so that when we add in the contributions from the other pairs of opposite faces, we get

△ψ  =  ∂1²ψ + ∂2²ψ + ∂3²ψ .  (6L)

I note in passing that equations (6g) and (6L) are perfectly conventional, and that (6d) and (6c), although less commonplace, agree with the initial definitions of the divergence and curl given by the great J. Willard Gibbs,[10] which we shall revisit later. For the moment, we see from the derivations of eqs. (6g) to (6L) that, if the increments dxi are small enough to allow difference quotients to be replaced by derivatives, then the right-hand sides of these equations are insensitive to the increments—as claimed in step 1.  Of course, replacing the difference quotients by derivatives assumes differentiability w.r.t. Cartesian coordinates —which, in the case of the Laplacian, means that ψ is assumed to be twice differentiable w.r.t. each Cartesian coordinate.

At this point, if we merely wanted to show that the quantities defined by eqs. (6g) to (6L) are tensors, it would suffice to show that their representations in any other Cartesian coordinate system (not aligned with the rectangular elements) have the same forms as in the system xi (aligned with the elements). By the generality of the orientations of the two systems, this in turn would imply that the quantities given by (4g) to (4L) are the same for rectangular elements of any orientation. But the same implication would follow if we could show, as planned, that the quantities given by (4g) to (4L) are the same for elements of any shape. Let us therefore proceed with steps 2 and 3.

Step 2 follows from an assumption of continuity: if the fields have continuous derivatives w.r.t. the coordinates (second derivatives in the case of the Laplacian), then, if the union of rectangular blocks is sufficiently small in spatial extent, the derivatives may be considered uniform throughout the union, whence the ratio of the surface integral to the volume for one block may be considered uniform throughout the union, so that when the surface integral and the volume are summed over the union, their ratio stays the same.

To justify step 3, let the surface element with area dS be approximated by dS′, which is a union of subelements, each of which is an equicoordinate surface with area dA′ and unit normal n̂′ (out of the associated volume subelement); and let n′ be the distance from the subelement in the direction of n̂′ (in other words, let n′ be to n̂′ as n is to n̂). Thus the surface subelements are contained within negligible distance of the original element, but have discrete orientations. Then the four products in (1) are respectively approximated by

Σ n̂′ p dA′ ,   Σ n̂′ ⸱ q dA′ ,   Σ n̂′ × q dA′ ,   Σ ∂n′ ψ dA′ ,

where each sum is over dS′. In the first sum, p may be considered uniform over dS′ and taken outside the summation. In the second and third sums, the same can be done with q.  In the fourth sum, by the chain rule, we can write ∂n′ ψ as (n̂′ ⸱ n̂) ∂n ψ, and take ∂n ψ outside the summation. Thus the four sums become

p Σ n̂′ dA′ ,   (Σ n̂′ dA′) ⸱ q ,   (Σ n̂′ dA′) × q ,   ∂n ψ Σ (n̂′ ⸱ n̂) dA′ .  (7)

In the first three expressions in (7), the component of Σ n̂′ dA′ in the direction of ei is the area of the projection of dS′ on a constant-xi surface (since the surface subelements on which another coordinate is constant are perpendicular to any constant-xi surface and therefore make no contribution to the projection), with a positive sign if n̂′ has the same direction as ei on the subelements that contribute to the projection. As dS′ approximates the original surface element with area dS, the signed projection of dS′ approximates the projection of the original element, and the latter projection is the component of  n̂ dS  in the direction of ei. So, to the desired approximation,  Σ n̂′ dA′  may be replaced by n̂ dS, since the two have the same component in the direction of each ei. In the fourth expression in (7),  n̂′ ⸱ n̂  is the cosine of the angle between n̂′ and n̂, so that  (n̂′ ⸱ n̂) dA′  is the area of the projection of the surface subelement on the original surface element, and the sum of these projections can be replaced (to the same approximation) by the area of the original element, namely dS. When we make the indicated replacements,  (7) becomes

n̂ p dS ,   n̂ ⸱ q dS ,   n̂ × q dS ,   ∂n ψ dS ,

which agrees with (1), completing step 3, and completing the proof that definitions (4g) to (4L) are unambiguous.

Let us therefore summarize the definitions. If n̂ is the outward unit normal from the surface of the volume element, then

  • the gradient of p is the surface integral of  n̂ p per unit volume;
  • the divergence is the outward flux per unit volume;
  • the curl is the skew surface integral per unit volume, or the surface circulation per unit volume; and
  • the Laplacian is the surface integral of the outward normal derivative per unit volume.

All four operations are meaningful without reference to any coordinate system. The gradient maps a scalar field to a vector field; the divergence maps a vector field to a scalar field; the curl maps a vector field to a vector field; and the Laplacian maps a scalar field to a scalar field, or a vector field to a vector field.

If q is the flow velocity of a fluid, the divergence of q is the outward volumetric flow rate per unit volume. In accordance with the everyday meaning of "divergence", this quantity is positive if the fluid is expanding, negative if it is contracting, and zero if it is incompressible.
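
The "outward flux per unit volume" reading can even be exercised numerically. In the following sketch (assuming NumPy; the field q, the sample point, and the cube size are arbitrary illustrative choices), the flux of q out of a small cube is computed by a one-point midpoint rule on each face and divided by the cube's volume, approximating div q at the center:

```python
# Numerical sketch of definition (4d): outward flux through a small cube,
# divided by the cube's volume, approximates the divergence at the center.
import numpy as np

def q(p):
    x, y, z = p
    return np.array([x*y, y*z, z*x])             # div q = y + z + x

def flux_per_volume(center, h):
    c = np.array(center, float)
    flux = 0.0
    for axis in range(3):
        for sign in (+1.0, -1.0):
            n = np.zeros(3); n[axis] = sign      # outward unit normal of one face
            face_center = c + 0.5*h*n
            flux += sign * q(face_center)[axis] * h**2   # midpoint-rule flux
    return flux / h**3

print(flux_per_volume((0.2, 0.3, 0.5), 1e-3))    # ~ 0.3 + 0.5 + 0.2 = 1.0
```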

If p is pressure in a non-viscous fluid, the gradient of p is the force exerted by a unit volume of the fluid on the surrounding fluid; hence minus the gradient of p is the force per unit volume exerted on an element of the fluid by the surrounding fluid. Thus our definition of the gradient, although not the most common one, agrees with the common notion that a pressure "gradient" gives rise to a force.

The above definitions of the curl and the Laplacian are also not the most common ones. But we now have the tools with which to derive alternative definitions of the gradient, the curl, and the Laplacian from those above.

Alternative definitions and related integral theorems

Having shown that definitions (4g) to (4L) are independent of the shape of the volume element, we can now strategically restrict the shape in order to obtain other interpretations of the definitions. In particular, let the volume element be a right cylinder with an arbitrarily-shaped base. Let the base area be dA and the perpendicular height ds, where s measures length along a curve orthogonal to the base and takes the value s at the base and  s+ ds at the face opposite the base; and let ŝ be the unit vector perpendicular to the base, in the direction of increasing s. Let the curve δC be the circumference or perimeter of the base. Let ℓ measure arc length around δC, clockwise if we look in the direction of ŝ, and let t̂ be the unit vector tangential to δC in the direction of increasing ℓ. Then n̂ takes the value −ŝ on the base,  ŝ on the face opposite the base, and  t̂ × ŝ  on the curved face.

Gradient

For the cylindrical element, with the indicated substitutions, definition (4g) becomes

∇p  =  ( −ŝ p|s dA + ŝ p|s+ds dA + ∮_δC (t̂ × ŝ) p dℓ ds ) / (dA ds) ,

where the three terms in parentheses (from left to right) are the contributions from the base, the face opposite the base, and the curved face (for which the integral is around the closed loop δC ). This simplifies to

∇p  =  ŝ ∂s p + (1/dA) ∮_δC (t̂ × ŝ) p dℓ .  (8g)

Taking dot-products of both sides of (8g) with ŝ, we get

ŝ ⸱ ∇p  =  ∂s p .  (9g)

So the component of ∇p in the direction of ŝ is the derivative of p w.r.t. arc length in that direction.

For real p, we see that ∂s p has its maximum, namely |∇p|, when ŝ is in the direction of ∇p; thus ∇p is the vector whose direction is that in which the derivative of p w.r.t. arc length is a maximum, and whose magnitude is that maximum. This is the usual conceptual definition of the gradient. If, on the contrary, ŝ is in any direction tangential to a surface of constant p, then ∂s p in that direction is zero, in which case, by (9g),  ∇p is orthogonal to ŝ.  So ∇p is orthogonal to the surfaces of constant p (as we would expect, having just shown that p changes most rapidly in the direction of ∇p).

When s, which measures arc length, changes by ds, the change in the position vector is

dr  =  ŝ ds ,

so that we can multiply (9g) by ds and obtain

∇p ⸱ dr  =  dp .  (10g)

Adding (integrating) over all the elemental displacements dr from (say) r1 to r2, we get

∫_r1^r2 ∇p ⸱ dr  =  p(r2) − p(r1) ,  (11g)

where the integral is over any path from r1 to r2. This result, which is strongly reminiscent of the fundamental theorem of calculus, is the best-known integral theorem involving the gradient, and is commonly called the gradient theorem.[g] If we close the path by setting r2 = r1 , the theorem reduces to

∮ ∇p ⸱ dr  =  0 ,  (12g)

where the integral is around any closed circuit.
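
The gradient theorem, too, can be checked mechanically. A sketch assuming SymPy, with an arbitrary scalar field and an arbitrary (helical) path as illustrative choices:

```python
# Sketch of a symbolic check of the gradient theorem (11g), assuming SymPy:
# integrate grad p along a parametrized path; compare with the endpoint difference.
import sympy as sp

t = sp.symbols('t')
x, y, z = sp.symbols('x y z')
p = x*y**2 + sp.sin(z)

path = {x: sp.cos(t), y: sp.sin(t), z: t}        # r(t), from t = 0 to t = 2
integrand = sum(sp.diff(p, v).subs(path) * sp.diff(path[v], t)
                for v in (x, y, z))              # grad p . dr/dt

lhs = sp.integrate(integrand, (t, 0, 2))
rhs = (p.subs({x: sp.cos(2), y: sp.sin(2), z: 2})
       - p.subs({x: 1, y: 0, z: 0}))
print(sp.simplify(lhs - rhs))                    # -> 0
```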

From (8g) again, taking cross-products with ŝ on the left, we get

ŝ × ∇p  =  ŝ × ŝ ∂s p + (1/dA) ∮_δC ŝ × (t̂ × ŝ) p dℓ .

Noting that  ŝ × ŝ = 0 , that  ŝ × (t̂ × ŝ) = t̂ (ŝ ⸱ ŝ) − ŝ (ŝ ⸱ t̂) , that  ŝ ⸱ t̂ = 0 (since t̂ is in the plane of the base while ŝ is normal to it), and that  ŝ ⸱ ŝ = 1, we have

ŝ × ∇p  =  (1/dA) ∮_δC t̂ p dℓ .

Then, rewriting t̂ dℓ as dr (because it is the change in position when ℓ increases by dℓ ), and renaming ŝ as n̂ (because it is normal to the plane of δC, which is traversed clockwise about it), and rearranging, we get

∮_δC p dr  =  n̂ × ∇p dA .  (13g)

Now let Σ be a (generally curved) surface segment bounded by a closed curve C. Let a general element of the surface have area dA and be bounded by a curve δC, which is traversed clockwise if we look in the direction of n̂, the unit vector normal to the surface, pointing to the same side of the surface for all elements. Let the elements be small enough to allow n̂ to be considered uniform over each element—that is, to allow each surface element to be considered flat. Then (13g) is applicable to each element. If we add up the closed-circuit integrals over all the elements, then for every dr on one circuit there is an equal and opposite dr on an adjacent circuit, except on the outer curve C. So the result of the summation is

∮_C p dr  =  ∬_Σ n̂ × ∇p dA .  (14g)

This is the third integral theorem involving the gradient, and the least prominent of the three.

The corresponding result involving the curl  is far better known…

Curl

Comparing definitions (4g) and (4c), we see that if we proceed as in the derivation of (8g), starting from (4c) instead of (4g), we must get

curl q  =  ŝ × ∂s q + (1/dA) ∮_δC (t̂ × ŝ) × q dℓ ,

or, after expanding the vector triple product,

curl q  =  ŝ × ∂s q + (1/dA) ∮_δC [ ŝ (q ⸱ t̂) − t̂ (q ⸱ ŝ) ] dℓ .

Taking dot-products with ŝ, noting (again) that  ŝ ⸱ t̂ = 0 ,  ŝ ⸱ ŝ = 1, and t̂ dℓ = dr (and that  ŝ ⸱ ŝ × ∂s q  vanishes as a scalar triple product with a repeated factor), and (again) renaming ŝ as n̂ because it is normal to the surface element enclosed by δC, we obtain

n̂ ⸱ curl q  =  (1/dA) ∮_δC q ⸱ dr .  (15c)

The integral on the right is called the circulation of q around δC, so that the entire right-hand side is the circulation per unit area, and the equation says that the component of curl q normal to a surface is the circulation of q per unit area in that surface. Hence, for real q, the circulation of q per unit area has its maximum, namely |curl q|, when the normal to the surface is in the direction of curl q; thus curl q is the vector whose direction is that which a surface must face if the circulation of q per unit area in that surface is to be a maximum, and whose magnitude is that maximum. This is the usual conceptual definition of the curl.
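
Equation (15c) also lends itself to a numerical check. In this sketch (assuming NumPy; the field, the sample point, and the loop size are arbitrary choices), the circulation of q around a small square loop normal to the z axis, divided by the loop's area, approximates the z-component of curl q:

```python
# Numerical sketch of eq. (15c): circulation per unit area around a small
# square loop approaches the component of curl q normal to the loop.
import numpy as np

def q(p):
    x, y, z = p
    return np.array([-y*z, x*z, np.sin(x*y)])    # (curl q)_z = 2z

def circulation_per_area(center, h, samples=100):
    c = np.array(center, float)
    # corners of a square of side h in the z = const plane, traversed in the
    # positive sense for a normal in the +z direction
    corners = [c + 0.5*h*np.array(d, float) for d in
               ((-1, -1, 0), (1, -1, 0), (1, 1, 0), (-1, 1, 0))]
    circ = 0.0
    for a, b in zip(corners, corners[1:] + corners[:1]):
        ts = (np.arange(samples) + 0.5) / samples          # midpoint rule
        pts = a + ts[:, None] * (b - a)
        circ += sum(q(pt) @ (b - a) for pt in pts) / samples
    return circ / h**2

print(circulation_per_area((0.3, -0.2, 0.5), 1e-3))        # ~ 2*0.5 = 1.0
```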

By (12g), if q is the gradient of something, then the integral in (15c) is zero, so that, by (15c), the component of curl q in any direction is zero; in short, the curl of a gradient is zero.

Again, let the surface segment enclosed by δC be a general element of a macroscopic surface segment Σ enclosed by a curve C. Solving (15c) for the integral and adding over all the elements, noting the pairwise cancellation of the contributions  q ⸱ dr  except on C, we get

∮_C q ⸱ dr  =  ∬_Σ n̂ ⸱ curl q dA .  (16c)

This celebrated result, known as Stokes' theorem or the Kelvin–Stokes theorem,[11] is the standard integral theorem involving the curl.

After deriving the gradient theorem (11g), we considered what happened if the path of integration was closed. So now, having derived the Kelvin–Stokes theorem, let us consider what happens if the open surface Σ, circumscribed by the curve C, is replaced by a closed surface S enclosing a volume V. To do this, we let Σ be a segment of S, and then let Σ expand across S until it engulfs S, so that C shrinks to a point on the far side of S. Then the integral on the left of (16c) is zero, so that (16c) becomes

∬_S n̂ ⸱ curl q dS  =  0 ,

to which we apply the divergence theorem (5d), obtaining

∭_V div curl q dV  =  0 .  (17c)

As this applies to any volume, the integrand must be zero everywhere; that is, the divergence of the curl is zero.

For curiosity's sake, we can also treat the obscure theorem (14g) as we have just treated the famous theorem (16c). If we expand Σ until it covers the closed surface S and reduces the curve C to a point, (14g) becomes

∬_S n̂ × ∇p dS  =  0 .

Then applying theorem (5c) confirms that the curl of the gradient is zero.

Laplacian

In definition (4L), if ψ is a scalar field, then by (9g) we can put  ∂n ψ = n̂ ⸱ ∇ψ  in (4L), compare the result with definition (4d), and notice that the Laplacian is the divergence of the gradient. This indeed is the usual introductory definition of the Laplacian. Notice, however, that this definition makes the Laplacian dependent on the gradient and therefore assumes a scalar field,[h] whereas our initial definition of the Laplacian (4L) makes sense whether ψ is a scalar or a vector (or anything else that one can differentiate w.r.t. n).
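
Incidentally, if definition (4L) is applied to a small cube with one central-difference sample of the normal derivative per face, the computation reduces to the classic seven-point finite-difference approximation of the Laplacian. A numerical sketch (assuming NumPy; the field ψ and the sample point are arbitrary choices):

```python
# Numerical sketch of definition (4L): the outward normal derivative of psi,
# integrated over the faces of a small cube and divided by the volume.
# With one central-difference sample per face this is exactly the classic
# 7-point finite-difference Laplacian.
import numpy as np

def psi(p):
    x, y, z = p
    return np.sin(x)*np.exp(y) + z**3        # Laplacian = 6z

def laplacian_estimate(center, h):
    c = np.array(center, float)
    total = 0.0
    for axis in range(3):
        e = np.zeros(3); e[axis] = 1.0
        total += (psi(c + h*e) - psi(c)) / h * h**2   # face at +h/2
        total += (psi(c - h*e) - psi(c)) / h * h**2   # face at -h/2 (n points along -e)
    return total / h**3

print(laplacian_estimate((0.4, 0.1, 0.7), 1e-3))      # ~ 6*0.7 = 4.2
```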

So far we have seen that the curl of the gradient, and the divergence of the curl, are zero, and that the divergence of the gradient of a scalar is the Laplacian. There are two other well-defined combinations of two operators chosen from {grad, div, curl}, namely the curl of the curl and the gradient of the divergence, and they turn out to be related through the Laplacian of a vector. By two applications of (6c) we have, in Cartesian coordinates,

curl curl q  =  Σi ei × ∂i ( Σj ej × ∂j q )  =  Σi Σj [ ej (ei ⸱ ∂i ∂j q) − (ei ⸱ ej) ∂i ∂j q ]  =  Σj ej ∂j ( Σi ei ⸱ ∂i q ) − Σi ∂i²q ,

or, on comparison with (6g), (6d), and (6L),

curl curl q  =  grad div q − △q ,

which may be rearranged as

△q  =  grad div q − curl curl q .  (18)

We could use this result as a coordinate-free definition of the Laplacian of a vector, if we did not already have one.[12] But we do: we started with a coordinate-free definition (4L), from which we obtained the conventional definition in Cartesian coordinates (6L), from which we showed that the initial definition (4L) is unambiguous; and this definition works equally well for a scalar or a vector. Wherever we start, we may properly state by way of contrast that the Laplacian of a vector is given by (18), whereas the Laplacian of a scalar is given by the divergence of the gradient. But I do not conclude, as Moon & Spencer do, that representing the scalar and vector Laplacians by the same symbol is "poor practice… since the two are basically quite different",[13] because in fact the two have a common definition which is succinct, unambiguous, and coordinate-free: the Laplacian (of anything) is the closed-surface integral of the outward normal derivative per unit volume.
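
For completeness, (18) can also be checked mechanically: a SymPy sketch (assumed tool; arbitrary test field) comparing the componentwise Cartesian Laplacian of a vector field with grad div minus curl curl:

```python
# SymPy sketch (assumed) of eq. (18): the componentwise Cartesian Laplacian
# of a vector field equals grad div q - curl curl q.
from sympy import sin, exp, diff
from sympy.vector import CoordSys3D, Vector, gradient, divergence, curl

N = CoordSys3D('N')
x, y, z = N.x, N.y, N.z
q = sin(y*z)*N.i + x*exp(z)*N.j + x*y*z*N.k

def vec_laplacian(w):
    """Laplacian applied to each Cartesian component of w (cf. eq. 6L)."""
    return sum((sum(diff(c, u, 2) for u in (x, y, z)) * e
                for c, e in zip(w.to_matrix(N), (N.i, N.j, N.k))), Vector.zero)

lhs = vec_laplacian(q)
rhs = gradient(divergence(q)) - curl(curl(q))
print((lhs - rhs).simplify())     # -> 0 (zero vector)
```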

Reduction to two dimensions

[To be continued.]

Notation

We have written the gradient of p (pronounced "grad p") as ∇p, where the operator ∇ is called del.[i] If our general definition of the gradient (4g) is also taken as the general definition of the ∇ operator, then, comparing (4g) with (4d) and (4c), we see that  div q  and  curl q  are respectively ∇(⸱ q) and  ∇(× q).  It does not  follow that we can shift the parentheses and write the divergence and curl of q as (∇ ⸱)q  and (∇ ×)q.  In general we should not even expect such a regrouping of operators to be meaningful—not least because the dot or cross is a binary operator, whereas ∇ is unary. Neither does it follow that we can simply drop the parentheses and write the divergence and curl of q as ∇ ⸱ q  and ∇ × q.  In particular, definitions (4d) and (4c) do not give any warrant for shifting or dropping the parentheses, because they do not let us isolate q on one side of the dot or cross.

And yet the  ∇ ⸱  and ∇ ×  notations are de-facto standards. Even Gibbs himself, giving definitions equivalent to our (6d) and (6c), uses the  ∇ ⸱  and ∇ ×  notations on the left sides of the equations—and only after the equations does he announce that " ∇ ⸱ ω is called the divergence of ω  and ∇ × ω  its curl."[14] He can do this because (6d) and (6c), unlike (4d) and (4c), do let us isolate q. If (4g) is the general definition of ∇, its translation into Cartesian coordinates is (6g), which can be written

∇p  =  (e1 ∂1 + e2 ∂2 + e3 ∂3) p ,

whence

∇  =  e1 ∂1 + e2 ∂2 + e3 ∂3 ,

if  it is understood that the result of the linear combination of ∂i operators is the same linear combination of the results of the operators applied separately. Thus, in Cartesian coordinates, ∇p looks like the product of ∇ and p, and might therefore be described, following Gibbs's student Edwin B. Wilson, as a "formal" or "symbolic" product.[15] By the same "formal" reasoning,

∇ ⸱ q  =  (e1 ∂1 + e2 ∂2 + e3 ∂3) ⸱ q  =  e1 ⸱ ∂1q + e2 ⸱ ∂2q + e3 ⸱ ∂3q ,

where moving the "formal" factor ∂i from the first vector to the second makes no difference to the "product"; and the result, by (6d), is div q.  Similarly,

∇ × q  =  (e1 ∂1 + e2 ∂2 + e3 ∂3) × q  =  e1 × ∂1q + e2 × ∂2q + e3 × ∂3q ,

which, according to (6c), is curl q.  While we're at it,

∇ ⸱ ∇  =  (e1 ∂1 + e2 ∂2 + e3 ∂3) ⸱ (e1 ∂1 + e2 ∂2 + e3 ∂3)  =  ∂1² + ∂2² + ∂3² ,

so that

∇ ⸱ ∇ψ  =  ∂1²ψ + ∂2²ψ + ∂3²ψ ,

which, according to (6L), is △ψ. And indeed ∇² is a standard symbol for the Laplacian operator (although Gibbs and Wilson were content to leave it as ∇ ⸱ ∇ ).

We might therefore ask: if the gradient operator is ∇, under what conditions can we write the divergence, curl, and Laplacian operators as  ∇ ⸱ ,  ∇ × ,  and ∇²? More precisely, under what constraints on a coordinate system can we define ∇ as the gradient operator in that coordinate system and, with ∇ thus defined, write the divergence, curl, and Laplacian operators in that coordinate system as  ∇ ⸱ ,  ∇ × ,  and ∇²? To answer this, we shall begin by expressing the four operators in a generalized coordinate system.

Curvilinear coordinates

In 3D space, let the position vector r be a function of the coordinates ui, where i ∊ {1, 2, 3}, and let ∂i be an abbreviation for ∂/∂ui. Let us define

hi  =  ∂i r ,

so that

dr  =  h1 du1 + h2 du2 + h3 du3 .

The vectors hi are not assumed to be mutually orthogonal, and are not assumed to be uniform in magnitude or direction. We shall only assume that they are right-handed, so that the scalar triple product h1 ⸱ h2 × h3 , abbreviated [h1h2h3], is positive.

[To be continued.]

Long live del

Down with del-dot and del-cross

Identities made simple

Conclusion

Additional information

Acknowledgments

Competing interests

None.

Ethics statement

This article does not concern research on human or animal subjects.

TO DO:

  • Abstract
  • Keywords
  • Figure(s) & caption(s)
  • Etc.!

Notes

  1. The relegation of determinants was anticipated by C.G. Broyden (1975). But Broyden's approach is less radical: he does not deal with abstract vector spaces or abstract linear transformations, and his eventual definition of the determinant, unlike Axler's, is traditional—not a product of the preceding narrative.
  2. To show that a quantity defined in Cartesian coordinates is a vector, we show that its coordinate representation is contravariant, i.e. that it changes so as to compensate for the change in the coordinate system. Thus Feynman (1963, vol. 1, §11-5), having defined velocity from displacement in Cartesian coordinates, shows that velocity is a vector by showing that its coordinate representation contra-rotates (like that of displacement) if the coordinate system rotates.
    To show that an operator defined in Cartesian coordinates yields a vector (if it gives three components) or a scalar (if it gives a number), we show that its coordinate representation is covariant, i.e. that the representation of the operator in the coordinate system, with the operand(s) and the result in that system, retains its form as the system changes. Thus Feynman (1963, vol. 1, §11-7), having defined the magnitude and dot-product operators in Cartesian coordinates, shows that they are scalar operators by showing that their representations in rotated coordinates are the same (except for names of coordinates and components) as in the original coordinates. And Chen-To Tai (1995, pp. 40–42), having determined the form of the gradient operator in a general curvilinear orthogonal coordinate system, shows that it is a vector operator by showing that it has the same form in any other curvilinear orthogonal coordinate system.
  3. Though, as already noted, I detour into coordinates to show that the coordinate-free definition is unambiguous.
  4. Even if we claim that "particles" of matter are wave functions and therefore continuous, this still implies that matter is lumpy in a manner not normally contemplated by continuum mechanics.
  5. An alternative to the c-subscript notation is the Feynman subscript notation, in which the subscript is attached to the ∇ operator and indicates which factor is allowed to vary, so that (7.26) would be written  ∇(A ⸱ B) = ∇B(A ⸱ B) + ∇A(A ⸱ B).
  6. Here I use the broad triangle symbol (△) rather than the narrower Greek Delta (Δ); the latter would more likely be misinterpreted as "change in ψ ".
  7. Though the above eq. (5g) is also occasionally called by that name.
  8. For better or worse, the gradient of a vector field is not considered in this article, because it is not a "vector" as understood in this article.
  9. Or nabla, because it allegedly looks like the ancient stringed instrument that the Greeks called by that name.

References

  1. Axler, 1995, §9.
  2. The latter passage, as it appears in the 5th edition (p. 397), is cited by Tai (1994, p. 6).
  3. Quoted by Tai (1994), in alphabetical order within each category. For Kovach he could have added p. 308.  Potter he misnames as Porter.
  4. Quoted by Tai (1994, p. 23).
  5. Wilson, 1907, p. 150, quoted by Tai (1994, p. 10).
  6. Borisenko & Tarapov, 1968, p. 169.
  7. The following derivation is a tightened-up version of the one in Borisenko & Tarapov, 1968, p. 180, quoted in Tai, 1995, p. 46; the equation numbers are Tai's.
  8. Gibbs, 1881, § 56.
  9. Katz, 1979, pp. 146–9.
  10. Gibbs, 1881, § 54, quoted in Tai, 1995, p. 17.
  11. Cf. Katz, 1979, pp. 149–50.
  12. Cf. Gibbs, 1881, § 71, and Moon & Spencer, 1965, p. 235; quoted by Tai, 1995, pp. 18, 43.
  13. Moon & Spencer, 1965, p. 236.
  14. Gibbs, 1881, § 54.
  15. Wilson, 1907, pp. 150, 152.

Bibliography

  • S.J. Axler, 1995, "Down with Determinants!"  American Mathematical Monthly, vol. 102, no. 2 (Feb. 1995), pp. 139–54; jstor.org/stable/2975348.  (Author's preprint, with different pagination: researchgate.net/publication/265273063_Down_with_Determinants.)
  • S.J. Axler, 2023, Linear Algebra Done Right, 4th Ed., Springer; linear.axler.net (open access).
  • A.I. Borisenko and I.E. Tarapov (tr. & ed. R.A. Silverman), 1968, Vector and Tensor Analysis with Applications, Prentice-Hall; reprinted New York: Dover, 1979, archive.org/details/vectortensoranal0000bori.
  • C.G. Broyden, 1975, Basic Matrices, London: Macmillan.
  • R.P. Feynman, R.B. Leighton, & M. Sands, 1963 etc., The Feynman Lectures on Physics, California Institute of Technology; feynmanlectures.caltech.edu.
  • J.W. Gibbs, 1881–84, "Elements of Vector Analysis", privately printed New Haven: Tuttle, Morehouse & Taylor, 1881 (§§ 1–101), 1884 (§§ 102–189, etc.), archive.org/details/elementsvectora00gibb; published in The Scientific Papers of J. Willard Gibbs (ed. H.A. Bumstead & R.G. Van Name), New York: Longmans, Green, & Co., 1906, vol. 2, archive.org/details/scientificpapers02gibbuoft, pp. 17–90.
  • V.J. Katz, 1979, "The history of Stokes' theorem", Mathematics Magazine, vol. 52, no. 3 (May 1979), pp. 146–56; jstor.org/stable/2690275.
  • E. Kreyszig, 1962 etc., Advanced Engineering Mathematics, New York: Wiley;  5th Ed., 1983;  6th Ed., 1988;  9th Ed., 2006;  10th Ed., 2011.
  • P.H. Moon and D.E. Spencer, 1965, Vectors, Princeton, NJ: Van Nostrand.
  • W.K.H. Panofsky and M. Phillips, 1962, Classical Electricity and Magnetism, 2nd Ed., Addison-Wesley; reprinted Mineola, NY: Dover, 2005.
  • C.-T. Tai, 1994, "A survey of the improper use of ∇ in vector analysis" (Technical Report RL 909), Dept. of Electrical Engineering & Computer Science, University of Michigan; hdl.handle.net/2027.42/7869.
  • C.-T. Tai, 1995, "A historical study of vector analysis" (Technical Report RL 915), Dept. of Electrical Engineering & Computer Science, University of Michigan; hdl.handle.net/2027.42/7868.
  • E.B. Wilson, 1907, Vector Analysis: A text-book for the use of students of mathematics and physics ("Founded upon the lectures of J. Willard Gibbs…"), 2nd Ed., New York: Charles Scribner's Sons; archive.org/details/vectoranalysisa01wilsgoog.