WikiJournal Preprints/Cut the coordinates! (or Vector Analysis Done Fast)
This article is an unpublished pre-print not yet undergoing peer review.
Abstract
- The gradient is related to an acceleration through an equation of motion;
- The divergence is related to two time-derivatives of density (the partial derivative and the material derivative) through two forms of an equation of continuity;
- The component of the curl in a general direction is expressed as a divergence (now known to be unambiguous);
- The same is done for the general component of the gradient, yielding not only a second proof of unambiguity of the gradient, but also the relation between the gradient and the directional derivative; this together with the original definition of the Laplacian shows that the Laplacian of a scalar field is the divergence of the gradient and therefore unambiguous. The unambiguity of the Laplacian of a vector field then follows from a component argument (as for the curl) or from a linearity argument.
The derivation of the relation between the gradient and the directional derivative yields a coordinate-free definition of the dot-del operator for a scalar right-hand operand. But, as the directional derivative is also defined for a non-scalar operand, the same relation offers a method of generalizing the dot-del operator, so that the definition of the Laplacian of a general field can be rewritten with that operator. The advection operator—derived without coordinates, for both scalar and vector properties—is likewise rewritten.
Meanwhile comparison between the definitions of the various operators leads to coordinate-free definitions of the del-cross, del-dot, and del-squared operators. These together with the dot-del operator allow the four integral theorems to be condensed into a single generalized volume-integral theorem.
If the volume of integration is reduced to a thin curved slab of uniform thickness, with an edge-face perpendicular to the broad faces, the four integral theorems are reduced to their two-dimensional forms, each of which relates an integral over a surface segment to an integral around its enclosing curve, provided that the original closed-surface integral has no contribution from the broad faces of the slab. This proviso can be satisfied by construction in two of the four cases, yielding two general theorems, one of which is the Kelvin–Stokes theorem. By applying these two theorems to a segment of a closed surface, and expanding the segment to cover the entire surface, it is shown that the gradient is irrotational and the curl is solenoidal.
The next part of the exposition is more conventional, but still coordinate-free. The gradient theorem is derived from the relation between the gradient and the directional derivative. An irrotational field is shown to have a scalar potential. The 1/r scalar field is shown to be the field whose negative gradient is the inverse-square vector field, whose divergence is a delta function, which is therefore also the negative Laplacian of the 1/r scalar field. These results enable the construction of a field with a given divergence or a given Laplacian. The wave equation is derived from small-amplitude sound waves in a non-viscous fluid, and shown to be satisfied by a spherical-wave field with a 1/r amplitude, whose D'Alembertian is a delta function, enabling the construction of a wave function with a given D'Alembertian. But further progress, including the construction of a field with a given curl, seems to require the introduction of coordinates.
With the aid of identities already found, expressions are easily obtained for the gradient, curl, divergence, Laplacian, and advection operators in Cartesian coordinates—with indicial notation and implicit summation, for brevity. While the resulting expressions for the curl and divergence may look unfamiliar, they match the initial definitions given by J. Willard Gibbs. The Cartesian expressions are found convenient for deriving further identities: a comprehensive collection is derived, leading to the construction of a field with a given curl in a star-shaped region and, as a by-product, a demonstration that the curl of the velocity field of a rigid body is twice the angular velocity. The curl-of-the-curl identity leads to a second definition of the Laplacian of a vector, the Helmholtz decomposition, and the prediction of electromagnetic waves.
The time-honored method of deriving vector-analytic identities—treating the divergence and curl as "formal products" with the del operator, varying one field at a time, and adding the results—is found to be less than rigorous, sometimes less than clear, and hard to justify in view of the ease with which the same thing can be done with Cartesian coordinates, indicial notation, and implicit summation.
The introduction of general coordinates proceeds through (non-normalized) natural and dual basis vectors, reciprocity, the Kronecker delta, covariance of the natural basis, contravariance of the dual basis, contravariant and covariant components, local bases, contravariance of coordinates, covariance of derivatives w.r.t. coordinates, the Jacobian, and handedness. Reciprocity leads to the dot-product of two vector fields and, via the permutation symbol, to the cross-products of the basis vectors, the definition of one basis in terms of the other, the cross-product of two vector fields, and reciprocity of the covariant and contravariant Jacobians. Thus the stage is set for expressing operators in general coordinates.
The multivariate chain rule leads to expressions for the directional derivative (in terms of the contravariant basis), hence the gradient (del) and advection operators. The identity for the curl of the product of a scalar and a vector leads to an expression for the curl in terms of covariant components. Expressions for the curl and divergence operators are obtained from the original volume-based definitions, and are found to agree with del-cross and del-dot respectively, with del expressed in the same general coordinates. The volume-based definition of the divergence leads, by a simpler path, to an expression in terms of contravariant components, which in turn yields an expression for the Laplacian.
Affine coordinates are briefly described before proceeding to orthogonal coordinates. In the latter, the Jacobian is simplified and we can choose an orthonormal basis, which is its own reciprocal, so that vectors can be specified in components w.r.t. a single basis. By expressing the old basis vectors and components in terms of the new ones, we can re-express dot-products, cross-products, and differential operators in terms of orthogonal coordinates with an orthonormal basis.
In an appendix, Huygens' principle is mathematized by deriving Green's identities and thence Kirchhoff's integral theorem (without assuming sinusoidal time-dependence), and then interpreting Kirchhoff's integrand as a distribution of secondary sources. That distribution can be described in two ways. The standard description, in terms of monopoles and normal dipoles, is applicable to the general case. A new description, in terms of "generalized spatiotemporal dipoles" (GSTDs), is useful for the case of a single monopole primary source—but, unlike the original spatiotemporal-dipole formulation of D.A.B. Miller (1991), does not require the surface of integration to be a wavefront. The GSTD description clarifies the manner in which the secondary sources suppress backward secondary waves: the directivity of the GSTD sources is such as to suppress specular reflections off the surface of integration.
Introduction
Sheldon Axler, in his essay "Down with determinants!" (1995) and his ensuing book Linear Algebra Done Right (4th Ed., 2023–), does not entirely eliminate determinants, but introduces them as late as possible and then exploits them for what he calls their "main reasonable use in undergraduate mathematics", namely the change-of-variables formula for multiple integrals.[1] Here I treat coordinates in vector analysis somewhat as Axler treats determinants in linear algebra: I introduce coordinates as late as possible, and then exploit them in unconventionally rigorous derivations of vector-analytic identities from (e.g.) vector-algebraic identities. But I contrast with Axler in at least two ways. First, as my subtitle suggests, I have no intention of expanding my paper into a book. Brevity is of the essence. Second, while one may well avoid determinants in numerical linear algebra,[2] one can hardly avoid coordinates in numerical vector analysis! So I cannot extend the coordinate-minimizing path into computation. But I can extend it up to the threshold by expressing the operators of vector analysis in general coordinates and orthogonal coordinates, leaving it for others to specialize the coordinates and compute with them. On the way, I can satisfy readers who need the concepts of vector analysis for theoretical purposes, and who would rather read a paper than a book. Readers who stay to the end of the paper will get a more general treatment of coordinates than is offered by a typical book-length introduction to vector analysis. In the meantime, coordinates won't needlessly get in the way.
The cost of coordinates
Mathematicians define a "vector" as a member of a vector space, which is a set whose members satisfy certain basic rules of algebra (called the vector-space axioms) with respect to another set called a field (e.g., the real numbers), which has its own basic rules of algebra (the field axioms), and whose members are called "scalars". Physicists are more fussy. They typically want a "vector" to be not only a member of a vector space, but also a first-order tensor: a "tensor", meaning that it exists independently of any coordinate system with which it might be specified; and "first-order" (or "first-degree", or "first-rank"), meaning that it is specified by a one-dimensional array of numbers. Similarly, a 2nd-order tensor is specified by a 2-dimensional array (a matrix), and a 3rd-order by a 3-dimensional array, and so on. Hence they want a "scalar", which is specified by a single number (a zero-dimensional array), to be a zero-order tensor. In "vector analysis", we are greatly interested in applications to physical situations, and accordingly take the physicists' view on what constitutes a vector or a scalar.
So, for our purposes, defining a quantity by three components in (say) a Cartesian coordinate system is not enough to make it a vector, and defining a quantity as a real function of a list of coordinates is not enough to make it a scalar, because we still need to show that the quantity has an independent existence. One way to do this is to show that its coordinate representation behaves appropriately when the coordinate system is changed. Independent existence of a quantity means that its coordinate representation changes so as to compensate for the change in the coordinate system.[3] But independent existence of an operator means that its expression in one coordinate system (with the operand[s] and the result in that system) gives the same result as the corresponding expression in another coordinate system.[4]
Here we circumvent these complications by the most obvious route: by initially defining things without coordinates. If, having defined something without coordinates, we then need to represent it with coordinates, we can choose the coordinate system for convenience rather than generality.
The limitations of limits
In the branch of pure mathematics known as analysis, there is a thing called a limit, whereby for every positive ϵ there exists a positive δ such that if some increment is less than δ, some error is less than ϵ. In the branch of applied mathematics known as continuum mechanics, there is a thing called reality, whereby if the increment is less than some positive δ, the assumption of a continuum becomes ridiculous, so that the error cannot be made less than an arbitrary ϵ. Yet vector "analysis" (together with higher-order tensors) is typically studied with the intention of applying it to some form of "continuum" mechanics, such as the modeling of elasticity, plasticity, fluid flow, or (widening the net) electrodynamics of ordinary matter; in short, it is studied with the intention of conveniently forgetting that, on a sufficiently small scale, matter is lumpy.[a] One might therefore submit that to express the principles of vector analysis in the language of limits is to strain at a gnat and swallow a camel. Here I avoid that camel by referring to elements of length or area or volume, each of which is small enough to allow some quantity or quantities to be considered uniform within it, but, for the same reason, large enough to allow such local averaging of the said quantity or quantities as is necessary to tune out the lumpiness.
We shall see bigger camels, where well-known authors define or misdefine a vector operator and then derive identities by treating it like an ordinary vector quantity. These I also avoid.
Prerequisites
I assume that the reader is familiar with the algebra and geometry of vectors in 3D space, including the dot-product, the cross-product, and the scalar triple product, their geometric meanings, their expressions in Cartesian coordinates, and the identity
$$\mathbf a\times(\mathbf b\times\mathbf c) = \mathbf a\cdot\mathbf c\;\mathbf b - \mathbf a\cdot\mathbf b\;\mathbf c\,,$$
which we call the "expansion" of the vector triple product.[5] I further assume that the reader can generalize the concept of a derivative, so as to differentiate a vector with respect to a scalar, e.g.
$$\mathbf v = \frac{d\mathbf r}{dt}\,,$$
or so as to differentiate a function of several independent variables "partially" w.r.t. one of them while the others are held constant, e.g.
$$\frac{\partial}{\partial x}\,f(x, y)\,.$$
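The expansion of the vector triple product is used repeatedly below. A quick numerical sanity check (not a proof) compares both sides for random vectors. The following Python sketch represents vectors as plain tuples. Its helper names (`cross`, `dot`, `scale`, `sub`, `expansion_error`) are ad hoc choices for this illustration only.

```python
import random

def cross(a, b):
    """Cross-product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot-product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def scale(s, a):
    """Scalar multiple s a of the vector a."""
    return tuple(s * x for x in a)

def sub(a, b):
    """Vector difference a - b."""
    return tuple(x - y for x, y in zip(a, b))

def expansion_error(a, b, c):
    """Largest component of a×(b×c) − (a·c b − a·b c); zero up to round-off."""
    lhs = cross(a, cross(b, c))
    rhs = sub(scale(dot(a, c), b), scale(dot(a, b), c))
    return max(abs(x) for x in sub(lhs, rhs))

random.seed(1)
worst = 0.0
for _ in range(1000):
    a, b, c = (tuple(random.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(3))
    worst = max(worst, expansion_error(a, b, c))
print(f"largest discrepancy over 1000 random trials: {worst:.1e}")
```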
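But in view of the above remarks on limits, I also expect the reader to be tolerant of an argument like this: In a short time dt, let the vectors r and p change by dr and dp respectively. Then
$$d(\mathbf r\times\mathbf p) = (\mathbf r + d\mathbf r)\times(\mathbf p + d\mathbf p) - \mathbf r\times\mathbf p = d\mathbf r\times\mathbf p + \mathbf r\times d\mathbf p + d\mathbf r\times d\mathbf p\,;$$
neglecting the second-order term and dividing by dt, we get
$$\frac{d}{dt}(\mathbf r\times\mathbf p) = \frac{d\mathbf r}{dt}\times\mathbf p + \mathbf r\times\frac{d\mathbf p}{dt}\,,$$
where, as always, the orders of the cross-products matter.[b] Differentiation of a dot-product behaves similarly, except that the orders don't matter; and if p = mv, where m is a scalar and v is a vector, then
$$\frac{d\mathbf p}{dt} = \frac{dm}{dt}\,\mathbf v + m\,\frac{d\mathbf v}{dt}\,.$$
Or an argument like this: If f = f(x, y), then
$$\frac{\partial}{\partial y}\frac{\partial f}{\partial x} = \frac{f(x{+}dx,\,y{+}dy) - f(x,\,y{+}dy) - f(x{+}dx,\,y) + f(x,\,y)}{dx\,dy} = \frac{\partial}{\partial x}\frac{\partial f}{\partial y}\,;$$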
that is, we can switch the order of differentiation in a "mixed" partial derivative. If $\partial_x$ is an abbreviation for ∂/∂x, etc., this rule can be written in operational terms as
$$\partial_x\,\partial_y = \partial_y\,\partial_x\,.$$
More generally, if $\partial_i$ is an abbreviation for $\partial/\partial x_i$ where i ∊ {1, 2, …}, the rule becomes
$$\partial_i\,\partial_j = \partial_j\,\partial_i\,.$$
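The finite-difference argument for mixed partial derivatives can be mirrored numerically. In this sketch, the test function f(x, y) = x³y² + sin(xy), the point, and the step are arbitrary choices. Nested central differences taken in the two orders use the same four sample points, so they agree to round-off. Both also approach the mixed derivative worked out by hand.

```python
import math

def f(x, y):
    """An arbitrary smooth test function."""
    return x**3 * y**2 + math.sin(x * y)

def d_x(g, h):
    """Return a function approximating ∂g/∂x by a central difference of step h."""
    return lambda x, y: (g(x + h, y) - g(x - h, y)) / (2 * h)

def d_y(g, h):
    """Return a function approximating ∂g/∂y by a central difference of step h."""
    return lambda x, y: (g(x, y + h) - g(x, y - h)) / (2 * h)

def mixed_exact(x, y):
    """∂²f/∂x∂y worked out by hand for the f above."""
    return 6 * x**2 * y + math.cos(x * y) - x * y * math.sin(x * y)

h = 1e-4
x0, y0 = 0.7, -1.3
xy = d_y(d_x(f, h), h)(x0, y0)  # differentiate w.r.t. x first, then y
yx = d_x(d_y(f, h), h)(x0, y0)  # differentiate w.r.t. y first, then x
print(xy, yx, mixed_exact(x0, y0))
```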
These generalizations of differentiation, however, do not go beyond differentiation w.r.t. real variables, some of which are scalars, and some of which are coordinates. Vector analysis involves quantities that may be loosely described as derivatives w.r.t. a vector—usually the position vector.
Closed-surface integrals per unit volume
The term field, mentioned above in the context of algebraic axioms, has an alternative meaning: if r is the position vector, a scalar field is a scalar-valued function of r, and a vector field is a vector-valued function of r; both may also depend on time. These are the functions of which we want "derivatives" w.r.t. the vector r.
In this section I introduce four such derivatives (the gradient, the curl, the divergence, and the Laplacian) in a way that will seem unremarkable to those readers who aren't already familiar with them, but idiosyncratic to those who are. The gradient is commonly introduced in connection with a curve and its endpoints, the curl in connection with a surface segment and its enclosing curve, the divergence in connection with a volume and its enclosing surface, and the Laplacian as a composite of two of the above, initially applicable only to a scalar field. Here I introduce all four in connection with a volume and its enclosing surface; and I introduce the Laplacian as a concept in its own right, equally applicable to a scalar or vector field, and only later relate it to the others. My initial definitions of the gradient, the curl, and the Laplacian, although not novel, are usually thought to be more advanced than the common ones, in spite of being conceptually simpler, and in spite of being obvious variations on the same theme.
Instant integral theorems (with a caveat)
Let V be a volume (3D region) enclosed by a surface S (a mathematical surface, not generally a physical barrier). Let n̂ be the unit normal vector at a general point on S, pointing out of V. Let n be the distance from S in the direction of n̂ (positive outside V, negative inside), and let $\partial_n$ be an abbreviation for ∂/∂n, where the derivative (commonly called the normal derivative) is tacitly assumed to exist.
In V, and on S, let p be a scalar field (e.g., pressure in a fluid, or temperature), and let q be a vector field (e.g., flow velocity, or heat-flow density), and let ψ be a generic field which may be a scalar or a vector. Let a general element of the surface S have area dS, and let it be small enough to allow n̂, p, q, and $\partial_n\psi$ to be considered uniform over the element. Then, for every element, the following four products are well defined:
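$$\hat{\mathbf n}\,p\,dS\,,\qquad \hat{\mathbf n}\times\mathbf q\,dS\,,\qquad \hat{\mathbf n}\cdot\mathbf q\,dS\,,\qquad \partial_n\psi\,dS\,. \tag{1}$$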
If p is pressure in a non-viscous fluid, the first of these products is the force exerted by the fluid in V through the area dS. The second product does not have such an obvious physical interpretation; but if q is circulating clockwise about an axis directed through V, the cross-product will be exactly tangential to S and will tend to have a component in the direction of that axis. The third product is the flux of q through the surface element; if q is flow velocity, the third product is the volumetric flow rate (volume per unit time) out of V through dS ; or if q is heat-flow density, the third product is the heat transfer rate (energy per unit time) out of V through dS. The fourth product, by analogy with the third, might be called the flux of the normal derivative of ψ through the surface element, but is equally well defined whether ψ is a scalar or a vector—or, for that matter, a matrix, or a tensor of any order, or anything else that we can differentiate w.r.t. n.
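The remark about circulation can be checked numerically. The sketch below uses an arbitrarily chosen angular velocity ω and radius R. It takes the rigid-rotation field q = ω × r, which circulates clockwise when viewed looking along ω. It then sums n̂ × q dS over a sphere centred on the axis, using a midpoint rule in the polar and azimuthal angles; that quadrature rule is my choice, not something prescribed by the text. The sum points along ω, and divided by the enclosed volume it comes out close to 2ω.

```python
import math

def cross(a, b):
    """Cross-product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def skew_integral_sphere(q, R, n_theta=100, n_phi=200):
    """Midpoint-rule estimate of the closed-surface integral of n̂ × q dS
    over a sphere of radius R centred at the origin."""
    dth = math.pi / n_theta
    dph = 2 * math.pi / n_phi
    total = [0.0, 0.0, 0.0]
    for i in range(n_theta):
        th = (i + 0.5) * dth
        dS = R * R * math.sin(th) * dth * dph      # area of this surface element
        for j in range(n_phi):
            ph = (j + 0.5) * dph
            n = (math.sin(th) * math.cos(ph), math.sin(th) * math.sin(ph), math.cos(th))
            r = tuple(R * c for c in n)            # point on the sphere
            v = cross(n, q(r))
            for k in range(3):
                total[k] += v[k] * dS
    return tuple(total)

omega = (0.3, -0.5, 1.2)                   # illustrative angular velocity
q = lambda r: cross(omega, r)              # rigid-rotation field ω × r
R = 0.5
V = 4 / 3 * math.pi * R**3
ratio = [c / V for c in skew_integral_sphere(q, R)]
print(ratio)  # each component close to twice the corresponding component of ω
```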
If we add up each of the four products over all the elements of the surface S, we obtain, respectively, the four surface integrals
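$$\iint_S \hat{\mathbf n}\,p\,dS\,,\qquad \iint_S \hat{\mathbf n}\times\mathbf q\,dS\,,\qquad \iint_S \hat{\mathbf n}\cdot\mathbf q\,dS\,,\qquad \iint_S \partial_n\psi\,dS\,, \tag{2}$$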
in which the double integral sign indicates that the range of integration is two-dimensional. The first surface integral takes a scalar field and yields a vector; the second takes a vector field and yields a vector; the third takes a vector field and yields a scalar; and the fourth takes (e.g.) a scalar field yielding a scalar, or a vector field yielding a vector. If p is pressure in a non-viscous fluid, the first integral is the force exerted by the fluid in V on the fluid outside V. The second integral may be called the skew surface integral of q over S,[6] or, for the reason hinted above, the circulation of q over S. The third integral, commonly called the flux integral (or simply the surface integral) of q over S, is the total flux of q out of V. And the fourth integral is the surface integral of the outward normal derivative of ψ.
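Let the volume V be divided into elements. Let a general volume element have the volume dV and be enclosed by the surface δS (not to be confused with the area dS of a surface element, which may be an element of S or of δS). Then consider what happens if, instead of evaluating each of the above surface integrals over S, we evaluate it over each δS and add up the results for all the volume elements. In the interior of V, each surface element of area dS is on the boundary between two volume elements, for which the unit normals n̂ at dS, and the respective values of $\partial_n\psi$, are equal and opposite. Hence when we add up the integrals over the surfaces δS, the contributions from the elements dS cancel in pairs, except on the original surface S, so that we are left with the original integral over S. So, for the four surface integrals in (2), we have respectively
$$\begin{aligned}
\iint_S \hat{\mathbf n}\,p\,dS &= \sum \iint_{\delta S} \hat{\mathbf n}\,p\,dS\,,\\
\iint_S \hat{\mathbf n}\times\mathbf q\,dS &= \sum \iint_{\delta S} \hat{\mathbf n}\times\mathbf q\,dS\,,\\
\iint_S \hat{\mathbf n}\cdot\mathbf q\,dS &= \sum \iint_{\delta S} \hat{\mathbf n}\cdot\mathbf q\,dS\,,\\
\iint_S \partial_n\psi\,dS &= \sum \iint_{\delta S} \partial_n\psi\,dS\,,
\end{aligned}\tag{3}$$
where the sums are over all the volume elements.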
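Equations (3) rest on the pairwise cancellation of contributions from shared interior faces. The following sketch makes several assumptions: an axis-aligned cube, an arbitrary test field q, and a midpoint rule on each face. Under these assumptions it computes the outward flux (the third integral) over a cube and over its eight half-size sub-cubes. The sub-cubes' outer faces reuse the big cube's sample points, so the two totals agree to round-off: the interior contributions have cancelled.

```python
import math

def q(r):
    """An arbitrary smooth vector field, chosen only for illustration."""
    x, y, z = r
    return (x * y, y * z * z, math.sin(x) + z)

def flux_cube(q, center, h, m):
    """Midpoint-rule estimate of the outward flux of q through the surface of an
    axis-aligned cube (given centre, side h), using m×m samples per face."""
    d = h / m
    total = 0.0
    for axis in range(3):
        u, w = [a for a in range(3) if a != axis]
        for sign in (-1.0, 1.0):                      # the two faces normal to this axis
            for i in range(m):
                for j in range(m):
                    r = list(center)
                    r[axis] += sign * h / 2
                    r[u] += -h / 2 + (i + 0.5) * d
                    r[w] += -h / 2 + (j + 0.5) * d
                    total += sign * q(r)[axis] * d * d   # n̂·q dS with n̂ = ±(unit vector along axis)
    return total

center, h, m = (0.1, 0.2, 0.3), 0.4, 8
whole = flux_cube(q, center, h, m)
parts = sum(
    flux_cube(q, (center[0] + sx * h / 4, center[1] + sy * h / 4, center[2] + sz * h / 4), h / 2, m // 2)
    for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)
)
print(whole, parts)   # equal up to round-off: interior faces cancel in pairs
```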
Now comes a big "if": if we define the gradient of p (pronounced "grad p") inside dV as
$$\nabla p = \frac{1}{dV}\iint_{\delta S}\hat{\mathbf n}\,p\,dS \tag{4g}$$
and the curl of q inside dV as
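$$\operatorname{curl}\mathbf q = \frac{1}{dV}\iint_{\delta S}\hat{\mathbf n}\times\mathbf q\,dS \tag{4c}$$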
and the divergence of q inside dV as
$$\operatorname{div}\mathbf q = \frac{1}{dV}\iint_{\delta S}\hat{\mathbf n}\cdot\mathbf q\,dS \tag{4d}$$
and the Laplacian of ψ inside dV as [c]
$$\triangle\psi = \frac{1}{dV}\iint_{\delta S}\partial_n\psi\,dS \tag{4L}$$
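(where the letters after the equation number stand for gradient, curl, divergence, and Laplacian, respectively), then equations (3) can be rewritten
$$\begin{aligned}
\iint_S \hat{\mathbf n}\,p\,dS &= \sum \nabla p\,dV\,,\\
\iint_S \hat{\mathbf n}\times\mathbf q\,dS &= \sum \operatorname{curl}\mathbf q\,dV\,,\\
\iint_S \hat{\mathbf n}\cdot\mathbf q\,dS &= \sum \operatorname{div}\mathbf q\,dV\,,\\
\iint_S \partial_n\psi\,dS &= \sum \triangle\psi\,dV\,.
\end{aligned}$$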
But because each term in each sum has a factor dV, we call the sum an integral; and because the range of integration is three-dimensional, we use a triple integral sign. Thus we obtain the following four theorems relating integrals over an enclosing surface S to integrals over the enclosed volume V :
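$$\iint_S \hat{\mathbf n}\,p\,dS = \iiint_V \nabla p\,dV \tag{5g}$$
$$\iint_S \hat{\mathbf n}\times\mathbf q\,dS = \iiint_V \operatorname{curl}\mathbf q\,dV \tag{5c}$$
$$\iint_S \hat{\mathbf n}\cdot\mathbf q\,dS = \iiint_V \operatorname{div}\mathbf q\,dV \tag{5d}$$
$$\iint_S \partial_n\psi\,dS = \iiint_V \triangle\psi\,dV \tag{5L}$$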
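Definitions (4g) to (4L) can be tried out numerically. This is a sketch, and the following are arbitrary choices made for it:

- the fields p = x²y + sin z and q = (yz, x², xyz);
- the centre c;
- the cube side h.

Each closed-surface integral is estimated with a midpoint rule on the faces of a small axis-aligned cube. The normal derivative is approximated by a central difference along n̂. The resulting ratios to dV are compared with the gradient, curl, divergence, and Laplacian worked out by hand in Cartesian coordinates, which are used here only for checking. The helpers `cube_integral`, `per_volume`, and `d_n` are hypothetical names for this illustration.

```python
import math

def cross(a, b):
    """Cross-product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot-product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def cube_integral(integrand, center, h, m=4):
    """Midpoint-rule estimate of the closed-surface integral of integrand(n, r) dS
    over an axis-aligned cube of side h; integrand returns a tuple, so scalar
    and vector integrands are handled alike."""
    d = h / m
    total = None
    for axis in range(3):
        u, w = [a for a in range(3) if a != axis]
        for sign in (-1.0, 1.0):
            n = tuple(sign if k == axis else 0.0 for k in range(3))  # outward unit normal
            for i in range(m):
                for j in range(m):
                    r = list(center)
                    r[axis] += sign * h / 2
                    r[u] += -h / 2 + (i + 0.5) * d
                    r[w] += -h / 2 + (j + 0.5) * d
                    val = integrand(n, tuple(r))
                    if total is None:
                        total = [0.0] * len(val)
                    for k, v in enumerate(val):
                        total[k] += v * d * d
    return tuple(total)

def p(r):
    x, y, z = r
    return x * x * y + math.sin(z)

def q(r):
    x, y, z = r
    return (y * z, x * x, x * y * z)

def d_n(f, n, r, eps=1e-5):
    """Central-difference approximation to the derivative of f along n at r."""
    ahead = tuple(a + eps * b for a, b in zip(r, n))
    behind = tuple(a - eps * b for a, b in zip(r, n))
    return (f(ahead) - f(behind)) / (2 * eps)

c, h = (0.4, -0.7, 1.1), 1e-2
dV = h**3

def per_volume(integrand):
    """Surface integral over the small cube divided by its volume."""
    return tuple(s / dV for s in cube_integral(integrand, c, h))

grad_p = per_volume(lambda n, r: tuple(nk * p(r) for nk in n))   # (4g)
curl_q = per_volume(lambda n, r: cross(n, q(r)))                 # (4c)
div_q = per_volume(lambda n, r: (dot(n, q(r)),))[0]              # (4d)
lap_p = per_volume(lambda n, r: (d_n(p, n, r),))[0]              # (4L)

x, y, z = c
print(grad_p, "vs", (2 * x * y, x * x, math.cos(z)))
print(curl_q, "vs", (x * z, y - y * z, 2 * x - z))
print(div_q, "vs", x * y)
print(lap_p, "vs", 2 * y - math.sin(z))
```

A midpoint rule was chosen only for brevity. Any reasonable surface quadrature would do, because the discrepancies here are dominated by the finite size of the cube rather than by the rule.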
Of the above four results, only the third (5d) seems to have a standard name; it is called the divergence theorem (or Gauss's theorem or, more properly, Ostrogradsky's theorem[7]), and is indeed the best known of the four—although the other three, having been derived in parallel with it, may be said to be equally fundamental.
As each of the operators ∇, curl, and div calls for an integration w.r.t. area and then a division by volume, the dimension (or unit of measurement) of the result is the dimension of the operand divided by the dimension of length, as if the operation were some sort of differentiation w.r.t. position. Moreover, in each of equations (5g) to (5d), there is a triple integral on the right but only a double integral on the left, so that each of the operators ∇, curl, and div appears to compensate for a single integration. For these reasons, and for convenience, we shall describe them as differential operators. By comparison, the △ operator in (4L) or (5L) calls for a further differentiation w.r.t. n ; we shall therefore describe △ as a 2nd-order differential operator. (An additional reason for these descriptions will emerge later.) As promised, the four definitions (4g) to (4L) are "obvious variations on the same theme" (although the fourth is somewhat less obvious than the others).
But remember the "if": Theorems (5g) to (5L) depend on definitions (4g) to (4L) and are therefore only as definite as those definitions! Equations (3), without assuming anything about the shapes and relative sizes of the closed surfaces δS (except, tacitly, that n̂ is piecewise well-defined), indicate that the surface integrals are additive with respect to volume. But this additivity, by itself, does not guarantee that the surface integrals are shared among neighboring volume elements in proportion to their volumes, as envisaged by "definitions" (4g) to (4L). Each of these "definitions" is unambiguous if, and only if, the ratio of the surface integral to dV is insensitive to the shape and size of δS for a sufficiently small δS. Notice that the issue here is not whether the ratios specified in equations (4g) to (4L) are true vectors or scalars, independent of the coordinates; all of the operations needed in those equations have coordinate-free definitions. Rather, the issue is whether the resulting ratios are unambiguous notwithstanding the ambiguity of δS, provided only that δS is sufficiently small. That is the advertised "caveat", which must now be addressed.
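The caveat can at least be probed numerically; this gives evidence, not a proof. The sketch below evaluates the ratio in (4d) for one arbitrary smooth field over three differently shaped small closed surfaces, all centred on the same point: a cube, a flat elongated box, and a sphere. Each surface integral uses a midpoint rule. If the divergence is unambiguous, the three ratios should agree closely with one another. They should also agree with the Cartesian divergence, which is used only as a reference. The function names are illustrative.

```python
import math

def q(r):
    """An arbitrary smooth vector field, chosen only for illustration."""
    x, y, z = r
    return (x * y * y, math.sin(y) * z, x * z * z)

def div_q_reference(r):
    """Cartesian divergence of q, used here only as a reference value."""
    x, y, z = r
    return y * y + z * math.cos(y) + 2 * x * z

def box_ratio(center, sides, m=6):
    """(1/dV) × outward flux of q through an axis-aligned box (midpoint rule per face)."""
    total = 0.0
    for axis in range(3):
        u, w = [a for a in range(3) if a != axis]
        du, dw = sides[u] / m, sides[w] / m
        for sign in (-1.0, 1.0):
            for i in range(m):
                for j in range(m):
                    r = list(center)
                    r[axis] += sign * sides[axis] / 2
                    r[u] += -sides[u] / 2 + (i + 0.5) * du
                    r[w] += -sides[w] / 2 + (j + 0.5) * dw
                    total += sign * q(r)[axis] * du * dw
    return total / (sides[0] * sides[1] * sides[2])

def sphere_ratio(center, a, n_theta=100, n_phi=200):
    """(1/dV) × outward flux of q through a sphere of radius a (midpoint rule in θ and φ)."""
    dth, dph = math.pi / n_theta, 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        th = (i + 0.5) * dth
        for j in range(n_phi):
            ph = (j + 0.5) * dph
            n = (math.sin(th) * math.cos(ph), math.sin(th) * math.sin(ph), math.cos(th))
            r = [ck + a * nk for ck, nk in zip(center, n)]
            flux_density = sum(nk * qk for nk, qk in zip(n, q(r)))
            total += flux_density * a * a * math.sin(th) * dth * dph
    return total / (4 / 3 * math.pi * a**3)

c = (0.5, 0.2, 1.0)
estimates = {
    "cube": box_ratio(c, (0.02, 0.02, 0.02)),
    "flat box": box_ratio(c, (0.04, 0.01, 0.005)),
    "sphere": sphere_ratio(c, 0.01),
}
for shape, value in estimates.items():
    print(f"{shape:>8}: {value:.6f}")
print(f"reference: {div_q_reference(c):.6f}")
```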
In accordance with our "applied" mathematical purpose, our proofs of the unambiguity of the differential operators will rest on a few thought experiments, each of which applies an operator to a physical field, say f, and obtains another physical field whose unambiguity is beyond dispute. The conclusion of the thought experiment is then applicable to any operand field whose mathematical properties are consistent with its interpretation as the physical field f ; the loss of generality, if any, is only what is incurred by that interpretation.
Unambiguity of the gradient
Suppose that a fluid with density ρ (a scalar field) flows with velocity v (a vector field) under the influence of the internal pressure p (a scalar field). Then the integral in (4g) is the force exerted by the pressure of the fluid inside δS on the fluid outside, so that minus the integral is the force exerted on the fluid inside δS by the pressure of the fluid outside. Dividing by dV, we find that −∇p, as defined by (4g), is the force per unit volume, due to the pressure outside the volume.[8] If this is the only force per unit volume acting on the volume (e.g., because the fluid is non-viscous and in a weightless environment, and the volume element is not in contact with the container), then it is equal to the acceleration times the mass per unit volume; that is,
$$\rho\,\frac{d\mathbf v}{dt} = -\nabla p\,. \tag{6g}$$
Now provided that the left side of this equation is locally continuous, it can be considered uniform inside the small δS, so that the left side is unambiguous, whence ∇p is also unambiguous. If there are additional forces on the fluid element, e.g. due to gravity and/or viscosity, then −∇p is not the sole contribution to density-times-acceleration, but is still the contribution due to pressure, which is still unambiguous.
By showing the unambiguity of definition (4g), we have confirmed theorem (5g). In the process we have seen that the volume-based definition of the gradient is useful for the modeling of fluids, and intuitive in that it formalizes the common notion that a pressure "gradient" gives rise to a force.
Unambiguity of the divergence
In the aforesaid fluid, in a short time dt, the volume that flows out of the fixed closed surface δS through a fixed surface element of area dS is v dt ⸱ n̂ dS (i.e., the displacement normal to the surface element, times the area). Multiplying this by density and integrating over δS, we find that the mass flowing out of δS in time dt is $\iint_{\delta S}\rho\,\mathbf v\,dt\cdot\hat{\mathbf n}\,dS$. Dividing this by dV, and then by dt, we get the rate of reduction of density inside δS; that is,
$$\frac{1}{dV}\iint_{\delta S}\hat{\mathbf n}\cdot\rho\mathbf v\,dS = -\frac{\partial\rho}{\partial t}\,,$$
where the derivative w.r.t. time is evaluated at a fixed location (because δS is fixed), and is therefore written as a partial derivative (because other variables on which ρ might depend—namely the coordinates—are held constant). Provided that the right-hand side is locally continuous, it can be considered uniform inside δS and is therefore unambiguous, so that the left side is likewise unambiguous. But the left side is simply div ρv as defined by (4d),[d] which is therefore also unambiguous,[9] confirming theorem (5d). In short, the divergence operator is that which maps ρv to the rate of reduction of density at a fixed point:
$$\operatorname{div}\rho\mathbf v = -\frac{\partial\rho}{\partial t}\,. \tag{7d}$$
This result, which expresses conservation of mass, is a form of the so-called equation of continuity.
The partial derivative ∂ρ/∂t in (7d) must be distinguished from the material derivative dρ/dt, which is evaluated at a point that moves with the fluid.[e] [Similarly, dv/dt in (6g) is the material acceleration, because it is the acceleration of the mobile mass, not of a fixed point!] To re-derive the equation of continuity in terms of the material derivative, the volume v dt ⸱ n̂ dS, which flows out through dS in time dt (as above), is integrated over δS to obtain the increase in volume of the mass initially contained in dV. Dividing this by the mass, ρ dV, gives the increase in specific volume (1/ρ) of that mass, and then dividing by dt gives the rate of change of specific volume; that is,
$$\frac{1}{\rho\,dV}\iint_{\delta S}\hat{\mathbf n}\cdot\mathbf v\,dS = \frac{d}{dt}\!\left(\frac{1}{\rho}\right) = -\frac{1}{\rho^2}\,\frac{d\rho}{dt}\,.$$
Multiplying by ρ² and comparing the left side with (4d), we obtain
$$\rho\operatorname{div}\mathbf v = -\frac{d\rho}{dt}\,. \tag{7d'}$$
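Equation (7d') can be illustrated with a linear velocity field v = v₀ + A r. For such a field, a small material parcel stays a parallelepiped whose edge vectors e obey de/dt = A e. In the sketch below, the matrix A and vector v₀ are arbitrary illustrative choices. The sketch obtains div v in two ways:

- from the flux definition (4d) over a small cube;
- as −(1/ρ) dρ/dt, via the fractional rate of growth of the parcel's volume, since the parcel's mass (ρ times volume) is conserved.

Both come out equal to the trace of A. Here the trace is negative, so the fluid is contracting and ρ is increasing.

```python
def cross(a, b):
    """Cross-product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot-product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

# Illustrative linear velocity field v = v0 + A r (an assumption of this sketch).
A = ((0.2, -0.4, 0.1),
     (0.5, 0.3, 0.0),
     (-0.2, 0.6, -0.7))          # trace(A) = -0.2
v0 = (1.0, -2.0, 0.5)

def v(r):
    return tuple(v0[i] + sum(A[i][j] * r[j] for j in range(3)) for i in range(3))

def div_by_flux(center, h, m=2):
    """(1/dV) × outward flux of v through an axis-aligned cube, as in (4d)."""
    d = h / m
    total = 0.0
    for axis in range(3):
        u, w = [a for a in range(3) if a != axis]
        for sign in (-1.0, 1.0):
            for i in range(m):
                for j in range(m):
                    r = list(center)
                    r[axis] += sign * h / 2
                    r[u] += -h / 2 + (i + 0.5) * d
                    r[w] += -h / 2 + (j + 0.5) * d
                    total += sign * v(r)[axis] * d * d
    return total / h**3

def advance(e, dt):
    """Edge vector of the material parcel after a short time dt: e + A e dt."""
    return tuple(e[i] + dt * sum(A[i][j] * e[j] for j in range(3)) for i in range(3))

def parcel_volume(edges):
    """Volume of the parallelepiped spanned by three edge vectors (scalar triple product)."""
    e1, e2, e3 = edges
    return dot(e1, cross(e2, e3))

edges = ((1e-3, 2e-4, 0.0), (0.0, 1e-3, 3e-4), (1e-4, 0.0, 1e-3))
dt = 1e-4
vol = parcel_volume(edges)
vol_plus = parcel_volume([advance(e, dt) for e in edges])
vol_minus = parcel_volume([advance(e, -dt) for e in edges])
# Mass rho*vol is conserved, so -(1/rho) drho/dt equals (1/vol) dvol/dt.
minus_dlnrho_dt = (vol_plus - vol_minus) / (2 * dt * vol)

div_v = div_by_flux((0.3, -0.1, 0.7), 0.1)
print(div_v, minus_dlnrho_dt)   # both close to trace(A) = -0.2, so rho is increasing
```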
Whereas (7d) shows that div ρv is unambiguous, (7d') shows that div v is unambiguous (provided that the right-hand sides are locally continuous). In accordance with the everyday meaning of "divergence", (7d') also shows that div v is positive if the fluid is expanding (ρ decreasing), negative if it is contracting (ρ increasing), and zero if it is incompressible. In the last case, the equation of continuity reduces to
$$\operatorname{div}\mathbf v = 0 \qquad [\text{for an incompressible fluid}]. \tag{7i}$$
For incompressible flow, any tubular surface tangential to the flow velocity, and consequently with no flow in or out of the "tube", has the same volumetric flow rate across all cross-sections of the "tube", as if the surface were the wall of a pipe full of liquid (except that the surface is not necessarily stationary). Accordingly, a vector field with zero divergence is described as solenoidal (from the Greek word for "pipe"). More generally, a solenoidal vector field has the property that for any tubular surface tangential to the field, the flux integrals across any two cross-sections of the "tube" are the same—because otherwise there would be a net flux integral out of the closed surface comprising the two cross-sections and any segment of tube between them, in which case, by the divergence theorem (5d), the divergence would have to be non-zero somewhere inside, contrary to (7i).
Unambiguity of the curl (and gradient)
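The unambiguity of the curl (4c) follows from the unambiguity of the divergence. Taking dot-products of (4c) with an arbitrary constant vector b, and interchanging the dot and the cross in the scalar triple product, we get
$$\operatorname{curl}\mathbf q\cdot\mathbf b = \frac{1}{dV}\iint_{\delta S}\hat{\mathbf n}\times\mathbf q\cdot\mathbf b\,dS = \frac{1}{dV}\iint_{\delta S}\hat{\mathbf n}\cdot\mathbf q\times\mathbf b\,dS\,;$$
that is, by (4d),
$$\operatorname{curl}\mathbf q\cdot\mathbf b = \operatorname{div}(\mathbf q\times\mathbf b)\qquad[\text{for uniform }\mathbf b]. \tag{8c}$$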
(The parentheses around q × b on the right, although helpful because of the spacing, are not strictly necessary, because the alternative binding would be (div q), which is a scalar, whose cross-product with the vector b is not defined. And the left-hand expression does not need parentheses, because it can only mean the dot-product of a curl with the vector b; it cannot mean the curl of a dot-product, because the curl of a scalar field is not defined.) This result (8c) is an identity if the vector b is independent of location, so that it can be taken inside or outside the surface integral; thus b may be a uniform vector field, and may be time-dependent. If we make b a unit vector, the left side of the identity is the (scalar) component of curl q in the direction of b, and the right side is unambiguous. Thus the curl is unambiguous because its component in any direction is unambiguous. This confirms theorem (5c).
Similarly, the unambiguity of the divergence implies the unambiguity of the gradient. Starting with (4g), taking dot-products with an arbitrary uniform vector b, and proceeding as above, we obtain
$$\nabla p\cdot\mathbf b = \operatorname{div} p\,\mathbf b\qquad[\text{for uniform }\mathbf b]. \tag{8g}$$
(The left-hand side does not need parentheses, because it can only mean the dot-product of a gradient with the vector b; it cannot mean the gradient of the dot-product of a scalar field with a vector field, because that dot-product would not be defined.) If we make b a unit vector, this result (8g) says that the (scalar) component of ∇p in the direction of b is given by the right-hand side, which again is unambiguous. So here we have a second explanation of the unambiguity of the gradient: like the curl, it is unambiguous because its component in any direction is unambiguous.
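Identities (8c) and (8g) hold surface element by surface element, because at every point of δS we have n̂ × q ⸱ b = n̂ ⸱ q × b and (n̂ p) ⸱ b = n̂ ⸱ (p b). The sketch below makes this concrete. The fields, the centre, the cube size, and the uniform unit vector b are arbitrary choices, and the helper names are hypothetical. Discretized versions of the two sides of each identity agree to round-off, and both approach the Cartesian reference values.

```python
import math

def cross(a, b):
    """Cross-product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot-product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def cube_integral(integrand, center, h, m=4):
    """Midpoint-rule closed-surface integral of integrand(n, r) dS over an axis-aligned cube."""
    d = h / m
    total = None
    for axis in range(3):
        u, w = [a for a in range(3) if a != axis]
        for sign in (-1.0, 1.0):
            n = tuple(sign if k == axis else 0.0 for k in range(3))  # outward unit normal
            for i in range(m):
                for j in range(m):
                    r = list(center)
                    r[axis] += sign * h / 2
                    r[u] += -h / 2 + (i + 0.5) * d
                    r[w] += -h / 2 + (j + 0.5) * d
                    val = integrand(n, tuple(r))
                    if total is None:
                        total = [0.0] * len(val)
                    for k, v in enumerate(val):
                        total[k] += v * d * d
    return tuple(total)

def p(r):
    x, y, z = r
    return x * x * y + math.sin(z)

def q(r):
    x, y, z = r
    return (y * z, x * x, x * y * z)

c, h = (0.4, -0.7, 1.1), 1e-2
b = (0.6, 0.0, 0.8)                  # arbitrary uniform unit vector

def per_volume(integrand):
    """Surface integral over the small cube divided by its volume."""
    return tuple(s / h**3 for s in cube_integral(integrand, c, h))

# (8c): curl q · b versus div(q × b)
lhs_c = dot(per_volume(lambda n, r: cross(n, q(r))), b)
rhs_c = per_volume(lambda n, r: (dot(n, cross(q(r), b)),))[0]

# (8g): grad p · b versus div(p b)
lhs_g = dot(per_volume(lambda n, r: tuple(nk * p(r) for nk in n)), b)
rhs_g = per_volume(lambda n, r: (dot(n, tuple(p(r) * bk for bk in b)),))[0]

x, y, z = c
print(lhs_c, rhs_c, dot((x * z, y - y * z, 2 * x - z), b))
print(lhs_g, rhs_g, dot((2 * x * y, x * x, math.cos(z)), b))
```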
We might well ask what happens if we take cross-products with b on the left, instead of dot-products. If we start with (4g), the process is straightforward: in the end we can switch the order of the cross-product on the left, and change the sign on the right, obtaining
$$\nabla p\times\mathbf b = \operatorname{curl} p\,\mathbf b\qquad[\text{for uniform }\mathbf b]. \tag{8p}$$
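(Again no parentheses are needed.) If we start with (4c) instead, and take b inside the integral, we get a vector triple product to expand, which leads to
$$\mathbf b\times\operatorname{curl}\mathbf q = \frac{1}{dV}\iint_{\delta S}\mathbf b\times(\hat{\mathbf n}\times\mathbf q)\,dS = \frac{1}{dV}\iint_{\delta S}\hat{\mathbf n}\;\mathbf b\cdot\mathbf q\,dS - \frac{1}{dV}\iint_{\delta S}\mathbf b\cdot\hat{\mathbf n}\;\mathbf q\,dS\,,$$
in which the first term on the right is simply ∇ b⸱q (the gradient of the dot-product). The second term is more problematic. If we had a scalar p instead of the vector q, we could take b outside the second integral, so that the second term would be (minus) b⸱∇p. This suggests that the actual second term should be (minus) b⸱∇q. Shall we therefore adopt the second term (without the sign) as the definition of b⸱∇ q for a vector q (treating b⸱∇ as an operator), and write
$$\mathbf b\times\operatorname{curl}\mathbf q = \nabla\,\mathbf b\cdot\mathbf q - \mathbf b\cdot\nabla\,\mathbf q\qquad[\text{for uniform }\mathbf b]\ ? \tag{8q}$$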
The proposal would be open to the objection that b⸱∇ q had been defined only for uniform b, whereas b⸱∇p (for scalar p) is defined whether b is uniform or not. So, for the moment, let us put (8q) aside and run with (8c), (8g), and (8p).
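The same element-by-element reasoning covers (8p) and the tentative (8q). The sketch below uses the same kind of arbitrary fields and small cube as before, with a uniform vector b that need not be a unit vector. The left and right sides of each identity are discretized with the same surface samples, and they agree to round-off. The last comparison checks the proposed definition of b⸱∇ q, namely (1/dV) times the surface integral of b⸱n̂ q dS. It is compared with a central-difference derivative of q along b, which is what the notation suggests the proposal should equal. That agreement is only to discretization error. All helper names are hypothetical.

```python
import math

def cross(a, b):
    """Cross-product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot-product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def cube_integral(integrand, center, h, m=4):
    """Midpoint-rule closed-surface integral of integrand(n, r) dS over an axis-aligned cube."""
    d = h / m
    total = None
    for axis in range(3):
        u, w = [a for a in range(3) if a != axis]
        for sign in (-1.0, 1.0):
            n = tuple(sign if k == axis else 0.0 for k in range(3))  # outward unit normal
            for i in range(m):
                for j in range(m):
                    r = list(center)
                    r[axis] += sign * h / 2
                    r[u] += -h / 2 + (i + 0.5) * d
                    r[w] += -h / 2 + (j + 0.5) * d
                    val = integrand(n, tuple(r))
                    if total is None:
                        total = [0.0] * len(val)
                    for k, v in enumerate(val):
                        total[k] += v * d * d
    return tuple(total)

def p(r):
    x, y, z = r
    return x * x * y + math.sin(z)

def q(r):
    x, y, z = r
    return (y * z, x * x, x * y * z)

c, h = (0.4, -0.7, 1.1), 1e-2
b = (0.5, -1.0, 2.0)                 # arbitrary uniform vector, not necessarily unit

def per_volume(integrand):
    """Surface integral over the small cube divided by its volume."""
    return tuple(s / h**3 for s in cube_integral(integrand, c, h))

# (8p): grad p × b versus curl (p b)
lhs_p = cross(per_volume(lambda n, r: tuple(nk * p(r) for nk in n)), b)
rhs_p = per_volume(lambda n, r: cross(n, tuple(p(r) * bk for bk in b)))

# (8q): b × curl q versus grad(b·q) minus the proposed b·∇ q
lhs_q = cross(b, per_volume(lambda n, r: cross(n, q(r))))
grad_bq = per_volume(lambda n, r: tuple(nk * dot(b, q(r)) for nk in n))
b_del_q = per_volume(lambda n, r: tuple(dot(b, n) * qk for qk in q(r)))
rhs_q = tuple(g - t for g, t in zip(grad_bq, b_del_q))

# Proposed b·∇ q versus a central difference of q along b.
eps = 1e-5
q_ahead = q(tuple(ck + eps * bk for ck, bk in zip(c, b)))
q_behind = q(tuple(ck - eps * bk for ck, bk in zip(c, b)))
along_b = tuple((fa - fb) / (2 * eps) for fa, fb in zip(q_ahead, q_behind))

print(lhs_p, rhs_p)
print(lhs_q, rhs_q)
print(b_del_q, along_b)
```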
Another meaning of the gradient
[edit | edit source]Let ŝ be a unit vector in a given direction, and let s be a parameter measuring distance (arc length) along a path in that direction. By equation (8g) and definition (4d), we have
where, by the unambiguity of the divergence, the shape of the closed surface δS enclosing dV can be chosen for convenience. So let δS be a right cylinder with cross-sectional area α and perpendicular height ds, with the path passing perpendicularly through the end-faces at parameter-values s and s+ds, where the outward unit normal n̂ consequently takes the values −ŝ and ŝ, respectively. And let the cross-sectional dimensions be small compared with ds so that the values of p at the end-faces, say p and p+dp, can be taken to be the same as where the end-faces cut the path. Then dV = α ds, and the surface integral over δS includes only the contributions from the end-faces (because n̂ is perpendicular to ŝ elsewhere); those contributions are respectively −ŝ⸱ŝ p α and ŝ⸱ŝ (p+dp) α, i.e. −p α and (p+dp) α. With these substitutions the above equation becomes
$$\nabla p\cdot\hat{\mathbf{s}} = \frac{(p+dp)\,\alpha - p\,\alpha}{\alpha\,ds}\,;$$
that is,
$$\nabla p\cdot\hat{\mathbf{s}} = \partial_s p\,, \tag{9g}$$
where the right-hand side, commonly called the directional derivative of p in the ŝ direction,[10] is the derivative of p w.r.t. distance in that direction. Although (9g) has been obtained by taking that direction as fixed, the equality is evidently maintained if s measures arc length along any path tangential to ŝ at the point of interest.
Equation (9g) is an alternative definition of the gradient: it says that the gradient of p is the vector whose scalar component in any direction is the directional derivative of p in that direction. For real p, this component (which is |∇p| cos θ, where θ is the angle between ŝ and ∇p) has its maximum, namely |∇p|, in the direction of ∇p. Thus the gradient of p is the vector whose direction is that in which the derivative of p w.r.t. distance is a maximum, and whose magnitude is that maximum. This is the usual conceptual definition of the gradient.[11] Sometimes it is convenient to work directly from this definition. For example, in Cartesian coordinates (x, y, z), if a scalar field is given by x, its gradient is obviously the unit vector in the direction of the x axis, usually called i; that is, ∇x = i. Similarly, if r = r r̂ is the position vector, then ∇r = r̂.
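Equation (9g) and the maximum property can be checked numerically. The following sketch assumes a hypothetical test field p = x²y + sin z, whose gradient 2xy i + x² j + cos z k is entered by hand. It estimates directional derivatives by central differences and searches a near-uniform set of directions for the largest one. The set used here is a Fibonacci lattice on the unit sphere, chosen only for convenience.

```python
import math

def p(r):
    # hypothetical scalar field used only for illustration
    return r[0] ** 2 * r[1] + math.sin(r[2])

def directional_derivative(f, r, s_hat, eps=1e-6):
    """Derivative of f w.r.t. distance along the unit vector s_hat (central difference)."""
    fp = f([x + eps * s for x, s in zip(r, s_hat)])
    fm = f([x - eps * s for x, s in zip(r, s_hat)])
    return (fp - fm) / (2 * eps)

def fibonacci_directions(n):
    """Roughly uniform unit vectors on the sphere (Fibonacci lattice)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    dirs = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        rho = math.sqrt(1.0 - z * z)
        dirs.append([rho * math.cos(golden * k), rho * math.sin(golden * k), z])
    return dirs

r0 = [0.4, -0.2, 0.9]
grad_p = [2 * r0[0] * r0[1], r0[0] ** 2, math.cos(r0[2])]   # analytic gradient of p
grad_norm = math.sqrt(sum(g * g for g in grad_p))

# (9g): the component of grad p along a unit vector is the directional derivative
s_hat = [2 / 3, -1 / 3, 2 / 3]
component = sum(g * s for g, s in zip(grad_p, s_hat))
dd = directional_derivative(p, r0, s_hat)

# The largest directional derivative should be |grad p|, attained along grad p
best_dir = max(fibonacci_directions(4000), key=lambda d: directional_derivative(p, r0, d))
best_val = directional_derivative(p, r0, best_dir)
alignment = sum(g * d for g, d in zip(grad_p, best_dir)) / grad_norm   # cosine of angle
print(component, dd)
print(best_val, grad_norm, alignment)
```

The search only samples finitely many directions, so the best value approaches |∇p| from below, and its direction approaches that of ∇p.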
If ŝ is tangential to a level surface of p (a surface of constant p), then ∂ₛp in that direction is zero, in which case (9g) says that ∇p (if not zero) is orthogonal to ŝ. So ∇p is orthogonal to the surfaces of constant p (as we would expect, having just shown that the direction of ∇p is that in which p varies most steeply). This result leads to a method of finding a vector normal to a curved surface at a given point: if the equation of the surface is f(r) = C, where r is the position vector and C is a constant (possibly zero), a suitable vector is ∇f evaluated at the given point.
If p is uniform —that is, if it has no spatial variation—then its derivative w.r.t. distance in every direction is zero; that is, the component of ∇p in every direction is zero, so that ∇p must be the zero vector. In short, the gradient of a uniform scalar field is zero. Conversely, if p is not uniform, there must be some location and some direction in which its derivative w.r.t. distance, if defined at all, is non-zero, so that its gradient, if defined at all, is also non-zero. Thus a scalar field with zero gradient in some region is uniform in that region.
Unambiguity of the Laplacian
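Armed with our new definition of the gradient (9g), we can revisit our definition of the Laplacian (4L). If ψ is a scalar field, then, by (9g), the outward normal derivative ∂ₙψ can be replaced by n̂⸱∇ψ in (4L), which then becomes
$$\triangle\psi = \frac{1}{dV}\oint_{\delta S}\hat{\mathbf{n}}\cdot\nabla\psi\,dS\,; \tag{9L}$$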
that is, by definition (4d),
$$\triangle\psi = \operatorname{div}\nabla\psi \qquad [\text{for scalar } \psi].$$
So the Laplacian of a scalar field is the divergence of the gradient. This is the usual introductory definition of the Laplacian—and on its face is applicable only in the case of a scalar field. The unambiguity of the Laplacian, in this case, follows from the unambiguity of the divergence and the gradient.
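The agreement between definition (4L) and "div grad" can be illustrated numerically. In the sketch below, the closed surface is again a small cube, and the outward normal derivative on each face is estimated by a central difference. The test field ψ = eˣ sin y + z³ is a hypothetical choice whose Laplacian is 6z, since the first term is harmonic. The function names and step sizes are also assumptions.

```python
import math

def psi(r):
    # illustrative scalar field; its Laplacian is 6 z (the first term is harmonic)
    return math.exp(r[0]) * math.sin(r[1]) + r[2] ** 3

def laplacian_4L(f, r0, h=1e-2, n=4, eps=1e-5):
    """(1/dV) * closed-surface integral of the outward normal derivative of f,
    over a cube of half-width h centred at r0 (midpoint rule, n-by-n per face)."""
    total = 0.0
    for axis in range(3):
        u, w = [a for a in range(3) if a != axis]
        for sign in (-1.0, 1.0):
            for i in range(n):
                for j in range(n):
                    r = list(r0)
                    r[axis] += sign * h
                    r[u] += -h + (i + 0.5) * 2 * h / n
                    r[w] += -h + (j + 0.5) * 2 * h / n
                    out = list(r); out[axis] += sign * eps    # step outward
                    inn = list(r); inn[axis] -= sign * eps    # step inward
                    total += (f(out) - f(inn)) / (2 * eps)    # outward normal derivative
    dA, dV = (2 * h / n) ** 2, (2 * h) ** 3
    return total * dA / dV

def div_grad(f, r0, eps=1e-4):
    """Divergence of the gradient, with both operations done by central differences."""
    total = 0.0
    for axis in range(3):
        rp = list(r0); rp[axis] += eps
        rm = list(r0); rm[axis] -= eps
        total += (f(rp) - 2 * f(r0) + f(rm)) / eps ** 2
    return total

r0 = [0.3, 1.1, -0.5]
print(laplacian_4L(psi, r0), div_grad(psi, r0), 6 * r0[2])
```

Both estimates should be close to 6z = −3 at the chosen point. The cube-based one is accurate only to order h², so it is the looser of the two.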
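If, on the contrary, ψ in definition (4L) is a vector field, then we can again take dot-products with a uniform vector b, obtaining
$$\mathbf{b}\cdot\triangle\boldsymbol{\psi} = \frac{1}{dV}\oint_{\delta S}\partial_n(\mathbf{b}\cdot\boldsymbol{\psi})\,dS = \triangle(\mathbf{b}\cdot\boldsymbol{\psi}) \qquad [\text{for uniform } \mathbf{b}].$$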
If we make b a unit vector, this says that the scalar component of the Laplacian of a vector field, in any direction, is the Laplacian of the scalar component of that vector field in that direction. As we have just established that the latter is unambiguous, so is the former.
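But the unambiguity of the Laplacian can be generalized further. If
$$\psi = \sum_{i=1}^{k}\alpha_i\,\psi_i\,,$$
where each ψᵢ is a scalar field, each αᵢ is a constant, and the counter i ranges from (say) 1 to k, then it is clear from (4L) that
$$\triangle\psi = \sum_{i=1}^{k}\alpha_i\,\triangle\psi_i\,. \tag{10}$$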
In words, this says that the Laplacian of a linear combination of fields is the same linear combination of the Laplacians of the same fields; or, more concisely, that the Laplacian is linear. I say "it is clear" because the Laplacian as defined by (4L) is itself a linear combination, so that (10) merely asserts that we can regroup the terms of a nested linear combination; the gradient, curl, and divergence as defined by (4g) to (4d) are likewise linear. It follows from (10) that the Laplacian of a linear combination of fields is unambiguous if the Laplacians of the separate fields are unambiguous. Now we have supposed that the fields are scalar and that the coefficients αᵢ are constants. But the same logic applies if the "constants" are uniform basis vectors (e.g., i, j, k), so that the "linear combination" can represent any vector field, whence the Laplacian of any vector field is unambiguous. And the same logic applies if the "constants" are chosen as a "basis" for a space of tensors of any order, so that the Laplacian of any tensor field of that order is unambiguous, and so on. In short, the Laplacian of any field that we can express with a uniform basis is unambiguous.
The dot-del, del-cross, and del-dot operators
The gradient operator ∇ is also called del.[f] If it simply denotes the gradient, we tend to pronounce it "grad" in order to emphasize the result. But it can also appear in combination with other operators to give other results, and in those contexts we tend to pronounce it "del".
One such combination is "dot del", as in "b⸱∇", which we proposed for (8q) but did not quite manage to define satisfactorily for a vector operand. With our new definition of the gradient (9g), we can now make a second attempt. A general vector field q can be written |q| q̂, so that
$$\mathbf{q}\cdot\nabla\psi = |\mathbf{q}|\;\hat{\mathbf{q}}\cdot\nabla\psi\,.$$
If ψ is a scalar field, we can apply (9g) to the right-hand side, obtaining
$$\mathbf{q}\cdot\nabla\psi = |\mathbf{q}|\,\partial_{s_q}\psi\,,$$
where $s_q$ is distance in the direction of q. For scalar ψ, this result is an identity between previously defined quantities. For non-scalar ψ, we have not yet defined the left-hand side, but the right-hand side is still well-defined and self-explanatory (provided that we can differentiate ψ w.r.t. $s_q$). So we are free to adopt
$$\mathbf{q}\cdot\nabla\psi = |\mathbf{q}|\,\partial_{s_q}\psi\,, \tag{11}$$
where $s_q$ is distance in the direction of q, as the general definition of the operator q⸱∇. We may interpret it as defining both a unary operator q⸱∇, which operates on a generic field, and a binary operator ⸱∇, which takes a (possibly uniform) vector field on the left and a generic field on the right.
For any vector field q, it follows from (11) that if ψ is a uniform field, then q⸱∇ψ = 0.
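Definition (11) is directly computable even when ψ is vector-valued: take the derivative of ψ w.r.t. distance along q̂ and scale it by |q|. The sketch below uses a hypothetical field, hypothetical function names, and central differences. It compares the result with the familiar Cartesian expression Σᵢ qᵢ ∂ψ/∂xᵢ, which it should match, and confirms that a uniform field gives zero.

```python
import math

def dot_del(q, psi, r, eps=1e-6):
    """q . del psi per definition (11): |q| times the derivative of psi with respect
    to distance s_q in the direction of q (central difference). q must be non-zero;
    psi may return a float (scalar field) or a list (vector field)."""
    m = math.sqrt(sum(c * c for c in q))
    q_hat = [c / m for c in q]
    fp = psi([x + eps * c for x, c in zip(r, q_hat)])
    fm = psi([x - eps * c for x, c in zip(r, q_hat)])
    if isinstance(fp, list):
        return [m * (a - b) / (2 * eps) for a, b in zip(fp, fm)]
    return m * (fp - fm) / (2 * eps)

def cartesian_dot_del(q, psi, r, eps=1e-6):
    """The same quantity in Cartesian form, sum_i q_i d(psi)/dx_i (vector psi only)."""
    out = [0.0, 0.0, 0.0]
    for i in range(3):
        rp = list(r); rp[i] += eps
        rm = list(r); rm[i] -= eps
        d = [(a - b) / (2 * eps) for a, b in zip(psi(rp), psi(rm))]
        out = [o + q[i] * x for o, x in zip(out, d)]
    return out

v = lambda r: [r[1] * r[2], math.exp(r[0]), r[0] - r[1] ** 2]   # a vector field
q = [1.5, -0.5, 2.0]
r0 = [0.2, 0.7, -1.0]
uniform = lambda r: [3.0, -1.0, 2.0]                             # a uniform field

print(dot_del(q, v, r0))
print(cartesian_dot_del(q, v, r0))
print(dot_del(q, uniform, r0))   # [0.0, 0.0, 0.0]
```

The directional form needs only one difference along q̂, whereas the Cartesian form needs one per coordinate. Both should give approximately (1.9, 1.5e^0.2, 2.2) here.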
For the special case in which q is a unit vector ŝ , with s measuring distance in the direction of ŝ , definition (11) reduces to
$$\hat{\mathbf{s}}\cdot\nabla\psi = \partial_s\psi\,, \tag{12}$$
which agrees with (9g) but now holds for a generic field ψ [whereas (9g) was for a scalar field, and was derived as a theorem based on earlier definitions]. So ŝ⸱∇, with a unit vector ŝ, is the directional-derivative operator on a generic field; and by (11), q⸱∇ is a scaled directional-derivative operator on a generic field.
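In particular, if ŝ = n̂ we have
$$\hat{\mathbf{n}}\cdot\nabla\psi = \partial_n\psi\,,$$
which we may substitute into the original definition of the Laplacian (4L) to obtain
$$\triangle\psi = \frac{1}{dV}\oint_{\delta S}\hat{\mathbf{n}}\cdot\nabla\psi\,dS\,, \tag{13L}$$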
which is just (9L) again, except that it now holds for a generic field.
If our general definition of the gradient (4g) is also taken as the general definition of the ∇ operator,[12] then, comparing (4g) with (4c), (4d), and (13L), we see that
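where the parentheses may seem to be required on account of the closing dS in (4g).[13] But if we write the factor dS before the integrand, the del operator in (4g) becomes
$$\nabla = \frac{1}{dV}\oint_{\delta S} dS\;\hat{\mathbf{n}}$$
provided that we insist that it is to be read as an operator looking for an operand, and not as a self-contained expression. Then, if we similarly bring forward the dS in (4c), (4d), and (13L), the respective operators become[14]
$$\nabla\times\; = \frac{1}{dV}\oint_{\delta S} dS\;\hat{\mathbf{n}}\,\times\,, \qquad \nabla\cdot\; = \frac{1}{dV}\oint_{\delta S} dS\;\hat{\mathbf{n}}\,\cdot\,, \qquad \nabla\cdot\nabla = \frac{1}{dV}\oint_{\delta S} dS\;\hat{\mathbf{n}}\cdot\nabla \tag{14}$$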
(pronounced "del cross", "del dot", and "del dot del"), of which the last is usually abbreviated as ∇² ("del squared").[15] These notations are ubiquitous.
Another way to obtain the ∇ × and ∇⸱ operators (but not ∇²), again inspired by (4g), is to define
$$T(\nabla) = \frac{1}{dV}\oint_{\delta S} T(\hat{\mathbf{n}})\,dS\,, \tag{14s}$$
where T is any well-defined function that takes a vector argument. Setting T(∇) to ∇p, ∇ × q, and ∇⸱q in (14s), we obtain respectively ∇p, curl q, and div q as given by (4g) to (4d). But this approach has undesirable side-effects, for example, that ∇p becomes synonymous with p∇. Accordingly, Chen-To Tai,[16] on the left of (14s), replaces ∇ with a symbol of his own, which he calls the "symbolic operator" or the "S-operator" or, later, the "symbolic vector" or the "dummy vector". Tai in his later works (e.g., 1994, 1995) does not tolerate cross- or dot-products involving the del operator, but does tolerate such products involving his symbolic vector (1995, pp. 50–52).
There is a misconception that the operational equivalences in (14) apply only in Cartesian coordinates.[17] Tai does not accept them even in that case. But, because these equivalences have been derived from coordinate-free definitions of the operators, they must remain valid in any coordinate system provided that they are expressed correctly—without (e.g.) inadvertently taking dependent variables inside or outside differentiations.[18] That does not mean that they are always convenient, or easily verified, or conducive to the avoidance of error. But they sometimes make useful mnemonics; e.g., they let us rewrite identities (8c), (8g), and (8p) as
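$$\nabla\times\mathbf{q}\cdot\mathbf{b} = \nabla\cdot(\mathbf{q}\times\mathbf{b})\,,\qquad \nabla p\cdot\mathbf{b} = \nabla\cdot(p\,\mathbf{b})\,,\qquad \nabla p\times\mathbf{b} = \nabla\times(p\,\mathbf{b}) \qquad [\text{for uniform } \mathbf{b}].$$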
These would be basic algebraic vector identities if ∇ were an ordinary vector, and one could try to derive them from the "algebraic" behavior of ∇; but they're not, because it isn't, so we didn't! Moreover, these simple "algebraic" rules are for a uniform b, and do not of themselves tell us what to do if b is spatially variable; for example, (8g) is not applicable to (7d).
The advection operator
Variation or transportation of a property of a medium due to motion with the medium is called advection (which, according to its Latin roots, means "carrying to"). Suppose that a medium (possibly a fluid) moves with a velocity field v in some inertial reference frame. Let ψ be a field (possibly a scalar field or a vector field) expressing some property of the medium (e.g., density, or acceleration, or stress,[g]… or even v itself). We have seen that the time-derivative of ψ may be specified in two different ways: as the partial derivative ∂ψ/∂t, evaluated at a fixed point (in the chosen reference frame), or as the material derivative dψ/dt, evaluated at a point moving at velocity v (i.e., with the medium). The difference dψ/dt − ∂ψ/∂t is due to motion with the medium. To find another expression for this difference, let s be a parameter measuring distance along the path traveled by a particle of the medium. Then, for points along the path, the surface-plot of the small change in ψ (or any component thereof) as a function of small changes in t and s (plotted on perpendicular axes) can be taken as a plane through the origin, so that
$$d\psi = \frac{\partial\psi}{\partial t}\,dt + \frac{\partial\psi}{\partial s}\,ds\,;$$
that is, the change in ψ is the sum of the changes due to the change in t and the change in s. Dividing by dt gives
$$\frac{d\psi}{dt} = \frac{\partial\psi}{\partial t} + \frac{\partial\psi}{\partial s}\,\frac{ds}{dt}\,,$$
i.e., since ds/dt is the speed |v| of the particle,
$$\frac{d\psi}{dt} = \frac{\partial\psi}{\partial t} + |\mathbf{v}|\,\frac{\partial\psi}{\partial s}$$
(and the first term on the right could have been written ∂ₜψ). So the second term on the right is the contribution to the material derivative due to motion with the medium. It is called the advective term, and it is non-zero wherever a particle of the medium moves along a path on which ψ varies with location, even if ψ at each location is constant over time. So the operator |v| ∂ₛ, where s measures distance along the path, is the advection operator: it maps a property of a medium to the advective term in the time-derivative of that property. If ψ is v itself, the above result becomes
$$\frac{d\mathbf{v}}{dt} = \frac{\partial\mathbf{v}}{\partial t} + |\mathbf{v}|\,\frac{\partial\mathbf{v}}{\partial s}\,,$$
where the left-hand side (the material acceleration) is as given by Newton's second law, and the first term on the right (which we might call the "partial" acceleration) is the time-derivative of velocity in the chosen reference frame, and the second term on the right (the advective term) is the correction that must be added to the "partial" acceleration in order to obtain the material acceleration. This term is non-zero wherever velocity is non-zero and varies along a path, even if the velocity at each point on the path is constant over time (as when water speeds up while flowing at a constant volumetric rate into a nozzle). Paradoxically, while the material acceleration and the "partial" acceleration are apparently linear (first-degree) in v, their difference (the advective term) is not. Thus the distinction between ∂ψ/∂t and dψ/dt has the far-reaching implication that fluid dynamics is non-linear.
Applying (11) to the last two equations, we obtain respectively
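$$\frac{d\psi}{dt} = \frac{\partial\psi}{\partial t} + \mathbf{v}\cdot\nabla\psi \tag{16}$$
and
$$\frac{d\mathbf{v}}{dt} = \frac{\partial\mathbf{v}}{\partial t} + \mathbf{v}\cdot\nabla\mathbf{v}\,,$$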
where, in each case, the second term on the right is the advective term. So the advection operator can also be written v⸱∇ .
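The nozzle example can be made concrete in one dimension. The sketch below assumes a hypothetical steady axial velocity u(x) = u₀(1 + x), so that the "partial" acceleration ∂u/∂t is zero everywhere. It follows a particle with a fourth-order Runge–Kutta step, differentiates the particle's velocity in time, and compares the result with the advective term |v| ∂ₛu. The function names and step sizes are illustrative choices.

```python
def u(x, u0):
    """Hypothetical steady axial velocity that increases downstream (as in a nozzle)."""
    return u0 * (1.0 + x)

def follow(x, dt, u0, steps=100):
    """Position, after time dt, of the particle that is at x now (RK4 on dx/dt = u)."""
    h = dt / steps
    for _ in range(steps):
        k1 = u(x, u0)
        k2 = u(x + 0.5 * h * k1, u0)
        k3 = u(x + 0.5 * h * k2, u0)
        k4 = u(x + h * k3, u0)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x

def material_acceleration(x, u0, dt=1e-4):
    """du/dt following the particle: central difference of its velocity in time."""
    return (u(follow(x, dt, u0), u0) - u(follow(x, -dt, u0), u0)) / (2 * dt)

def advective_term(x, u0, eps=1e-6):
    """|v| times the derivative of velocity w.r.t. distance along the path."""
    return abs(u(x, u0)) * (u(x + eps, u0) - u(x - eps, u0)) / (2 * eps)

x = 0.5
partial_acceleration = 0.0           # the field does not change at any fixed point
a1 = material_acceleration(x, 1.0)
a2 = material_acceleration(x, 2.0)
print(a1, partial_acceleration + advective_term(x, 1.0))   # both about 1.5
print(a2 / a1)                                             # about 4
```

Doubling u₀ doubles the velocity field everywhere but quadruples the material acceleration. This is the non-linearity noted above.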
When the generic ψ in (16) is replaced by the density ρ , we get a relation between ∂ρ/∂t and dρ/dt , both of which we have seen before—in equations (7d) and (7d') above. Substituting from those equations then gives
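$$\operatorname{div}(\rho\,\mathbf{v}) = \rho\,\operatorname{div}\mathbf{v} + \mathbf{v}\cdot\nabla\rho\,,$$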
where ∇ρ can be taken as a gradient since ρ is scalar. This result is in fact an identity—a product rule for the divergence—as we shall eventually confirm by another method.
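The product rule just obtained can be spot-checked numerically. The sketch below uses hypothetical density and velocity fields. Cartesian central differences serve purely as an evaluation device; the identity itself was derived without coordinates.

```python
import math

def rho(r):
    # hypothetical density field
    return 1.0 + 0.3 * math.sin(r[0]) * math.exp(-r[1] ** 2) + 0.1 * r[2]

def v(r):
    # hypothetical velocity field
    return [r[1] * r[2], math.cos(r[0]) + r[2], r[0] * r[1] ** 2]

def partial(f, r, i, eps=1e-5):
    rp = list(r); rp[i] += eps
    rm = list(r); rm[i] -= eps
    return (f(rp) - f(rm)) / (2 * eps)

def div(q, r):
    return sum(partial(lambda s, i=i: q(s)[i], r, i) for i in range(3))

def grad(f, r):
    return [partial(f, r, i) for i in range(3)]

r0 = [0.6, -0.4, 1.3]
lhs = div(lambda s: [rho(s) * c for c in v(s)], r0)                  # div(rho v)
rhs = rho(r0) * div(v, r0) + sum(a * g for a, g in zip(v(r0), grad(rho, r0)))
print(lhs, rhs)
```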
Generalized volume-integral theorem
We can rewrite the fourth integral theorem (5L) in the "dot del" notation as
$$\oint_S \hat{\mathbf{n}}\cdot\nabla\psi\,dS = \iiint_V \nabla\cdot\nabla\psi\,dV\,. \tag{18L}$$
Then, using notations (14), we can condense all four integral theorems (5g), (5c), (5d), and (18L) into the single equation
$$\oint_S \hat{\mathbf{n}}\ast\psi\,dS = \iiint_V \nabla\ast\psi\,dV\,, \tag{19}$$
where the wildcard ∗ (conveniently pronounced "star") is a generic binary operator which may be replaced by a null (direct juxtaposition of the operands) for theorem (5g), or a cross for (5c), or a dot for (5d), or ⸱∇ for (18L). This single equation is a generalized volume-integral theorem, relating an integral over a volume to an integral over its enclosing surface.[19]
Theorem (19) is based on the following definitions, which have been found unambiguous:
- the gradient of a scalar field p is the closed-surface integral of n̂ p per unit volume, where n̂ is the outward unit normal;
- the curl of a vector field is the skew surface integral per unit volume, also called the surface circulation per unit volume;
- the divergence of a vector field is the outward flux integral per unit volume; and
- the Laplacian is the closed-surface integral of the outward normal derivative, per unit volume.
The gradient maps a scalar field to a vector field; the curl maps a vector field to a vector field; the divergence maps a vector field to a scalar field; and the Laplacian maps a scalar field to a scalar field, or a vector field to a vector field, etc.
The gradient of p, as defined above, has been shown to be also
- the vector whose (scalar) component in any direction is the directional derivative of p in that direction (i.e. the derivative of p w.r.t. distance in that direction), and
- the vector whose direction is that in which the directional derivative of p is a maximum, and whose magnitude is that maximum.
Consistent with these alternative definitions of the gradient, we have defined the ⸱∇ operator so that ŝ⸱∇ (for a unit vector ŝ) is the operator yielding the directional derivative in the direction of ŝ , and we have used that notation to bring theorem (5L) under theorem (19).
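Theorem (19) can be exercised numerically for the first three choices of the wildcard, with the three binary operations passed in as functions. In the sketch below:

- The volume side evaluates ∇ ∗ ψ in Cartesian form as Σᵢ eᵢ ∗ ∂ψ/∂xᵢ. This is a standard coordinate expression, used here only for evaluation.
- The surface side integrates n̂ ∗ ψ over the faces of the unit cube.

The test fields are hypothetical quadratics, chosen so that the midpoint rules are exact up to rounding. The ⸱∇ case (18L) is omitted for brevity.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def juxtapose(a, s):
    # the "null" star: vector a times scalar s
    return [x * s for x in a]

def add(a, b):
    return [x + y for x, y in zip(a, b)] if isinstance(a, list) else a + b

def scale(a, c):
    return [x * c for x in a] if isinstance(a, list) else a * c

E = ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0])

def d_dx(f, r, i, eps=1e-5):
    rp = list(r); rp[i] += eps
    rm = list(r); rm[i] -= eps
    return scale(add(f(rp), scale(f(rm), -1.0)), 1.0 / (2 * eps))

def volume_side(star, psi, n=10):
    """Midpoint-rule volume integral over the unit cube of del * psi,
    evaluated as sum_i e_i * (d psi / d x_i)."""
    d, total = 1.0 / n, None
    for i in range(n):
        for j in range(n):
            for k in range(n):
                r = [(i + 0.5) * d, (j + 0.5) * d, (k + 0.5) * d]
                for a in range(3):
                    t = star(E[a], d_dx(psi, r, a))
                    total = t if total is None else add(total, t)
    return scale(total, d ** 3)

def surface_side(star, psi, n=10):
    """Midpoint-rule integral of n_hat * psi over the closed surface of the unit cube."""
    d, total = 1.0 / n, None
    for axis in range(3):
        u, w = [a for a in range(3) if a != axis]
        for side in (0.0, 1.0):
            n_hat = [0.0, 0.0, 0.0]
            n_hat[axis] = 1.0 if side else -1.0          # outward unit normal
            for i in range(n):
                for j in range(n):
                    r = [0.0, 0.0, 0.0]
                    r[axis], r[u], r[w] = side, (i + 0.5) * d, (j + 0.5) * d
                    t = star(n_hat, psi(r))
                    total = t if total is None else add(total, t)
    return scale(total, d ** 2)

p = lambda r: r[0] * r[1] + r[2] ** 2                        # scalar test field
q = lambda r: [r[1] * r[2], r[0] ** 2, r[0] + r[1] * r[2]]   # vector test field

for name, star, psi in (("gradient (5g)", juxtapose, p),
                        ("curl (5c)", cross, q),
                        ("divergence (5d)", dot, q)):
    print(name, volume_side(star, psi), surface_side(star, psi))
```

For these fields both sides should come out as (0.5, 0.5, 1) for the gradient, (0.5, −0.5, 0.5) for the curl, and 0.5 for the divergence. Only the function passed as `star` changes between the three cases, which is the point of writing the theorems as one.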
So far, we have said comparatively little about the curl. That imbalance will now be rectified.
Closed-circuit integrals per unit area
Instant integral theorems (on a condition)
Theorems (5g) to (5L) are three-dimensional: each of them relates an integral over a volume V to an integral over its enclosing surface S. We now seek analogous two-dimensional theorems, each of which relates an integral over a surface segment to an integral around its enclosing curve. For maximum generality, the surface segment should be allowed to be curved into a third dimension.[h] Theorems of this kind can be obtained as special cases of theorems (5g) to (5L) by suitably choosing V and S ; this is another advantage of our "volume first" approach.
Let Σ be a surface segment enclosed by a curve C (a circuit or closed contour), and let l be a parameter measuring arc length around C, so that a general element of C has length dl; and let a general element of the surface Σ have area dΣ. Let ν̂ be the unit normal vector at a general point on Σ, and let t̂ be the unit tangent vector to C at a general point on C in the direction of increasing l. In the original case of a surface enclosing a volume, we had to decide whether the unit normal pointed into or out of the volume (we chose the latter). In the present case of a circuit enclosing a surface segment, we have to decide whether l is measured clockwise or counterclockwise as seen when looking in the direction of the unit normal, and we choose clockwise. So l is measured clockwise about ν̂, and C is traversed clockwise about ν̂ (equivalently, counterclockwise as seen from the side toward which ν̂ points, in accordance with the right-hand rule).
From Σ we can construct obvious candidates for V and S. From every point on Σ, erect a perpendicular with a uniform small height h in the direction of ν̂. Then simply let V be the volume occupied by all the perpendiculars, and let S be its enclosing surface. Thus V is a (generally curved) thin slab of uniform thickness h. Its enclosing surface S consists of two close parallel (generally curved) broad faces connected by a perpendicular edge-face of uniform height h, and we can treat ν̂ as a vector field by extrapolating it perpendicularly from Σ. If we can arrange for h to cancel out, the volume V will serve as a 3D representation of the surface segment Σ, and the edge-face will serve as a 2D representation of the curve C. Our four theorems will then relate an integral around C to an integral over Σ, provided that there is no contribution from the broad faces to the integral over S. For brevity, let us call this proviso the 2D condition.
If the 2D condition is satisfied, an integral over the new S reduces to an integral over the edge-face, on which
$$dS = h\,dl\,,$$
so that the cancellation of h will leave an integral over C w.r.t. length. Meanwhile, in an integral over the new V, regardless of the 2D condition, we have
$$dV = h\,d\Sigma\,,$$
so that the cancellation of h will leave an integral over Σ w.r.t. area. So, substituting for dS and dV in (5g) to (5L), and canceling h as planned, we obtain respectively
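$$\oint_C \hat{\mathbf{n}}\,p\,dl = \iint_\Sigma \nabla p\,d\Sigma\,, \tag{20g}$$
$$\oint_C \hat{\mathbf{n}}\times\mathbf{q}\,dl = \iint_\Sigma \operatorname{curl}\mathbf{q}\,d\Sigma\,, \tag{20c}$$
$$\oint_C \hat{\mathbf{n}}\cdot\mathbf{q}\,dl = \iint_\Sigma \operatorname{div}\mathbf{q}\,d\Sigma\,, \tag{20d}$$
$$\oint_C \partial_n\psi\,dl = \iint_\Sigma \triangle\psi\,d\Sigma\,, \tag{20L}$$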
all subject to the 2D condition. In each equation, the circle on the left integral sign acknowledges that the integral is around a closed loop. The unit vector n̂, which was normal to the edge-face, is now normal to both t̂ and ν̂; that is, n̂ is tangential to the surface segment Σ and projects perpendicularly outward from its bounding curve.
On the left side of (20g), the 2D condition is satisfied if (but not only if) n̂p takes equal-and-opposite values at any two opposing points on opposing broad faces of S , i.e. if p takes the same value at such points, i.e. if p has a zero directional derivative normal to Σ , i.e. if ∇p has no component normal to Σ. Thus a sufficient "2D condition" for (20g) is the obvious one.
Skipping forward to (20L), we see that the 2D condition is satisfied if ∂ₙψ takes equal-and-opposite values at any two opposing points on opposing broad faces of S, i.e. if ∂ψ/∂ν (where ν measures distance in the direction of ν̂) takes the same value at such points, i.e. if ∂²ψ/∂ν² = 0.
For (20c) and (20d), the 2D condition can be satisfied by construction, with more useful results, as explained under the next two headings. To facilitate this process, we first make a minor adjustment to Σ and C. Any curved surface segment can be approximated to any desired accuracy by a polyhedral surface enclosed by a polygon. So we shall consider Σ to be a polyhedral surface made up of small planar elements, dΣ being the area of a general element, and we shall consider C to be a polygon with short sides, dl being the length of a general side.[i] The benefit of this trick, as we shall see, is to make the unit normal ν̂ uniform over each surface element, without forcing us to treat q (or any other field) as uniform over the same element. But, as the elements of C can independently be made as short as we like (dividing straight sides into shorter elements if necessary!), we can still consider q, ν̂, and t̂ to be uniform over each element of C.
Special case for the gradient
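In (20c), the 2D condition is satisfied by q = ν̂ p (where p is a scalar field), because then the integrand on the left is zero on the broad faces of S, where n̂ is parallel to ν̂. Equation (20c) then becomes
$$\oint_C \hat{\mathbf{n}}\times\hat{\boldsymbol{\nu}}\,p\,dl = \iint_\Sigma \operatorname{curl}(\hat{\boldsymbol{\nu}}\,p)\,d\Sigma\,.$$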
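Now on the left, n̂ × ν̂ = −t̂ (because, with the chosen sense of traversal, t̂ = ν̂ × n̂). On the right, over each surface element, the unit normal ν̂ is uniform, so that, by (8p), curl(ν̂ p) = ∇p × ν̂ = −ν̂ × ∇p. With these substitutions, the minus signs cancel and we get
$$\oint_C \hat{\mathbf{t}}\,p\,dl = \iint_\Sigma \hat{\boldsymbol{\nu}}\times\nabla p\,d\Sigma\,, \tag{21g}$$
or, if we write dr = t̂ dl and d𝚺 = ν̂ dΣ,
$$\oint_C p\,d\mathbf{r} = \iint_\Sigma d\boldsymbol{\Sigma}\times\nabla p\,. \tag{21r}$$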
This result, although well attested in the literature,[20] does not seem to have a name—unlike the next result.
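Theorem (21r) can be checked on the unit square in the plane z = 0, with ν̂ = k and the boundary traversed counterclockwise as seen from +z. That is clockwise as seen looking along ν̂, per the convention adopted above. The field p = x²y + y is a hypothetical choice; by hand, both sides equal (−4/3, 1/2, 0). The gradient is estimated by central differences and both integrals by midpoint rules, as implementation choices only.

```python
def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def p(r):
    # hypothetical scalar field: x^2 y + y
    return r[0] ** 2 * r[1] + r[1]

def grad(f, r, eps=1e-6):
    """Gradient by central differences (Cartesian evaluation only)."""
    out = []
    for i in range(3):
        rp = list(r); rp[i] += eps
        rm = list(r); rm[i] -= eps
        out.append((f(rp) - f(rm)) / (2 * eps))
    return out

def loop_side(f, vertices, m=400):
    """Closed-loop integral of f dr around a polygon (midpoint rule on each side)."""
    total = [0.0, 0.0, 0.0]
    for a, b in zip(vertices, vertices[1:] + vertices[:1]):
        step = [(y - x) / m for x, y in zip(a, b)]
        for k in range(m):
            mid = [x + (k + 0.5) * s for x, s in zip(a, step)]
            value = f(mid)
            total = [t + value * s for t, s in zip(total, step)]
    return total

def surface_side(f, n=200):
    """Integral of dSigma x grad f over the unit square in z = 0, with dSigma = k dA."""
    k_hat, d = [0.0, 0.0, 1.0], 1.0 / n
    total = [0.0, 0.0, 0.0]
    for i in range(n):
        for j in range(n):
            r = [(i + 0.5) * d, (j + 0.5) * d, 0.0]
            c = cross(k_hat, grad(f, r))
            total = [t + x * d * d for t, x in zip(total, c)]
    return total

# Counterclockwise as seen from +z, i.e. clockwise looking along nu_hat = k
square = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
loop_result = loop_side(p, square)
surface_result = surface_side(p)
print(loop_result)       # about [-4/3, 1/2, 0]
print(surface_result)    # about [-4/3, 1/2, 0]
```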
Special case for the curl
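In (20d), the 2D condition is satisfied if q is replaced by ν̂ × q, because then (again) the integrand on the left is zero on the broad faces of S, where n̂ is parallel to ν̂. Equation (20d) then becomes
$$\oint_C \hat{\mathbf{n}}\cdot(\hat{\boldsymbol{\nu}}\times\mathbf{q})\,dl = \iint_\Sigma \operatorname{div}(\hat{\boldsymbol{\nu}}\times\mathbf{q})\,d\Sigma\,.$$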
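Now on the left, the integrand can be written n̂⸱(ν̂ × q) = q⸱(n̂ × ν̂) = −q⸱t̂. On the right, by identity (8c), div(ν̂ × q) = −div(q × ν̂) = −curl q⸱ν̂, since ν̂ is uniform over each surface element. With these substitutions, the minus signs cancel and we get
$$\oint_C \mathbf{q}\cdot\hat{\mathbf{t}}\,dl = \iint_\Sigma \operatorname{curl}\mathbf{q}\cdot\hat{\boldsymbol{\nu}}\,d\Sigma\,, \tag{22c}$$
or, if we again write dr = t̂ dl and d𝚺 = ν̂ dΣ,
$$\oint_C \mathbf{q}\cdot d\mathbf{r} = \iint_\Sigma \operatorname{curl}\mathbf{q}\cdot d\boldsymbol{\Sigma}\,. \tag{22r}$$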
This result—the best-known theorem relating an integral over a surface segment to an integral around its enclosing curve, and the best-known theorem involving the curl—is called Stokes' theorem or, more properly, the Kelvin–Stokes theorem,[21] or simply the curl theorem.[22]
The integral on the left of (22c) or (22r) is called the circulation of the vector field q around the closed curve C. So, in words, the Kelvin–Stokes theorem says that the circulation of a vector field around a closed curve is equal to the flux of the curl of that vector field through any surface spanning that closed curve.
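To illustrate that the Kelvin–Stokes theorem holds for any spanning surface, the sketch below takes the hypothetical field q = −y³ i + x³ j + xyz k. It computes the circulation of q around the unit circle in the plane z = 0, counterclockwise as seen from +z. It also computes the flux of curl q through two different spanning surfaces: the flat unit disk and the upper unit hemisphere (outward normal). By hand, all three equal 3π/2. The curl is estimated by central differences and all integrals by midpoint rules; these are implementation choices, not part of the theory.

```python
import math

def q(r):
    # hypothetical vector field
    x, y, z = r
    return [-y ** 3, x ** 3, x * y * z]

def curl(f, r, eps=1e-5):
    """Curl by central differences (Cartesian sketch): J[i][j] = d f_i / d x_j."""
    J = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        rp = list(r); rp[j] += eps
        rm = list(r); rm[j] -= eps
        fp, fm = f(rp), f(rm)
        for i in range(3):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return [J[2][1] - J[1][2], J[0][2] - J[2][0], J[1][0] - J[0][1]]

def circulation_unit_circle(f, m=2000):
    """Integral of f . dr around the unit circle in z = 0, counterclockwise from +z."""
    total = 0.0
    for k in range(m):
        t = 2 * math.pi * (k + 0.5) / m
        r = [math.cos(t), math.sin(t), 0.0]
        tangent = [-math.sin(t), math.cos(t), 0.0]          # dr/dt
        total += sum(a * b for a, b in zip(f(r), tangent)) * 2 * math.pi / m
    return total

def flux_disk(f, n=150):
    """Flux of curl f through the flat unit disk in z = 0, normal +z."""
    total, dr, dphi = 0.0, 1.0 / n, 2 * math.pi / n
    for i in range(n):
        rho = (i + 0.5) * dr
        for j in range(n):
            phi = (j + 0.5) * dphi
            c = curl(f, [rho * math.cos(phi), rho * math.sin(phi), 0.0])
            total += c[2] * rho * dr * dphi
    return total

def flux_hemisphere(f, n=150):
    """Flux of curl f through the upper unit hemisphere, outward normal."""
    total, dth, dphi = 0.0, (math.pi / 2) / n, 2 * math.pi / n
    for i in range(n):
        th = (i + 0.5) * dth
        for j in range(n):
            phi = (j + 0.5) * dphi
            n_hat = [math.sin(th) * math.cos(phi), math.sin(th) * math.sin(phi), math.cos(th)]
            c = curl(f, n_hat)                              # on the unit sphere, r = n_hat
            total += sum(a * b for a, b in zip(c, n_hat)) * math.sin(th) * dth * dphi
    return total

circ, disk, hemi = circulation_unit_circle(q), flux_disk(q), flux_hemisphere(q)
print(circ, disk, hemi, 1.5 * math.pi)
```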
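Now let a general element of Σ (with area dΣ) be enclosed by the curve δC, traversed in the same direction as the outer curve C. Then, applying (22c) to the single element, we have
$$\oint_{\delta C}\mathbf{q}\cdot\hat{\mathbf{t}}\,dl = \operatorname{curl}\mathbf{q}\cdot\hat{\boldsymbol{\nu}}\;d\Sigma\,;$$
that is,
$$\operatorname{curl}\mathbf{q}\cdot\hat{\boldsymbol{\nu}} = \frac{1}{d\Sigma}\oint_{\delta C}\mathbf{q}\cdot\hat{\mathbf{t}}\,dl\,, \tag{23c}$$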
where the right-hand side is simply the circulation per unit area.
Equation (23c) is an alternative definition of the curl: it says that the curl of q is the vector whose scalar component in any direction is the circulation of q per unit area of a surface whose normal points in that direction. For real q, this component has its maximum, namely |curl q| , in the direction of curl q ; thus the curl of q is the vector whose direction is that which a surface must face if the circulation of q per unit area of that surface is to be a maximum, and whose magnitude is that maximum. This is the usual conceptual definition of the curl.[23]
[Notice, however, that our original volume-based definition (4c) is more succinct: the curl is the closed-surface circulation per unit volume, i.e. the skew surface integral per unit volume.]
It should now be clear where the curl gets its name (coined by Maxwell), and why it is also called the rotation (indeed the curl operator is sometimes written "rot", especially in Continental languages, in which "rot" does not have the same unfortunate everyday meaning as in English). It should be similarly unsurprising that a vector field with zero curl is described as irrotational (which one must carefully pronounce differently from "irri tational"!), and that the curl of the velocity of a medium is called the vorticity.
However, a field does not need to be vortex-like in order to have a non-zero curl. For example, by identity (8p), in Cartesian coordinates, the velocity field x j has a curl equal to ∇x × j = i × j = k, although it describes a shearing motion rather than a rotating motion. This is understandable because if you hold a pencil between the palms of your hands and slide one palm over the other (a shearing motion), the pencil rotates. Conversely, we can have a vortex-like field whose curl is zero everywhere except on or near the axis of the vortex. For example, the Maxwell–Ampère law, in the magnetostatic case, says that curl H = J, where H is the magnetizing field and J is the current density.[j] So if the current is confined to a wire, curl H is zero outside the wire, although, as is well known, the field lines circle the wire. The resolution of the paradox is that H gets stronger as we approach the wire, making a shearing pattern, whose effect on the curl counteracts that of the rotation.
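Definition (23c) and the two examples above can be illustrated with square loops. The sketch below uses hypothetical helper names and a midpoint rule on each side. It does three things:

- It estimates each component of the curl of q = −y³ i + x³ j + xyz k as circulation per unit area around a small square whose normal is the corresponding coordinate direction. The exact curl, (xz, −yz, 3x² + 3y²), is (0.24, 0.32, 0.75) at the chosen point.
- It shows that the shear field x j has unit circulation per unit area about k.
- It shows that the field of a unit current along the z axis, H = (−y i + x j)/(2π(x² + y²)), has circulation 1 around a loop enclosing the wire but 0 around a loop that does not.

```python
import math

def circulation(f, vertices, m=200):
    """Closed-loop integral of f . dr around a polygon (midpoint rule on each side)."""
    total = 0.0
    for a, b in zip(vertices, vertices[1:] + vertices[:1]):
        step = [(y - x) / m for x, y in zip(a, b)]
        for k in range(m):
            mid = [x + (k + 0.5) * s for x, s in zip(a, step)]
            total += sum(c * s for c, s in zip(f(mid), step))
    return total

def square_loop(center, axis, h):
    """Square of half-side h about center, normal to coordinate `axis`,
    traversed counterclockwise as seen from the tip of that normal."""
    u, w = (axis + 1) % 3, (axis + 2) % 3          # e_u x e_w = e_axis
    verts = []
    for du, dw in ((-h, -h), (h, -h), (h, h), (-h, h)):
        v = list(center)
        v[u] += du
        v[w] += dw
        verts.append(v)
    return verts

def circulation_per_area(f, center, axis, h=1e-3):
    return circulation(f, square_loop(center, axis, h)) / (2 * h) ** 2

# (23c): components of curl q at r0
q = lambda r: [-r[1] ** 3, r[0] ** 3, r[0] * r[1] * r[2]]
r0 = [0.3, -0.4, 0.8]
curl_est = [circulation_per_area(q, r0, a) for a in range(3)]

# Shear flow x j: no rotation of the flow, yet unit circulation per unit area about k
shear = lambda r: [0.0, r[0], 0.0]
shear_curl_z = circulation_per_area(shear, [0.5, 0.2, 0.0], 2)

# Field of a unit current along the z axis
H = lambda r: [c / (2 * math.pi * (r[0] ** 2 + r[1] ** 2)) for c in (-r[1], r[0], 0.0)]
around_wire = circulation(H, square_loop([0.0, 0.0, 0.0], 2, 1.0), m=2000)
away_from_wire = circulation(H, square_loop([2.0, 0.0, 0.0], 2, 0.5), m=2000)
print(curl_est, shear_curl_z, around_wire, away_from_wire)
```

The loop away from the wire lies entirely in the region where curl H = 0, so its circulation vanishes even though H circles the wire everywhere.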
The curl-grad and div-curl operators
We have seen from (9L) that the Laplacian of a scalar field is the divergence of the gradient. Four more such second-order combinations make sense, namely the curl of the gradient (of a scalar field), and the divergence of the curl, the gradient of the divergence, and the curl of the curl (of a vector field). The first two —"curl grad" and "div curl"— can now be disposed of.
Let the surface segment Σ enclosed by the curve C be a segment of the closed surface S surrounding the volume V, and let Σ expand across S until it engulfs V, so that C shrinks to a point on the far side of S. Then, in the nameless theorem (21g) and the Kelvin–Stokes theorem (22c), the integral on the left becomes zero while Σ and ν̂ on the right become S and n̂, so that the theorems respectively reduce to
$$\oint_S \hat{\mathbf{n}}\times\nabla p\,dS = \mathbf{0}$$
and
$$\oint_S \operatorname{curl}\mathbf{q}\cdot\hat{\mathbf{n}}\,dS = 0\,.$$
Applying theorem (5c) to the first of these two equations, and the divergence theorem (5d) to the second, we obtain respectively
$$\iiint_V \operatorname{curl}\nabla p\,dV = \mathbf{0}$$
and
$$\iiint_V \operatorname{div}\operatorname{curl}\mathbf{q}\,dV = 0\,.$$
As the integrals vanish for any volume V in which the integrands are defined, the integrands must be zero wherever they are defined; that is,
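$$\operatorname{curl}\nabla p = \mathbf{0} \tag{24c}$$
and
$$\operatorname{div}\operatorname{curl}\mathbf{q} = 0\,. \tag{24d}$$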
In words, the curl of the gradient is zero, and the divergence of the curl is zero; or, more concisely, any gradient is irrotational, and any curl is solenoidal.
We might well ask whether the converses are true. Is every irrotational vector field the gradient of something? And is every solenoidal vector field the curl of something? The answers are affirmative, but the proofs require more preparation.
Meanwhile we may note, as a mnemonic aid, that when the left-hand sides of the last two equations are rewritten in the del-cross and del-dot notations, they become ∇ × ∇p and ∇ ⸱ ∇ × q , respectively. The former looks like (but isn't) a cross-product of two parallel vectors, and the latter looks like (but isn't) a scalar triple product with a repeated factor, so that each expression looks like it ought to be zero (and it is). But such appearances can lead one astray, because ∇ is an operator, not a self-contained vector quantity; for example, ∇p × ∇φ is not identically zero, because two gradients are not necessarily parallel.[24]
We should also note, to tie a loose end, that identity (24d) was to be expected from our verbal statement of the Kelvin–Stokes theorem (22c). That statement implies that the flux of the curl through any two surfaces spanning the same closed curve is the same. So if we make a closed surface from two spanning surfaces, the flux into one spanning surface is equal to the flux out of the other, i.e. the net flux out of the closed surface is zero, i.e. the integral of the divergence over the enclosed volume is zero; and since any simple volume in which the divergence is defined can be enclosed this way, the divergence itself (of the curl) must be zero wherever it is defined.
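A numerical analogue of (24c) and (24d) uses nested central differences. Central-difference operators in different coordinate directions commute exactly, so the discrete "curl grad" and "div curl" vanish up to rounding error. This mirrors, but does not prove, the identities. The same sketch shows that the cross-product of two gradients, ∇p × ∇φ, is generally non-zero. The fields and function names are hypothetical.

```python
import math

def partial(f, r, i, eps=1e-3):
    rp = list(r); rp[i] += eps
    rm = list(r); rm[i] -= eps
    return (f(rp) - f(rm)) / (2 * eps)

def grad(f):
    """Central-difference gradient of a scalar function, returned as a function."""
    return lambda r: [partial(f, r, i) for i in range(3)]

def curl(F):
    """Central-difference curl of a vector-valued function, returned as a function."""
    comp = lambda i: (lambda r: F(r)[i])
    return lambda r: [partial(comp(2), r, 1) - partial(comp(1), r, 2),
                      partial(comp(0), r, 2) - partial(comp(2), r, 0),
                      partial(comp(1), r, 0) - partial(comp(0), r, 1)]

def div(F):
    """Central-difference divergence of a vector-valued function, returned as a function."""
    return lambda r: sum(partial(lambda s, i=i: F(s)[i], r, i) for i in range(3))

p = lambda r: math.sin(r[0] * r[1]) + r[0] * r[2] ** 2
phi = lambda r: r[0] + r[1] ** 2
q = lambda r: [math.exp(r[1]) * r[2], r[0] * r[1] ** 2, math.cos(r[0] * r[2])]
r0 = [0.5, 0.7, -0.3]

curl_grad = curl(grad(p))(r0)      # discrete analogue of (24c)
div_curl = div(curl(q))(r0)        # discrete analogue of (24d)
g1, g2 = grad(p)(r0), grad(phi)(r0)
cross_of_grads = [g1[1] * g2[2] - g1[2] * g2[1],
                  g1[2] * g2[0] - g1[0] * g2[2],
                  g1[0] * g2[1] - g1[1] * g2[0]]   # not identically zero
print(curl_grad, div_curl, cross_of_grads)
```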
Change per unit length
Continuing (and concluding) the trend of reducing the number of dimensions, we now seek one-dimensional theorems, each of which relates an integral over a path to values at the endpoints of the path. For maximum generality, the path should be allowed to be curved into a second and a third dimension.
We could do this by further specializing theorems (5g) to (5L). We could take a curve Γ with a unit tangent vector ŝ. At every point on Γ we could mount a circular disk with a uniform small area α , centered on Γ and orthogonal to it. We could let V be the volume occupied by all the disks and let S be its enclosing surface; thus V would be a thin right circular cylinder, except that its axis could be curved. If we could arrange for α to cancel out, our four theorems would indeed be reduced to the desired form, provided that there were no contribution from the curved face of the "cylinder" to the integral over S (the "1D proviso"). But, as it turns out, this exercise yields only one case in which the "1D proviso" can be satisfied by a construction involving ŝ and a general field, and we have already almost discovered that case by a simpler and more conventional argument—which we shall now continue.
Fundamental theorem
Equation (9g) is applicable where p(r) is a scalar field, s is a parameter measuring arc length along a curve Γ, and ŝ is the unit tangent vector to Γ in the direction of increasing s. Let s take the values s1 and s2 at the endpoints of Γ, where the position vector r takes the values r1 and r2 respectively. Then, integrating (9g) w.r.t. s from s1 to s2 and applying the fundamental theorem of calculus, we get
$$\int_{s_1}^{s_2}\nabla p\cdot\hat{\mathbf{s}}\,ds = p(\mathbf{r}_2) - p(\mathbf{r}_1)\,. \tag{25g}$$
This is our third integral theorem involving the gradient, and the best-known of the three: it is commonly called simply the gradient theorem,[25] or the fundamental theorem of the gradient, or the fundamental theorem of line integrals; it generalizes the fundamental theorem of calculus to a curved path.[26] If we write dr for ŝ ds (the change in the position vector), we get the theorem in the alternative form
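$$\int_{\mathbf{r}_1}^{\mathbf{r}_2}\nabla p\cdot d\mathbf{r} = p(\mathbf{r}_2) - p(\mathbf{r}_1)\,. \tag{25r}$$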
As the right-hand side of (25g) or (25r) obviously depends on the endpoints but not on the path in between, so does the integral on the left. This integral is commonly called the work integral of ∇p over the path, because if ∇p is a force, the integral is the work done by the force over the path. So, in words, the gradient theorem says that the change in value of a scalar field from one point to another is the work integral of the gradient of that field over any path from the one to the other.
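Path independence in the gradient theorem can be seen numerically by integrating ∇p along two different paths between the same endpoints. The field p, the "wiggly" path, and the midpoint-chord rule used for the work integral are illustrative assumptions. The gradient is estimated by central differences.

```python
import math

def p(r):
    # hypothetical scalar field
    return math.exp(0.5 * r[0]) * math.cos(r[1]) + r[1] * r[2] ** 2

def grad(f, r, eps=1e-6):
    out = []
    for i in range(3):
        rp = list(r); rp[i] += eps
        rm = list(r); rm[i] -= eps
        out.append((f(rp) - f(rm)) / (2 * eps))
    return out

def work_integral(F, path, m=2000):
    """Integral of F . dr along path(t), 0 <= t <= 1: chords dr, with F evaluated
    at the midpoint parameter of each chord (midpoint rule)."""
    total = 0.0
    for k in range(m):
        a, b, mid = path(k / m), path((k + 1) / m), path((k + 0.5) / m)
        total += sum(f * (y - x) for f, x, y in zip(F(mid), a, b))
    return total

r1, r2 = [0.0, 0.0, 0.0], [1.0, 2.0, -0.5]
straight = lambda t: [a + t * (b - a) for a, b in zip(r1, r2)]
wiggly = lambda t: [a + t * (b - a) + math.sin(math.pi * t) * c
                    for a, b, c in zip(r1, r2, [0.3, -0.5, 0.8])]
gp = lambda r: grad(p, r)

straight_work = work_integral(gp, straight)
wiggly_work = work_integral(gp, wiggly)
exact = p(r2) - p(r1)
print(straight_work, wiggly_work, exact)
```

Traversing the straight path out and the wiggly path back would give a closed loop, and the difference of the two printed work integrals is then the (near-zero) work around that loop.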
Applying (25r) to a single element of the curve, we get
$$dp = \nabla p\cdot d\mathbf{r}\,, \tag{26g}$$
which is reminiscent of df = f′(x) dx in elementary calculus.[27] Alternatively, we could have obtained (26g) by multiplying both sides of (9g) by ds, and then obtained (25r) by adding (26g) over all the elemental displacements dr on any path from r1 to r2.
If we close the path by setting r2 = r1 , the gradient theorem reduces to
$$\oint \nabla p\cdot d\mathbf{r} = 0\,,$$
where the integral is around any closed loop. Applying the Kelvin–Stokes theorem then gives
$$\iint_\Sigma \operatorname{curl}\nabla p\cdot\hat{\boldsymbol{\nu}}\,d\Sigma = 0\,,$$
where Σ is any surface spanning the loop and ν̂ is the unit normal to Σ. As this applies to any loop spanned by any surface on which the integrand is defined, curl ∇p must be zero wherever it is defined. This is a second proof (more conventional than the first) of theorem (24c).
Scalar potential: field with given gradient
Lemma: If curl q = 0 in a simply connected region V, then the integral ∫ q⸱dr over any path in V depends only on the endpoints of the path.
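Proof: Suppose, on the contrary, that there are two paths Γ and Λ in V, with a common starting point and a common finishing point, such that
$$\int_\Gamma \mathbf{q}\cdot d\mathbf{r} \neq \int_\Lambda \mathbf{q}\cdot d\mathbf{r}\,.$$
Let −Λ denote Λ traversed backwards. Then for every dr on Λ there is an equal and opposite dr on −Λ, so that we have
$$\int_\Gamma \mathbf{q}\cdot d\mathbf{r} + \int_{-\Lambda}\mathbf{q}\cdot d\mathbf{r} \neq 0\,,$$
i.e.
$$\oint \mathbf{q}\cdot d\mathbf{r} \neq 0\,,$$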
where the left-hand side is now a work integral of q around a closed loop in V. By the simple connectedness of V, this loop is spanned by some surface Σ in V. So we can apply the Kelvin–Stokes theorem and conclude that the flux integral of curl q through Σ is non-zero, in which case curl q must be non-zero somewhere on Σ , hence somewhere in V — contradicting the hypothesis of the lemma. ◼
Corollary: If curl q = 0 in a simply connected region V, there exists a scalar field p such that q = ∇p in V.
Proof: We shall show that a suitable candidate is
$$p(\mathbf r) = \int_{\mathbf r_0}^{\mathbf r} \mathbf q(\boldsymbol\rho)\cdot d\boldsymbol\rho\,,$$
where r0 is the position vector of any fixed point in V, and ρ is the position vector of a general point on the path of integration, which may be any path in V. First note that p(r) is unambiguous because, by the preceding lemma, it is independent of the path for given r0 and r, provided that the path is in V. Now to find ∇p(r), let σ be the arc length along the path from r0 to ρ, so that σ ranges from 0 to (say) s as ρ ranges from r0 to r; and let ŝ be the unit vector tangential to the path at ρ, in the direction of increasing σ. Then dρ = ŝ dσ, so that the above equation becomes
$$p(\mathbf r) = \int_0^s \mathbf q(\boldsymbol\rho)\cdot\hat{\mathbf s}\,d\sigma\,.$$
Differentiating w.r.t. s gives
$$\frac{dp}{ds} = \mathbf q(\mathbf r)\cdot\hat{\mathbf s}\,,$$
where ŝ is evaluated at σ = s and is therefore in the direction in which the path reaches r. By the generality of the path, this can be any direction. So the last equation says that q is the vector whose (scalar) component in any direction is the derivative of p w.r.t. arc length in that direction; that is, q = ∇p , as required. ◼
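The construction in the corollary can be mimicked numerically. In this sketch (an illustration under stated assumptions, not the paper's method), the irrotational field q is a hypothetical example, p(r) is computed as the work integral of q from r0 along a straight segment using the midpoint rule, and its gradient is then estimated by central differences for comparison with q.

```python
def q(r):
    # Example irrotational field (an assumption): its curl is zero everywhere,
    # and it happens to be the gradient of x^2 y + y z.
    x, y, z = r
    return [2 * x * y, x * x + z, y]


def potential(r, r0=(0.0, 0.0, 0.0), n=400):
    # p(r) = integral of q(rho) . d(rho) from r0 to r, taken along the straight
    # segment (by the lemma, any path in V gives the same value); midpoint rule.
    d = [b - a for a, b in zip(r0, r)]
    total = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        rho = [a + t * di for a, di in zip(r0, d)]
        total += sum(qi * di for qi, di in zip(q(rho), d)) / n
    return total


def grad(f, r, h=1e-4):
    # Central-difference gradient of a scalar function f at r.
    g = []
    for i in range(3):
        up, dn = list(r), list(r)
        up[i] += h
        dn[i] -= h
        g.append((f(up) - f(dn)) / (2 * h))
    return g


r = [0.7, -1.2, 0.4]
print(grad(potential, r), q(r))
```

The two printed vectors should agree closely, which is the numerical counterpart of the conclusion q = ∇p.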
This is the promised converse of theorem (24c). But, given an irrotational vector field q , we usually prefer to find a scalar field whose negative gradient is q ; that is, we usually prefer a scalar field such that . Such a field is called a scalar potential for q. From the above expression for p(r), a suitable candidate is
-
(
)
A scalar field has zero gradient if and only if it is uniform, so that adding a uniform field, but only a uniform field, to a given scalar field leaves its gradient unchanged. Thus the scalar potential is determined up to an arbitrary additive uniform field. This would be the case with or without the minus sign in front of the gradient. The reason for preferring the minus sign appears next.
Conservative fields
An irrotational vector field—or, equivalently (in a simply connected region), a field that is (plus or minus) the gradient of something—is described as conservative, because if the field is a force, it does zero work around a closed loop, and consequently conserves energy around the loop (at least if the field does not change during traversal of the loop).
If the only force acting on a particle is F = −∇U, then, by the gradient theorem, the work done on the particle over a path is the increase in −U, i.e. the decrease in U ; and this work is the increase in the particle's kinetic energy T. Hence, if we identify U with the potential energy, the total energy U + T is conserved. This interpretation of the scalar potential is possible only if the force is minus the gradient of the potential.
The minus sign is also used if the conservative vector field is an electric field (force per unit charge) or a gravitational acceleration (force per unit mass); the scalar potential is potential energy per unit charge, or potential energy per unit mass, respectively.
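A small simulation can make the energy argument concrete. The sketch below assumes a hypothetical potential-energy field U, computes the force as F = −∇U by central differences, and integrates the motion of a unit-mass particle with the velocity-Verlet scheme (a standard integrator chosen here for illustration, not taken from the text); U + T should then stay constant to within the integrator's small error.

```python
def U(r):
    # Hypothetical potential-energy field (an assumption for illustration).
    x, y, z = r
    return 0.5 * (x * x + y * y + z * z) + 0.1 * x ** 4


def force(r, h=1e-6):
    # F = -grad U, approximated by central differences.
    F = []
    for i in range(3):
        up, dn = list(r), list(r)
        up[i] += h
        dn[i] -= h
        F.append(-(U(up) - U(dn)) / (2 * h))
    return F


def total_energy(r, v, m):
    # Potential energy U plus kinetic energy T = m |v|^2 / 2.
    return U(r) + 0.5 * m * sum(c * c for c in v)


def simulate(r, v, m=1.0, dt=1e-3, steps=5000):
    # Velocity-Verlet integration of m dv/dt = F; returns U + T at start and end.
    e0 = total_energy(r, v, m)
    a = [f / m for f in force(r)]
    for _ in range(steps):
        r = [ri + vi * dt + 0.5 * ai * dt * dt for ri, vi, ai in zip(r, v, a)]
        a_new = [f / m for f in force(r)]
        v = [vi + 0.5 * (ai + bi) * dt for vi, ai, bi in zip(v, a, a_new)]
        a = a_new
    return e0, total_energy(r, v, m)


print(simulate([1.0, 0.0, 0.5], [0.0, 0.8, 0.0]))
```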
Some special fields
The 1/r scalar potential
For the potential energy field
$$U = \frac{1}{r} \tag{30}$$
where r is the distance from the origin (and r ≠ 0), let us find the corresponding force F = −∇U. The direction of ∇U is that of the steepest increase of U, which, by the spherical symmetry, can only be parallel or antiparallel to r̂ (the unit vector pointing away from the origin). So
$$\nabla U = \frac{dU}{dr}\,\hat{\mathbf r} = -\frac{\hat{\mathbf r}}{r^2}\,,$$
whence
$$\mathbf F = \frac{\hat{\mathbf r}}{r^2} \tag{31}$$
So the negative gradient of the 1/r scalar potential (30) is the unit inverse-square radial vector field. Multiplying the numerator and denominator by r gives the alternative form
$$\mathbf F = \frac{\mathbf r}{r^3}\,,$$
which is convenient if the center of the force is shifted from the origin to position r′: in that case we simply replace the vector r by r − r′, and its magnitude r by |r − r′|, so that the force becomes
$$\mathbf F = \frac{\mathbf r - \mathbf r'}{|\mathbf r - \mathbf r'|^3}\,,$$
and the corresponding scalar potential becomes
$$U = \frac{1}{|\mathbf r - \mathbf r'|}\,.$$
Inverse-square radial vector field
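We derived the vector field (31) as the negative gradient of the scalar potential (30). Conversely, given the inverse-square radial vector field (31), we could derive its scalar potential from (29). At a general point on the path, let the position vector be ρ = ρ ρ̂, so that, by (31), the field at that point is ρ̂/ρ². Then (29) becomes
$$U(\mathbf r) = -\int_{\mathbf r_0}^{\mathbf r} \frac{\hat{\boldsymbol\rho}\cdot d\boldsymbol\rho}{\rho^2} = -\int_{r_0}^{r} \frac{d\rho}{\rho^2} = \frac{1}{r} - \frac{1}{r_0}\,,$$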
so that, if we choose r0 → ∞ , we recover (30).
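The relation between (30) and (31), in the shifted form with the center of force at r′, can be spot-checked numerically. This is only a sketch: the points r and r′ are arbitrary, and −∇U is approximated by central differences rather than computed symbolically.

```python
import math


def U(r, rp):
    # Scalar potential 1/|r - r'| of a center of force at r' (requires r != r').
    return 1.0 / math.dist(r, rp)


def F(r, rp):
    # Inverse-square radial field (r - r') / |r - r'|^3.
    d = math.dist(r, rp)
    return [(a - b) / d ** 3 for a, b in zip(r, rp)]


def neg_grad(f, r, h=1e-6):
    # Central-difference approximation of -grad f at r.
    g = []
    for i in range(3):
        up, dn = list(r), list(r)
        up[i] += h
        dn[i] -= h
        g.append(-(f(up) - f(dn)) / (2 * h))
    return g


rp = (0.3, -0.2, 1.0)   # center of force (arbitrary)
r = (1.5, 0.4, -0.7)    # field point (arbitrary)
print(neg_grad(lambda x: U(x, rp), r), F(r, rp))
```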
Because F, given by (31), has a scalar potential, curl F must be zero. This is independently obvious in that the spherical symmetry of F seems to rule out any resemblance of rotation or shear—even at the origin, where F becomes infinite. On the last point, let us check whether curl F has a meaningful integral over a volume containing the origin. If the volume V is enclosed by the surface S whose outward unit normal is n̂, then, by theorem (5c),
$$\iiint_V \operatorname{curl}\mathbf F\,dV = \oint_S \hat{\mathbf n}\times\mathbf F\,dS\,.$$
If V contains the origin, then, because curl F is zero everywhere except at the origin, the volume V can be replaced by any element of V containing the origin, whatever the shape of that element may be. If we choose that element to be a spherical ball centered on the origin, then n̂ is parallel to r̂, so that the cross-product in the integrand on the right is zero. Thus the volume integral on the left is not only meaningful, but is zero, even if the volume contains the point where the integrand is infinite. In this sense, the field F is so irrotational that its curl may be taken as zero even where the field itself is undefined!
The situation concerning the divergence of F is more complicated. Again, let the volume V be enclosed by the surface S whose outward unit normal is n̂. By the divergence theorem (5d),
$$\iiint_V \operatorname{div}\mathbf F\,dV = \oint_S \mathbf F\cdot\hat{\mathbf n}\,dS = \oint_S \frac{\hat{\mathbf r}\cdot\hat{\mathbf n}}{r^2}\,dS = \oint_S d\Omega\,,$$
where dΩ is the solid angle subtended at the origin by the surface element of area dS, and is positive if the outward unit normal n̂ has a positive component away from the origin (r̂⸱n̂ > 0), and negative if n̂ has a positive component toward the origin (r̂⸱n̂ < 0). If the volume enclosed by S does not include the origin, then for every positive contribution dΩ there is a compensating negative contribution, so that the integral of div F over the volume is zero. As this applies to every such volume, div F must be zero everywhere except at the origin. If, on the contrary, the volume does include the origin, then the contributions dΩ add up to the total solid angle subtended by the enclosing surface, which is 4π. In summary,
$$\operatorname{div}\frac{\hat{\mathbf r}}{r^2} = 4\pi\,\delta(\mathbf r) \tag{32d}$$
where δ(r), the 3D unit delta function, is zero everywhere except at the origin, but has an integral of 1 over any volume that includes the origin. For example, a unit point-mass at the origin has the density δ(r), and a point-mass m at position r′ has the density mδ(r − r′). As the argument of div in (32d) is −∇(1/r), we also have
$$\triangle\frac{1}{r} = -4\pi\,\delta(\mathbf r)$$
If we shift the centers from the origin to r′, the last two results become
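$$\operatorname{div}\frac{\mathbf r - \mathbf r'}{|\mathbf r - \mathbf r'|^3} = 4\pi\,\delta(\mathbf r - \mathbf r')$$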
and
$$\triangle\frac{1}{|\mathbf r - \mathbf r'|} = -4\pi\,\delta(\mathbf r - \mathbf r')$$
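The solid-angle argument behind (32d) can be checked by computing the flux of r̂/r² through a sphere numerically. In this sketch the sphere centers, the radius, and the midpoint-rule grid are arbitrary assumptions; the flux should come out close to 4π when the origin is inside the sphere and close to 0 when it is outside.

```python
import math


def F(r):
    # Unit inverse-square radial field r_hat / r^2 = r / |r|^3 (undefined at r = 0).
    d = math.hypot(*r)
    return [c / d ** 3 for c in r]


def flux_through_sphere(field, center, R, n_theta=200, n_phi=400):
    # Midpoint-rule approximation of the closed-surface integral of field . n_hat dS
    # over a sphere of radius R centered at `center`.
    total = 0.0
    dth, dph = math.pi / n_theta, 2 * math.pi / n_phi
    for i in range(n_theta):
        th = (i + 0.5) * dth
        for j in range(n_phi):
            ph = (j + 0.5) * dph
            n_hat = (math.sin(th) * math.cos(ph),
                     math.sin(th) * math.sin(ph),
                     math.cos(th))
            point = [c + R * k for c, k in zip(center, n_hat)]
            dS = R * R * math.sin(th) * dth * dph
            total += sum(f * k for f, k in zip(field(point), n_hat)) * dS
    return total


print(flux_through_sphere(F, (0.2, -0.1, 0.3), 1.0) / math.pi)  # origin inside
print(flux_through_sphere(F, (2.0, 1.0, 0.0), 1.0))             # origin outside
```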
Field with given divergence (and zero curl)
It follows from Coulomb's law that the electric field due to a point-charge Q at the origin, in a vacuum, is
$$\mathbf E = \frac{Q\,\hat{\mathbf r}}{4\pi\epsilon_0 r^2}\,,$$
where ϵ0 is a physical constant (called the vacuum permittivity or simply the electric constant). In a vacuum, the electric displacement field, denoted by D, is ϵ0E. So it is convenient to multiply the above equation by ϵ0, obtaining
$$\mathbf D = \frac{Q\,\hat{\mathbf r}}{4\pi r^2}\,.$$
This is a multiple of the inverse-square radial vector field (31) and therefore has zero curl.
Now suppose that, instead of a charge Q at the origin, we have a static charge density ρ(r′) in a general elemental volume dV′ at position r′ (the standard symbol for charge density being unfortunately the same as for mass density). Then the contribution from that element to the field D at position r is
$$d\mathbf D = \frac{\rho(\mathbf r')\,dV'\,(\mathbf r - \mathbf r')}{4\pi|\mathbf r - \mathbf r'|^3}\,,$$
provided that, for each r, the dimensions of each volume element are small compared with |r − r′|. This contribution likewise has zero curl. The total field due to static charges is then the sum of the contributions:
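$$\mathbf D(\mathbf r) = \frac{1}{4\pi}\int \frac{\rho(\mathbf r')\,(\mathbf r - \mathbf r')}{|\mathbf r - \mathbf r'|^3}\,dV' \tag{34}$$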
where the integral is over all space. And D(r) has zero curl because all the contributions have zero curl.
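Independently of the physical significance of D(r), we can take its divergence "term by term" (or "under the integral sign"), obtaining
$$\operatorname{div}\mathbf D(\mathbf r) = \frac{1}{4\pi}\int \rho(\mathbf r')\,\operatorname{div}\frac{\mathbf r - \mathbf r'}{|\mathbf r - \mathbf r'|^3}\,dV' = \int \rho(\mathbf r')\,\delta(\mathbf r - \mathbf r')\,dV' = \int \rho(\mathbf r')\,\delta(\mathbf r' - \mathbf r)\,dV'\,,$$
where the last step is permitted because the volume integral of the delta function of r′ is not changed by a "point reflection" (inversion) across r. As the delta function is zero except where r′ = r, we may replace ρ(r′) by ρ(r) and take it outside the integral; and as the volume of integration (all space) includes the shifted origin of the delta function, the remaining integral is simply 1, so that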
$$\operatorname{div}\mathbf D = \rho \tag{35}$$
where both sides are evaluated at r.
Mathematically, this result is an identity which applies if D is given by (34); substituting for D , we can write the identity in full as
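$$\operatorname{div}\left(\frac{1}{4\pi}\int \frac{\rho(\mathbf r')\,(\mathbf r - \mathbf r')}{|\mathbf r - \mathbf r'|^3}\,dV'\right) = \rho(\mathbf r) \tag{36}$$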
where the integral is over all space, or at least all of the space in which ρ may be non-zero. Subject to the convergence of the integral, this shows that we can construct an irrotational vector field whose divergence is a given scalar field ρ(r). And of course, by theorem (24d), any curl can be added to that vector field without changing its divergence.
In electrostatics, (34) is a generalization of Coulomb's law; and (35), which follows from (34), is Gauss's law expressed in differential form. If we integrate (35) over a volume enclosed by a surface S (with outward unit normal n̂) and apply the divergence theorem on the left, we get the integral form of Gauss's law:
$$\oint_S \mathbf D\cdot\hat{\mathbf n}\,dS = Q_e \tag{37}$$
where Qe is the total charge enclosed by S.
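The integral form (37) can be illustrated with a few point charges, using a discrete sum in place of the integral in (34). The charges, their positions, and the sphere radii below are hypothetical values chosen for this sketch; the computed flux of D through each sphere should approximate the enclosed charge Qe.

```python
import math

# Hypothetical point charges as (Q, position); values are illustrative only.
CHARGES = [(2.0, (0.1, 0.0, 0.2)), (-0.5, (-0.3, 0.4, 0.0)), (3.0, (5.0, 0.0, 0.0))]


def D(r):
    # Superposition of point-charge fields Q (r - r') / (4 pi |r - r'|^3),
    # a discrete counterpart of (34).
    out = [0.0, 0.0, 0.0]
    for Q, rq in CHARGES:
        d = math.dist(r, rq)
        for i in range(3):
            out[i] += Q * (r[i] - rq[i]) / (4 * math.pi * d ** 3)
    return out


def flux(R, n_theta=200, n_phi=400):
    # Midpoint-rule value of the integral of D . n_hat dS over a sphere of
    # radius R centered on the origin.
    total = 0.0
    dth, dph = math.pi / n_theta, 2 * math.pi / n_phi
    for i in range(n_theta):
        th = (i + 0.5) * dth
        for j in range(n_phi):
            ph = (j + 0.5) * dph
            n_hat = (math.sin(th) * math.cos(ph),
                     math.sin(th) * math.sin(ph),
                     math.cos(th))
            Dn = sum(a * b for a, b in zip(D([R * k for k in n_hat]), n_hat))
            total += Dn * R * R * math.sin(th) * dth * dph
    return total


print(flux(1.0), flux(10.0))  # compare with Qe = 1.5 and Qe = 4.5
```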
Field with given Laplacian
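In (36), we can recognize the r-dependent factor (r − r′)/|r − r′|³ as −∇(1/|r − r′|) and take the gradient operator outside the integral, obtaining
$$\operatorname{div}\left(-\nabla\int \frac{\rho(\mathbf r')}{4\pi|\mathbf r - \mathbf r'|}\,dV'\right) = \rho(\mathbf r)\,,$$
i.e.
$$\triangle\int \frac{-\rho(\mathbf r')}{4\pi|\mathbf r - \mathbf r'|}\,dV' = \rho(\mathbf r) \tag{38}$$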
where again the integral is over all space, or at least all of the space in which ρ may be non-zero. Subject to the convergence of the integral, this shows that we can construct a field whose Laplacian is a given field. More precisely, it shows that we can construct a scalar field whose Laplacian is a given scalar field ρ(r). But, due to the linearity of the Laplacian, the same applies to any given linear combination of scalar fields, including any combination whose coefficients are uniform vectors, uniform matrices, or uniform tensors of any order; that is, the same applies to any field that we can express with a uniform basis.
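Mathematically, (38) is simply an identity. To find its significance in electrostatics, we can multiply it by −1⧸ϵ0, obtaining
$$\triangle\int \frac{\rho(\mathbf r')}{4\pi\epsilon_0|\mathbf r - \mathbf r'|}\,dV' = -\frac{\rho(\mathbf r)}{\epsilon_0}\,, \tag{39}$$
which is also an identity. But the negative gradient of the expression after the integral sign is
$$-\nabla\frac{\rho(\mathbf r')\,dV'}{4\pi\epsilon_0|\mathbf r - \mathbf r'|} = \frac{\rho(\mathbf r')\,dV'\,(\mathbf r - \mathbf r')}{4\pi\epsilon_0|\mathbf r - \mathbf r'|^3}\,,$$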
which is the contribution to the electric field at position r due to a charge ρ(r′) dV′ at position r′ in a vacuum. So the expression after the integral sign is the corresponding contribution to the electrostatic potential, and the whole integral is the whole electrostatic potential. Denoting this by we can rewrite (39) as
-
(
)
This is Poisson's equation in electrostatics, treating the medium as a vacuum (so that ρ must be taken as the total charge density, including any contributions caused by the effect of the field on the medium). In a region in which ρ = 0 , Poisson's equation (40) reduces to
-
(
)
which is Laplace's equation in electrostatics.
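As a numerical aside, Laplace's equation can be checked for the potential of a point charge, away from the charge itself. This sketch assumes the Cartesian expression for the Laplacian as a sum of second partial derivatives (which the paper derives later, in its Cartesian-coordinates section) and approximates it by finite differences; a non-harmonic field is included for contrast.

```python
import math


def laplacian(f, r, h=1e-3):
    # Finite-difference approximation of the Laplacian: sum over i of the
    # three-point second difference along each Cartesian axis.
    total = 0.0
    for i in range(3):
        up, dn = list(r), list(r)
        up[i] += h
        dn[i] -= h
        total += (f(up) - 2 * f(r) + f(dn)) / h ** 2
    return total


def point_potential(r, rp=(0.0, 0.0, 0.0)):
    # Potential of a point charge at rp, up to the constant factor Q / (4 pi eps0).
    return 1.0 / math.dist(r, rp)


def smooth(r):
    # A field that is NOT harmonic, for contrast: its Laplacian is 6.
    return sum(c * c for c in r)


x = [0.8, -0.5, 1.1]
print(laplacian(point_potential, x), laplacian(smooth, x))
```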
The wave equation
It is an empirical fact that a compressible fluid, such as air, carries waves of a mechanical nature: sound waves. In establishing the unambiguity of the gradient and the divergence, we have already derived equations dealing with the inertia and continuity (mass-conservation) of non-viscous fluids. So, by introducing a relation describing the compressibility, and eliminating variables, we should be able to get one equation (the "wave equation") in one scalar or vector field (the "wave function"), with recognizably "wavelike" solutions. And we should expect this equation to be analogous to equations describing other kinds of waves.
If we suppose, for simplicity, that the only force acting on an element of fluid is the pressure force, the applicable equation of motion is (6g). But, for reasons which will soon be apparent, let us call the pressure P, so that (6g) becomes
Then at equilibrium we have
$$\mathbf 0 = -\nabla P_0\,,$$
where P0 is the equilibrium pressure. Subtracting this equation from the previous one and defining
$$p = P - P_0\,,$$
we get
which looks like (6g), except that p is now the sound pressure (also called "acoustic pressure", or sometimes "excess pressure"), i.e. the pressure rise above equilibrium.
For the equation of continuity we can use (7d'), which we repeat for convenience:
Eliminating v between the last two equations is fraught because v is evaluated at a moving point in the former and at a fixed point in the latter; and introducing any relation between p and ρ is similarly fraught because p is evaluated at a fixed point and ρ at a moving point. The obvious remedy is to apply the advection rule (16) to the last two equations, obtaining respectively
That gets all the variables evaluated at fixed points, at the cost of making the equations more complicated and more obviously non-linear. But the equations can be simplified and linearized by small-amplitude approximations. In the parentheses in the first equation, the first term is proportional to the amplitude of the vibrations while the second term is a product of two factors proportional to the amplitude, so that, for sufficiently small amplitudes, the second term is negligible. Similarly, in the second equation, for sufficiently small amplitudes and a homogeneous medium, we can neglect the second term on the right. Then, on the left side of each equation, we are left with a factor proportional to the amplitude, multiplied by ρ. But ρ is not proportional to the amplitude; only its deviation from the equilibrium density is so proportional. Hence, for small amplitudes, ρ can be replaced by the equilibrium density, which we shall call ρ0, which is independent of time and (in a homogeneous medium) independent of position. With these approximations, our equations of motion and continuity become
$$\rho_0\,\dot{\mathbf v} = -\nabla p\,, \qquad \rho_0\operatorname{div}\mathbf v = -\dot\rho\,,$$
where, for brevity, we use an overdot to denote partial differentiation w.r.t. time (i.e., at a fixed point, not a point moving with the fluid).
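Now we can eliminate v. Taking divergences in the first equation, and differentiating the second partially w.r.t. time (which can be done inside the div operator, which represents a linear combination), we get
$$\rho_0\operatorname{div}\dot{\mathbf v} = -\triangle p\,, \qquad \rho_0\operatorname{div}\dot{\mathbf v} = -\ddot\rho\,,$$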
so that we can equate the right-hand sides, obtaining
$$\triangle p = \ddot\rho \tag{42}$$
Maintaining the small-amplitude assumption, we can now consider compressibility. For small compressions in a homogeneous medium, we may suppose that the pressure change dp is some constant times the density change dρ. It is readily verified that such a constant must have the dimension of velocity squared. So we can say dp = c² dρ, where c is a constant with the units of velocity.[k] Dividing by dt gives ṗ = c²ρ̇, whence
$$\ddot\rho = \frac{\ddot p}{c^2} \tag{43}$$
Substituting from (42) then gives the desired wave equation:
$$\triangle p = \frac{1}{c^2}\,\ddot p \tag{44}$$
This is the 3D classical wave equation with the sound pressure p as the wave function. For a generic wave function ψ , in a homogeneous isotropic medium, we would expect the equation to be
$$\triangle\psi = \frac{1}{c^2}\,\frac{\partial^2\psi}{\partial t^2} \tag{45}$$
which may be written more compactly as
$$\Box\psi = 0 \tag{46}$$
where ☐, pronounced "wave" or "box",[l] is called the D'Alembertian operator and is defined by
$$\Box = \triangle - \frac{1}{c^2}\,\frac{\partial^2}{\partial t^2} \tag{47}$$
in this paper, although other conventions exist.[m]
In a static situation, the second term on the right of (47) is zero. So one advantage of definition (47), over any alternative definition that changes the sign or the scale factor, is that in the static case, the D'Alembertian is reduced to the Laplacian, making it especially obvious that in the static case, the wave equation is reduced to Laplace's equation [compare (46) and (41)]. Also notice that the D'Alembertian, being a linear combination of two linear operators, is itself linear.
Spherical waves
Having established that there are wavelike time-dependent fields described by equation (45), in which the constant c has the units of velocity, we can now make an informed guess at an elementary solution of the equation. Consider the candidate
$$\psi = \frac{f(t - r/c)}{r} \tag{48}$$
where r = r r̂ is the position vector (so that r is distance from the origin), f is an arbitrary function (arbitrary except that it will need to be twice differentiable), t is time, and c is a constant (and obviously ψ is not defined at the origin, even if f is).
If, at the origin, the function f has a certain argument at time t = τ, then at any distance r from the origin, it has the same argument at time t = τ + r⧸c, which is r⧸c later than at the origin. Hence, if f has a certain feature (e.g., a zero-crossing) at the origin, the time taken for that feature to reach any distance r is r⧸c, implying that the feature travels outward from the origin at speed c. Another way to perceive this is to set the argument of f equal to a constant (corresponding to some feature of the function) and differentiate w.r.t. t, obtaining ṙ = c (the speed at which the feature recedes from the origin). Thus equation (48) describes waves radiating outward from the origin with speed c.[n]
Equation (48) further implies that there are surfaces over which the wave function ψ is uniform—namely surfaces of constant r, i.e. spheres centered on the origin. These are the wavefronts. So (48) describes spherical waves.
Because the surface area of a sphere is proportional to the square of its radius, we should expect the radiated intensity (power per unit area) to satisfy an inverse-square law (if the medium is lossless—neither absorbing nor scattering the radiated power). That does not mean that the wave function itself should satisfy an inverse-square law. In a traveling wave in 3D space, there will be an "effort" variable (e.g., sound pressure) and a "flow" variable (e.g., fluid velocity), and the instantaneous intensity will be proportional to the product of the two. If the two are proportional to each other, the instantaneous intensity will be proportional to the square of one or the other. Hence if the instantaneous intensity falls off like 1/r², the effort and flow variables—and the wave function, if it is proportional to one or the other—will fall off like 1/r. That suggests the attenuation factor 1/r in (48).
But there are big ifs in that argument. For all we know so far, the relation between effort and flow could involve a lag, so that the instantaneous product of the two could swing negative although it averages to something positive. And for all we know so far, the lag could vary with r, allowing at least one of the two (effort or flow) to depart from the 1/r law, even if their average product still falls off like 1/r². The 1/r factor in (48) is therefore only an "informed guess". Quite apart from these complications, we have also guessed that the form of the function f (the waveform) does not change as r increases; we have not considered whether this behavior might depend on the medium, or the waveform, or the geometry of the wavefronts.
So let us carefully check whether (48) satisfies (45) or, equivalently, (46).
As a first step, and as a useful inquiry in its own right, we find △ψ from definition (4L), given that ψ is a function of (r, t) only. For the surface δS let us start with
- a cone (not a double cone) with its apex at the origin, subtending a small solid angle ω at the origin,
- a sphere centered on the origin, with radius r, and
- a sphere centered on the origin, with radius r + dr ;
and let the volume element be the region inside the cone and between the spheres, so that its enclosing surface δS has three faces: a segment of the cone, a segment of the inner sphere with area r²ω, and a segment of the outer sphere with area (r + dr)²ω. By the symmetry of ψ, the outward normal derivative ∂nψ is equal to zero on the conical face, +∂rψ(r + dr, t) on the outer spherical face, and −∂rψ(r, t) on the inner spherical face. The volume of the element is dV = r²ω dr (to first order in dr). So, assembling the pieces of definition (4L), we get
$$\triangle\psi = \frac{(r + dr)^2\,\omega\,\partial_r\psi(r + dr, t) - r^2\,\omega\,\partial_r\psi(r, t)}{r^2\,\omega\,dr}\,,$$
i.e.
$$\triangle\psi = \frac{1}{r^2}\,\partial_r\!\left(r^2\,\partial_r\psi\right) \tag{49}$$
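Result (49) can be compared with a direct Cartesian evaluation of the Laplacian for a radially symmetric field. In this sketch the profile exp(−r) and the sample point are arbitrary assumptions, and both forms are approximated by central differences; both should agree with the exact value e^(−r)(1 − 2/r) for that profile.

```python
import math


def g(s):
    # Radial profile psi = exp(-r) (an arbitrary choice for the check).
    return math.exp(-s)


def psi(x):
    # The same field as a function of position.
    return g(math.hypot(*x))


def radial_laplacian(profile, r, h=1e-3):
    # (49): (1/r^2) d/dr (r^2 d profile/dr), both derivatives by central differences.
    def r2_dg(s):
        return s * s * (profile(s + h) - profile(s - h)) / (2 * h)
    return (r2_dg(r + h) - r2_dg(r - h)) / (2 * h * r * r)


def cartesian_laplacian(f, x, h=1e-3):
    # Sum of three-point second differences along the x, y and z axes.
    total = 0.0
    for i in range(3):
        up, dn = list(x), list(x)
        up[i] += h
        dn[i] -= h
        total += (f(up) - 2 * f(x) + f(dn)) / h ** 2
    return total


x = [0.6, 0.8, 1.2]
r = math.hypot(*x)
exact = math.exp(-r) * (1 - 2 / r)   # (1/r^2) d/dr (r^2 g') for g = exp(-r)
print(radial_laplacian(g, r), cartesian_laplacian(psi, x), exact)
```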
Now we can verify our "informed guess". Differentiating (48) twice w.r.t. t by the chain rule gives
$$\frac{\partial^2\psi}{\partial t^2} = \frac{f''(t - r/c)}{r} \tag{50}$$
where each prime (′) denotes differentiation of the function w.r.t. its own argument. Differentiating (48) once w.r.t. r by the product rule and chain rule, we get
$$\partial_r\psi = -\frac{f'(t - r/c)}{c\,r} - \frac{f(t - r/c)}{r^2} \tag{51}$$
Proceeding as specified in (49), we multiply this by r ², differentiate again w.r.t. r (giving three terms, of which two cancel), and divide by r ², obtaining
$$\triangle\psi = \frac{f''(t - r/c)}{c^2\,r} \tag{52}$$
Then if we substitute (52) and (50) into (47), we obviously get ☐ψ = 0 , satisfying (46). So we have guessed correctly.
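The verification can also be repeated numerically. This sketch takes a Gaussian pulse for f and an arbitrary wave speed c (both assumptions), estimates △ψ and (1/c²)∂²ψ/∂t² by second differences at one point and instant, and checks that their difference, the D'Alembertian (47), is close to zero while each term separately is not.

```python
import math

C = 2.0  # wave speed c (arbitrary value for the check)


def f(u):
    # Smooth pulse used as the waveform f (an arbitrary assumption).
    return math.exp(-u * u)


def psi(x, t):
    # Outgoing spherical wave (48): f(t - r/c) / r, undefined at r = 0.
    r = math.hypot(*x)
    return f(t - r / C) / r


def box_terms(x, t, h=1e-3):
    # Returns (Laplacian of psi, (1/c^2) d^2 psi/dt^2), both by second differences;
    # their difference approximates the D'Alembertian (47).
    lap = 0.0
    for i in range(3):
        up, dn = list(x), list(x)
        up[i] += h
        dn[i] -= h
        lap += (psi(up, t) - 2 * psi(x, t) + psi(dn, t)) / h ** 2
    tt = (psi(x, t + h) - 2 * psi(x, t) + psi(x, t - h)) / h ** 2
    return lap, tt / C ** 2


lap, tt_over_c2 = box_terms([0.5, -0.4, 0.9], 0.7)
print(lap, tt_over_c2, lap - tt_over_c2)
```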
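Having shown that the D'Alembertian of ψ, as given by (48), is zero everywhere except at the origin (where it is not defined), let us now find its integral over a volume V (enclosed by a surface S) that includes the origin. From (47),
$$\int_V \Box\psi\,dV = \int_V \triangle\psi\,dV - \frac{1}{c^2}\int_V \frac{\partial^2\psi}{\partial t^2}\,dV = \oint_S \partial_n\psi\,dS - \frac{1}{c^2}\int_V \frac{\partial^2\psi}{\partial t^2}\,dV\,,$$
where the second equality follows from theorem (5L). Now because the integrand on the left is zero except at the origin, any V containing the origin will give the same integral. So for convenience, let V be a spherical ball of radius R centered on the origin. Then, by the spherical symmetry of ψ, integration over S reduces to multiplication by 4πR², and ∂n is equivalent to ∂r, and dV can be taken as 4πr² dr. With these substitutions we have
$$\int_V \Box\psi\,dV = 4\pi R^2\,\partial_r\psi\Big|_{r=R} - \frac{4\pi}{c^2}\int_0^R \frac{\partial^2\psi}{\partial t^2}\,r^2\,dr\,,$$
or, substituting from (51) and (50),
$$\int_V \Box\psi\,dV = -\frac{4\pi R}{c}\,f'(t - R/c) - 4\pi f(t - R/c) - \frac{4\pi}{c^2}\int_0^R r\,f''(t - r/c)\,dr\,.$$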
Again noting that any V containing the origin will give the same volume integral, we can let R approach zero, with the result that the right-hand side approaches −4πf(t). This is the integral of ☐ψ over any volume containing the origin, for ψ given by (48). Meanwhile ☐ψ is zero everywhere except at the origin. In summary,
$$\Box\frac{f(t - r/c)}{r} = -4\pi f(t)\,\delta(\mathbf r) \tag{53}$$
Shifting the center of the spherical waves from the origin to position r′, we get
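$$\Box\frac{f(t - |\mathbf r - \mathbf r'|/c)}{|\mathbf r - \mathbf r'|} = -4\pi f(t)\,\delta(\mathbf r - \mathbf r') \tag{54}$$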
We shall refer to the field given by (48) as the wave function due to a monopole source with strength f (t) at the origin. The D'Alembertian of this wave function is given by (53).[28] Hence the field whose D'Alembertian is given by (54) is the wave function due to a monopole source with strength f (t) at position r′. In each case, the D'Alembertian is zero everywhere except at the source.
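The limiting argument leading to (53) can be followed numerically. In the sketch below, with a Gaussian pulse for f and an arbitrary c (both assumptions), box_integral evaluates the right-hand side obtained above by substituting (51) and (50), for several radii R. Because that expression does not actually depend on R (its derivative with respect to R vanishes identically), every value should be close to −4πf(t), not just the one for the smallest R.

```python
import math

C = 1.5  # wave speed c (arbitrary)


def f(u):
    # Gaussian pulse used as the waveform (an assumption).
    return math.exp(-u * u)


def df(u):
    # First derivative f'.
    return -2 * u * math.exp(-u * u)


def d2f(u):
    # Second derivative f''.
    return (4 * u * u - 2) * math.exp(-u * u)


def box_integral(R, t, n=1000):
    # Surface term from (51) minus volume term from (50); the radial integral
    # of 4 pi r f''(t - r/c) / c^2 is done with the midpoint rule.
    surface = 4 * math.pi * R * R * (-df(t - R / C) / (C * R) - f(t - R / C) / R ** 2)
    dr = R / n
    volume = 0.0
    for k in range(n):
        r = (k + 0.5) * dr
        volume += 4 * math.pi * r * d2f(t - r / C) * dr / C ** 2
    return surface - volume


t = 0.3
for R in (1.0, 0.1, 0.01, 0.001):
    print(R, box_integral(R, t), -4 * math.pi * f(t))
```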
Field with given D'Alembertian
Now suppose that, instead of a wave source with strength f (t) at the general position r′, we have at that position a wave-source density in an elemental volume dV′, whose contribution to the wave function ψ at position r is
where for each r, the dimensions of each volume element are small compared with |r − r′|. Then the total wave function is the sum of the contributions:
-
(
)
where the integral is over all space.
Independently of the physical significance of ψ(r, t), we can take its D'Alembertian "under the integral sign" by rule (54), obtaining
that is,
-
(
)
Mathematically, equation (56) is an identity which applies if ψ(r, t) is given by (55). Substituting from (55) and solving for we can write the identity in full as
-
(
)
where the integral is over all space, or at least all of the space in which may be non-zero. Subject to the convergence of the integral, this shows that we can construct a wave function with a given D'Alembertian.
Physically, equation (56) gives the D'Alembertian of the wave function for a source density. It is the inhomogeneous wave equation, which applies in the presence of an arbitrary source density—in contrast to the homogeneous wave equation (46), which applies in a region where the source density is zero. In this context the word homogeneous or inhomogeneous describes the equation, not the medium (which has been assumed homogeneous and isotropic).
In a static situation, in which the D'Alembertian is reduced to the Laplacian, the inhomogeneous wave equation (56) is reduced to the form of Poisson's equation (40). As written, equation (40) is Poisson's equation in electrostatics; it applies to the charge density ρ(r), for which the scalar potential [in (39)] is
In electrodynamics, which takes time-dependence into account, the scalar potential due to the charge density ρ(r, t) is
where the wave speed c is the speed of light; this is the same as in the static case except for the delay |r − r′|/c, indicating that the influence of the charge density at r′ travels outward from that point at the speed of light. In the dynamic case, by rule (57), the D'Alembertian of the scalar potential is
This result is the inhomogeneous wave equation in the scalar potential—the equation which, in the electrostatic case, reduces to Poisson's equation (40).
In electrodynamics, however, the electric field E is not simply but where A is the magnetic vector potential, whose defining property is that its curl is the magnetic flux density:
$$\mathbf B = \operatorname{curl}\mathbf A\,.$$
By identity (24d), this property implies
$$\operatorname{div}\mathbf B = 0\,,$$
which is Gauss's law for magnetism. We have noted in passing—but not yet proven—that (24d) has a converse, whereby the solenoidality of B implies the existence of the vector potential A. Precedents suggest we might be able to prove this by finding a vector field whose curl is a delta function—perhaps through new identities relating it to a field whose divergence is a delta function—and using it to construct a vector field with a given curl. In fact we shall prove our "converse" differently, but we shall still need some new identities for the purpose. And to obtain those identities (among others), we must take the detour that we have made a virtue of not taking until now…
Cartesian coordinates
Indicial notation; implicit summation
Considering that a scalar field is a function of three coordinates, while a vector field has three components each of which is a function of three coordinates, we can readily imagine that coordinate-based derivations of vector-analytic identities are likely to be excruciatingly repetitive—unless perhaps we choose a notation that concisely specifies the repetition. So, instead of writing the Cartesian coordinates as x, y, z, we shall usually write them as xi where i = 1, 2, 3, respectively; and instead of writing the unit vectors in the directions of the respective axes as i, j, k, we shall usually write them as ei. And for partial differentiation w.r.t. xi, instead of writing ∂/∂xi or even ∂xi, we shall write ∂i.
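Now comes a stroke of genius for which we are indebted to Einstein—although he used it in a more sophisticated context! Instead of writing the position vector as
$$\mathbf r = x\,\mathbf i + y\,\mathbf j + z\,\mathbf k$$
or even as
$$\mathbf r = \sum_{i=1}^{3} x_i\,\mathbf e_i\,,$$
we shall write it simply as
$$\mathbf r = x_i\,\mathbf e_i\,,$$
where it is understood that we sum over the repeated index. More generally, we shall write the vector field q as
$$\mathbf q = q_i\,\mathbf e_i$$
with implicit summation, and the vector field v as
$$\mathbf v = v_i\,\mathbf e_i$$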
with implicit summation, and so on. (By that nomenclature, the position vector in Cartesian coordinates should be, and often is, called x ; but we called it r because we wanted to call its magnitude r, for radius.)
Implicit summation not only avoids writing the Σ symbol and specifying the index of summation, but also allows a summation over two repeated indices, say i and j , to be considered as summed first over i and then over j or vice versa, removing the need for an explicit regrouping of terms. Of course, if we hide messy details behind a notation, we need to make sure that it handles those details correctly. In particular, when we perform an operation on an implicit sum, we implicitly perform it term-by-term, and must therefore make sure that the operation is valid when interpreted that way.
Formulation of operators
Gradient: Putting s = xi in (9g), we find that the scalar component of ∇p in the direction of each ei is ∂i p. To obtain the vector component in that direction, we multiply by ei. Assembling the components, we have (with implicit summation)
$$\nabla p = \mathbf e_i\,\partial_i p \tag{58g}$$
or, in operational terms,
$$\nabla = \mathbf e_i\,\partial_i \tag{58o}$$
or, in traditional longhand notation,
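$$\nabla = \mathbf i\,\frac{\partial}{\partial x} + \mathbf j\,\frac{\partial}{\partial y} + \mathbf k\,\frac{\partial}{\partial z} \tag{58t}$$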
It is also worth noting, from (58g), that the squared magnitude of ∇p is
$$|\nabla p|^2 = \nabla p\cdot\nabla p = \partial_i p\,\partial_i p\,,$$
where we write ∂i p ∂i p rather than (∂i p)² to ensure that implicit summation applies!
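The Cartesian formula (58g) is easy to try numerically. This sketch, with a hypothetical field p and central differences standing in for ∂i, assembles ∇p as ei ∂i p with the implicit sum written as a loop, forms ∂i p ∂i p, and compares ∇p ⸱ ŝ with a directly computed derivative along an arbitrary unit vector ŝ, which is the relation used in (9g).

```python
import math

E = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))  # e_1, e_2, e_3


def p(x):
    # Sample scalar field (an assumption): grad p = (2 x y, x^2, -sin z).
    return x[0] ** 2 * x[1] + math.cos(x[2])


def partial(f, x, i, h=1e-6):
    # Central-difference approximation of d_i f at x.
    up, dn = list(x), list(x)
    up[i] += h
    dn[i] -= h
    return (f(up) - f(dn)) / (2 * h)


def grad(f, x):
    # (58g): grad f = e_i d_i f, with the implicit sum over i written as a loop.
    g = [0.0, 0.0, 0.0]
    for i in range(3):
        d = partial(f, x, i)
        for k in range(3):
            g[k] += E[i][k] * d
    return g


def directional(f, x, s_hat, h=1e-6):
    # Derivative of f w.r.t. arc length in the direction s_hat.
    fwd = [a + h * s for a, s in zip(x, s_hat)]
    bwd = [a - h * s for a, s in zip(x, s_hat)]
    return (f(fwd) - f(bwd)) / (2 * h)


x = [1.0, 2.0, 0.5]
g = grad(p, x)
squared = sum(partial(p, x, i) * partial(p, x, i) for i in range(3))  # d_i p d_i p
s_hat = [1 / math.sqrt(3)] * 3
print(g, squared, sum(a * b for a, b in zip(g, s_hat)), directional(p, x, s_hat))
```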
As reported by Tai (1994), there are unfortunately some textbooks in which the del operator is defined as
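$$\nabla = \frac{\partial}{\partial x}\,\mathbf i + \frac{\partial}{\partial y}\,\mathbf j + \frac{\partial}{\partial z}\,\mathbf k$$ [sic!]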
—which, on its face, is not an operator at all, but a self-contained expression whose value is the zero vector (because it is a sum of derivatives of constant vectors). Among the offenders is Erwin Kreyszig, who, in the 6th edition of his bestselling Advanced Engineering Mathematics (1988, p. 486), misdefines the del operator thus and then rewrites the gradient of f as ∇ f, apparently imagining that the differentiation operators look through the constant vectors rather than at them. Six pages later, he defines the divergence in Cartesian coordinates (which we shall do shortly) and then immediately informs us that "Another common notation for the divergence of v is ∇⸱ v," where ∇ is defined as before, but the resulting ∇⸱ v is apparently not identically zero![29] These errors persist in the 10th edition (2011, pp. 396, 402–3). Tai finds similar howlers in mathematics texts by Wilfred Kaplan, Ladis D. Kovach, and Merle C. Potter, and in electromagnetics texts by William H. Hayt and Martin A. Plonus.[30] Knudsen & Katz, in Fluid Dynamics and Heat Transfer (1958), avoid the misdefinition of ∇, but implicitly define the divergence of V as V⸱∇ (which, as we have seen, is actually an operator), and then somehow reduce it to the correct expression for div V. [31] But I digress.
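Curl and divergence: Expressing the operand of the curl in components, and noting that the unit vectors are uniform, we can apply (8p):
$$\operatorname{curl}\mathbf q = \operatorname{curl}(q_j\,\mathbf e_j) = \nabla q_j\times\mathbf e_j = \mathbf e_i\,\partial_i q_j\times\mathbf e_j\,.$$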
If we sum over j first, this is
$$\operatorname{curl}\mathbf q = \mathbf e_i\times\partial_i\mathbf q \tag{59c}$$
or, in operational terms,
$$\operatorname{curl} = \mathbf e_i\times\partial_i \tag{59o}$$
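or, in traditional longhand,
$$\operatorname{curl}\mathbf q = \mathbf i\times\frac{\partial\mathbf q}{\partial x} + \mathbf j\times\frac{\partial\mathbf q}{\partial y} + \mathbf k\times\frac{\partial\mathbf q}{\partial z}\,.$$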
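For the divergence we proceed as for the curl except that, instead of (8p), we use (8g):
$$\operatorname{div}\mathbf q = \operatorname{div}(q_j\,\mathbf e_j) = \nabla q_j\cdot\mathbf e_j = \mathbf e_i\,\partial_i q_j\cdot\mathbf e_j\,,$$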
that is,
$$\operatorname{div}\mathbf q = \mathbf e_i\cdot\partial_i\mathbf q \tag{60d}$$
or, in operational terms,
$$\operatorname{div} = \mathbf e_i\cdot\partial_i \tag{60o}$$
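or, in traditional longhand,
$$\operatorname{div}\mathbf q = \mathbf i\cdot\frac{\partial\mathbf q}{\partial x} + \mathbf j\cdot\frac{\partial\mathbf q}{\partial y} + \mathbf k\cdot\frac{\partial\mathbf q}{\partial z}\,.$$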
It follows from (59c) and (60d), if it was not already obvious, that a uniform vector field has zero curl and zero divergence.
Although the above expressions for the divergence and curl will surprise many modern readers, they match the initial definitions of the divergence and curl given by the founder of vector analysis as we know it, J. Willard Gibbs (1881, § 54). Gibbs even uses the ∇ × and ∇⸱ notations on the left sides of the defining equations, and only after the equations (albeit immediately after) does he announce that " ∇⸱ ω is called the divergence of ω and ∇ ×ω its curl." (He uses Greek letters for vectors.) Our notation and Cartesian expression for the gradient (58g) also match Gibbs (1881, § 52). Hence, using the Gibbs notations, we can merge definitions (58g), (59c), and (60d) into the general Cartesian formula
-
(
)
(with implicit summation), where the ∗ operator may be a null (for the gradient), a cross (for the curl), or a dot (for the divergence).
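The merged formula can be expressed as a single routine parameterized by the operation ∗. In this sketch the helper names (del_star, null, cross, dot) and the test fields are hypothetical, and ∂i is approximated by central differences; passing null, cross or dot yields the gradient, curl or divergence respectively.

```python
E = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))  # e_1, e_2, e_3
H = 1e-6


def partial(field, x, i):
    # Central-difference d_i of a scalar- or vector-valued field at x.
    up, dn = list(x), list(x)
    up[i] += H
    dn[i] -= H
    fu, fd = field(up), field(dn)
    if isinstance(fu, (int, float)):
        return (fu - fd) / (2 * H)
    return [(a - b) / (2 * H) for a, b in zip(fu, fd)]


def null(e, v):     # e times a scalar: gives the gradient
    return [c * v for c in e]


def cross(a, b):    # gives the curl
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):      # gives the divergence
    return sum(ai * bi for ai, bi in zip(a, b))


def del_star(star, field, x):
    # e_i * d_i field, summed over i, where `star` is null, cross or dot.
    terms = [star(E[i], partial(field, x, i)) for i in range(3)]
    if isinstance(terms[0], float):
        return sum(terms)
    return [sum(t[k] for t in terms) for k in range(3)]


def p(x):
    return x[0] * x[1] * x[2]          # grad p = (yz, xz, xy)


def q(x):
    return [-x[1], x[0], x[2] ** 2]    # curl q = (0, 0, 2), div q = 2z


x = [0.3, -1.1, 2.0]
print(del_star(null, p, x), del_star(cross, q, x), del_star(dot, q, x))
```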
Gibbs does not offer any justification for the ∇ × and ∇⸱ notations, but nor is it difficult to find such a justification based on his definitions. As ei is a uniform vector, we can rewrite (59c) rigorously as
$$\operatorname{curl}\mathbf q = \partial_i(\mathbf e_i\times\mathbf q) \tag{61c}$$
and thence operationally as
$$\operatorname{curl}\mathbf q = \mathbf e_i\,\partial_i\times\mathbf q \tag{61o}$$
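or, recalling (58o),
$$\operatorname{curl}\mathbf q = \nabla\times\mathbf q\,,$$
which can be evaluated in the usual manner as
$$\nabla\times\mathbf q = \begin{vmatrix} \mathbf i & \mathbf j & \mathbf k \\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\ q_x & q_y & q_z \end{vmatrix}\,,$$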
where qx is the x component of q , etc. This indeed is how one evaluates the curl of a given field in Cartesian coordinates, although we shall find (59c) more convenient for deriving identities. Similarly, we can rewrite (60d) rigorously as
$$\operatorname{div}\mathbf q = \partial_i(\mathbf e_i\cdot\mathbf q) \tag{62d}$$
and thence operationally as
$$\operatorname{div}\mathbf q = \mathbf e_i\,\partial_i\cdot\mathbf q \tag{62o}$$
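or, recalling (58o),
$$\operatorname{div}\mathbf q = \nabla\cdot\mathbf q\,.$$
For evaluating the divergence of a given field, however, we simplify (62d) to
$$\operatorname{div}\mathbf q = \partial_i q_i$$
or, in traditional longhand,
$$\operatorname{div}\mathbf q = \frac{\partial q_x}{\partial x} + \frac{\partial q_y}{\partial y} + \frac{\partial q_z}{\partial z}\,,$$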
although we shall find (60d) more convenient for deriving identities. But the longhand form makes it especially obvious that if r is the position vector,
$$\operatorname{div}\mathbf r = 3$$
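A quick check of the simplified form div q = ∂i qi, including div r = 3, might look like the following; the sample field is an arbitrary assumption and the derivatives are central differences.

```python
def div(q, x, h=1e-6):
    # div q = d_i q_i (implicit sum over i), by central differences.
    total = 0.0
    for i in range(3):
        up, dn = list(x), list(x)
        up[i] += h
        dn[i] -= h
        total += (q(up)[i] - q(dn)[i]) / (2 * h)
    return total


def position(x):
    # The position vector field r = x_i e_i.
    return list(x)


def sample(x):
    # Sample field (an assumption) whose divergence is 2x + z + 1.
    return [x[0] ** 2, x[1] * x[2], x[2]]


x = [0.5, 2.0, -1.0]
print(div(position, x), div(sample, x))
```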
Notice that we can get from (62o) back to (60d) by permuting the ∂i with the dot, and from (61o) back to (59c) by permuting the ∂i with the cross, as if the differentiation operator could, as it were, look through the dot or the cross—or, as Gibbs's student Edwin B. Wilson puts it, "pass by" the dot and the cross, yielding Gibbs's original definitions.[32] Hence Wilson considers it helpful to regard Gibbs's ∇⸱ and ∇ × notations as "the (formal) scalar product and the (formal) vector product of ∇ into" the operand, or "the symbolic scalar and vector products of ∇ into" the operand, and to regard ∇ as a "symbolic vector"[33] (not to be confused with Tai's symbolic vector).
Tai (1994, 1995) rejects Wilson's argument together with the entire tradition of treating ∇ × and ∇⸱ as compound operators. Of formal products, Tai says that the concept "has had a tremendously detrimental effect upon the learning of vector analysis"; he calls such a product a "meaningless assembly".[34] Of the "pass by" step, he complains that "standard books on mathematical analysis do not have such a theorem."[35]
I submit, however, that the intermediate steps (61c) and (62d), after which we take the constant multiplier outside the operator (eqs. 61o & 62o), support Wilson's "pass by" argument. In any event the reader may write out the sums on the right-hand sides of (59c) and (60d) and verify that they agree with the formal products ∇ × q and ∇⸱ q respectively—and may notice that in the evaluation of each formal product, the cross or dot may be eliminated, leaving nothing to "pass by".[36] I further submit that the great generality of our derivation of equations (14), above, compels us to treat the ∇ × and ∇⸱ notations as more than mere notations. But the kicker is that Tai himself, having found the form of the del operator in general coordinates (1995, p. 64, eq. 9.33), derives original corresponding forms of the div and curl operators (his eqs. 9.35 & 9.40) which, upon reversal of the forbidden "pass by", become del-dot and del-cross! Indeed his three equations, just cited, are reminiscent of our (58o), (60o), and (59o) respectively. That being said, I shall find some points of agreement with Tai, and some reasons to criticize Wilson.
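Laplacian: If ψ is a scalar field, then
$$\triangle\psi = \operatorname{div}\nabla\psi = \mathbf e_i\cdot\partial_i\left(\mathbf e_j\,\partial_j\psi\right) = \mathbf e_i\cdot\mathbf e_j\,\partial_i\partial_j\psi ,$$
that is,
$$\triangle\psi = \partial_i\partial_i\psi , \tag{63L}$$
where we write ∂i ∂i rather than ∂i² in order to maintain implicit summation. In traditional longhand, (63L) becomes
$$\triangle\psi = \frac{\partial^2\psi}{\partial x^2}+\frac{\partial^2\psi}{\partial y^2}+\frac{\partial^2\psi}{\partial z^2}$$
or, in operational terms,
$$\triangle = \frac{\partial^2}{\partial x^2}+\frac{\partial^2}{\partial y^2}+\frac{\partial^2}{\partial z^2}$$
or, by comparison with (58t),
$$\triangle = \nabla\cdot\nabla = \nabla^2$$
—as expected.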
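By the linearity of the Laplacian, the same applies if ψ is any field expressible in terms of a uniform basis. For example, if ψ is a vector field given by ψj ej (with implicit summation), then
$$\begin{aligned}\triangle\psi &= \triangle\left(\psi_j\,\mathbf e_j\right)\\ &= \left(\triangle\psi_j\right)\mathbf e_j\\ &= \left(\partial_i\partial_i\psi_j\right)\mathbf e_j\\ &= \partial_i\partial_i\left(\psi_j\,\mathbf e_j\right)\\ &= \partial_i\partial_i\psi ,\end{aligned}$$
where the third line follows from (63L) as applied to a scalar field. Thus (63L) is quite general.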
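(63L) and its componentwise extension can be spot-checked in the same spirit. This hedged sketch estimates ∂i ∂i with three-point second differences. It uses a hypothetical scalar field ψ and a vector field whose first component is ψ. The fields, the step h = 10⁻³ and the helper names are assumptions made for illustration.

```python
import math

def second_partial(f, r, i, h=1e-3):
    """Three-point estimate of ∂_i ∂_i f at r for a scalar field f."""
    rp, rm = list(r), list(r)
    rp[i] += h
    rm[i] -= h
    return (f(rp) - 2.0 * f(r) + f(rm)) / h ** 2

def laplacian(f, r):
    """Cartesian form (63L): △f = ∂_i ∂_i f, summed over i."""
    return sum(second_partial(f, r, i) for i in range(3))

def vector_laplacian(q, r):
    """(63L) applied to each Cartesian component ψ_j, as the linearity argument allows."""
    return tuple(laplacian(lambda s, j=j: q(s)[j], r) for j in range(3))

def psi(r):
    """Hypothetical scalar test field ψ = x² y + sin(y z) + eˣ."""
    x, y, z = r
    return x * x * y + math.sin(y * z) + math.exp(x)

def lap_psi_by_hand(r):
    x, y, z = r
    return 2 * y - (y * y + z * z) * math.sin(y * z) + math.exp(x)

def q(r):
    """Hypothetical vector test field (ψ, x y z, z³); its exact Laplacian is (△ψ, 0, 6z)."""
    x, y, z = r
    return (psi(r), x * y * z, z ** 3)

p = (0.4, -0.2, 0.9)
print(laplacian(psi, p), lap_psi_by_hand(p))
print(vector_laplacian(q, p))
```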
After listing theorems (5g) to (5L) above, we gave reasons for describing ∇, curl, and div as differential operators, and △ as a 2nd-order differential operator—the implication being that the others are only 1st-order. We now have the promised "additional reason" for these descriptions: when expressed in Cartesian coordinates, the △ operator involves second derivatives, while the others involve (only) first derivatives. In the meantime we have acquired the q⸱∇ operator, which is also 1st-order, as we shall now confirm.
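Advection, directional derivative, etc.: If ψ is a scalar field, then
$$(\mathbf q\cdot\nabla)\psi = \mathbf q\cdot\nabla\psi = q_j\,\mathbf e_j\cdot\mathbf e_i\,\partial_i\psi .$$
In this double summation, the only non-zero terms are those for which j = i, in which case ei ⸱ ej = 1. So we have
$$(\mathbf q\cdot\nabla)\psi = q_i\,\partial_i\psi \tag{64}$$
or, in operational terms,
$$\mathbf q\cdot\nabla = q_i\,\partial_i \tag{64o}$$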
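or, in traditional longhand,
$$\mathbf q\cdot\nabla = q_x\frac{\partial}{\partial x}+q_y\frac{\partial}{\partial y}+q_z\frac{\partial}{\partial z},$$
which indeed is the "formal" or "symbolic" dot-product of q and ∇. By the linearity of the directional derivative in (11), the same result applies if ψ is a vector field or any field expressible in terms of a uniform basis. In particular, if r is the position vector, we have
$$(\mathbf q\cdot\nabla)\mathbf r = q_i\,\partial_i\mathbf r = q_i\,\mathbf e_i ,$$
i.e.,
$$(\mathbf q\cdot\nabla)\mathbf r = \mathbf q \tag{64r}$$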
—which is also deducible from (11).
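The Cartesian form (64) and the path-length idea behind (11) should give the same number. This is a minimal sketch that assumes a hypothetical scalar field and a sample vector q. One function sums qi ∂i ψ. The other differentiates ψ along the unit vector q/|q| and multiplies by |q|. The sketch also checks (64r) for the position vector. Names and the step size are illustrative choices.

```python
import math

H = 1e-5  # central-difference step (an assumption)

def advect_cartesian(q, psi, r, h=H):
    """(64): (q·∇)ψ = q_i ∂_i ψ, where psi returns a tuple of components."""
    out = [0.0] * len(psi(r))
    for i in range(3):
        rp, rm = list(r), list(r)
        rp[i] += h
        rm[i] -= h
        for k, (u, v) in enumerate(zip(psi(rp), psi(rm))):
            out[k] += q[i] * (u - v) / (2 * h)
    return tuple(out)

def advect_directional(q, psi, r, h=H):
    """|q| times the derivative of ψ w.r.t. path length along q/|q| (the idea behind (11))."""
    mag = math.sqrt(sum(c * c for c in q))
    n = [c / mag for c in q]
    rp = [x + h * c for x, c in zip(r, n)]
    rm = [x - h * c for x, c in zip(r, n)]
    return tuple(mag * (u - v) / (2 * h) for u, v in zip(psi(rp), psi(rm)))

def position(r):
    return tuple(r)

def scalar_field(r):
    """Hypothetical scalar test field x eʸ + z², wrapped as a 1-tuple."""
    x, y, z = r
    return (x * math.exp(y) + z * z,)

p = (0.5, -0.3, 1.2)
qv = (0.7, -1.1, 0.4)
print(advect_cartesian(qv, position, p))  # approximately qv, as (64r) says
print(advect_cartesian(qv, scalar_field, p), advect_directional(qv, scalar_field, p))
```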
For convenience in the following discussion, we shall refer to the scaled-directional-derivative operator q⸱∇ as an "advection" operator although, physically, it represents advection only if q is the material velocity.
Identities without pain
In deriving the Cartesian expressions for the gradient, curl, divergence, Laplacian, and advection operators, we used the preceding identities (9g), (8p), (8g), (9L'), and (11) respectively, the last being a definition generalizing (9g). Thus we could have derived the Cartesian expressions quite early in the exposition, although we did not find that option convenient. The other vector-analytic identities that we have previously mentioned are:
- (8c), which showed the unambiguity of the curl;
- (8q), which has a question mark after it;
- (17), a product rule for the divergence, which is yet to be proven as a general identity;
- (24c) and (24d), concerning "curl grad" and "div curl"; and
- the identities showing that we can construct a field with a given divergence (36), Laplacian (38), or D'Alembertian (57).
The above list exposes the following shortcomings:
- we have not yet investigated "grad div" and "curl curl";
- we have only one product rule in which both factors are spatially variable fields, namely the unverified identity (17); this needs to be verified, and identities (8c) and (8p) need to be generalized;
- our collection of product rules does not yet include the curl of a cross-product, or the gradient of a dot-product or of a product of scalars, or the advection of a product; and
- we do not yet have any chain rules involving ∇, curl, or div.
With the aid of the Cartesian forms of the various operators, we may now fill these gaps.
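The "grad div" and "curl curl" operators turn out to be related:
$$\operatorname{curl}\operatorname{curl}\mathbf q = \mathbf e_i\times\partial_i\left(\mathbf e_j\times\partial_j\mathbf q\right) = \mathbf e_i\times\left(\mathbf e_j\times\partial_i\partial_j\mathbf q\right),$$
whence expanding the vector triple product gives
$$\operatorname{curl}\operatorname{curl}\mathbf q = \mathbf e_j\left(\mathbf e_i\cdot\partial_i\partial_j\mathbf q\right) - \mathbf e_i\cdot\mathbf e_j\,\partial_i\partial_j\mathbf q .$$
In the first term on the right, we can switch the order of partial differentiation; and in the second term—which, like the first, is a double summation—the only non-zero contributions are those for which j = i and ei ⸱ ej = 1. So we have
$$\operatorname{curl}\operatorname{curl}\mathbf q = \mathbf e_j\,\partial_j\left(\mathbf e_i\cdot\partial_i\mathbf q\right) - \partial_i\partial_i\mathbf q ,$$
that is,
$$\operatorname{curl}\operatorname{curl}\mathbf q \equiv \nabla\operatorname{div}\mathbf q - \triangle\mathbf q . \tag{65}$$
This result may be memorized as "curl curl is grad div minus del squared" and written as
$$\nabla\times(\nabla\times\mathbf q) \equiv \nabla\,\nabla\cdot\mathbf q - \nabla^2\mathbf q ,$$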
which looks like the expansion of a vector triple product; and the key step in the above derivation, based on the Gibbs definitions of the operators, really is the expansion of a vector triple product.
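Identity (65) involves second derivatives, so a numerical check has to nest central differences. The following sketch uses a hypothetical test field and an illustrative step H = 10⁻³. It builds curl, div, grad and the Laplacian as operators on fields and compares curl curl q with ∇ div q − △q at one point. The two sides should agree to about six digits, as expected from O(H²) differencing. For this field the exact value, (0.6 − 2z, x − z sin x, −¼e^(y/2)), is also easy to work out by hand.

```python
import math

H = 1e-3  # step for nested central differences (an assumption)
E = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def partial(f, r, i, h=H):
    """∂_i f at r by central differences; f returns a tuple (a scalar as a 1-tuple)."""
    rp, rm = list(r), list(r)
    rp[i] += h
    rm[i] -= h
    return tuple((u - v) / (2 * h) for u, v in zip(f(rp), f(rm)))

def curl(q):
    """Field r ↦ e_i × ∂_i q, per (59c)."""
    def field(r):
        terms = [cross(E[i], partial(q, r, i)) for i in range(3)]
        return tuple(sum(t[k] for t in terms) for k in range(3))
    return field

def div(q):
    """Field r ↦ (e_i · ∂_i q,), per (60d), returned as a 1-tuple."""
    return lambda r: (sum(partial(q, r, i)[i] for i in range(3)),)

def grad(p):
    """Field r ↦ e_i ∂_i p for a 1-tuple-valued scalar field p, per (58g)."""
    return lambda r: tuple(partial(p, r, i)[0] for i in range(3))

def laplacian(q):
    """Field r ↦ ∂_i ∂_i q, per (63L), applied to every component."""
    def field(r):
        total = [0.0] * len(q(r))
        for i in range(3):
            second = partial(lambda s, i=i: partial(q, s, i), r, i)
            total = [a + b for a, b in zip(total, second)]
        return tuple(total)
    return field

def q(r):
    """Hypothetical smooth test field (y² z, z sin x, x y z + e^(y/2))."""
    x, y, z = r
    return (y * y * z, math.sin(x) * z, x * y * z + math.exp(0.5 * y))

p = (0.2, 0.6, -0.8)
lhs = curl(curl(q))(p)
rhs = tuple(g - l for g, l in zip(grad(div(q))(p), laplacian(q)(p)))
print(lhs, rhs)
```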
We now turn to product rules in which neither factor is assumed uniform.
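The curl of a cross-product is
$$\begin{aligned}\operatorname{curl}(\mathbf a\times\mathbf b) &= \mathbf e_i\times\partial_i(\mathbf a\times\mathbf b)\\ &= \mathbf e_i\times\left(\partial_i\mathbf a\times\mathbf b + \mathbf a\times\partial_i\mathbf b\right)\\ &= \mathbf e_i\times(\partial_i\mathbf a\times\mathbf b) + \mathbf e_i\times(\mathbf a\times\partial_i\mathbf b)\\ &= (\mathbf e_i\cdot\mathbf b)\,\partial_i\mathbf a - (\mathbf e_i\cdot\partial_i\mathbf a)\,\mathbf b + (\mathbf e_i\cdot\partial_i\mathbf b)\,\mathbf a - (\mathbf e_i\cdot\mathbf a)\,\partial_i\mathbf b\\ &= b_i\,\partial_i\mathbf a - \mathbf b\operatorname{div}\mathbf a + \mathbf a\operatorname{div}\mathbf b - a_i\,\partial_i\mathbf b\\ &= (\mathbf b\cdot\nabla)\mathbf a - \mathbf b\operatorname{div}\mathbf a + \mathbf a\operatorname{div}\mathbf b - (\mathbf a\cdot\nabla)\mathbf b ,\end{aligned}$$
i.e.,
$$\operatorname{curl}(\mathbf a\times\mathbf b) \equiv (\mathbf b\cdot\nabla)\mathbf a - (\mathbf a\cdot\nabla)\mathbf b + \mathbf a\operatorname{div}\mathbf b - \mathbf b\operatorname{div}\mathbf a . \tag{67c}$$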
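The divergence of a cross-product, as we might expect, is simpler:
$$\begin{aligned}\operatorname{div}(\mathbf a\times\mathbf b) &= \mathbf e_i\cdot\partial_i(\mathbf a\times\mathbf b)\\ &= \mathbf e_i\cdot(\partial_i\mathbf a\times\mathbf b) + \mathbf e_i\cdot(\mathbf a\times\partial_i\mathbf b)\\ &= \mathbf b\cdot(\mathbf e_i\times\partial_i\mathbf a) - \mathbf a\cdot(\mathbf e_i\times\partial_i\mathbf b),\end{aligned}$$
i.e.,
$$\operatorname{div}(\mathbf a\times\mathbf b) \equiv \mathbf b\cdot\operatorname{curl}\mathbf a - \mathbf a\cdot\operatorname{curl}\mathbf b . \tag{67d}$$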
In particular, in electromagnetics, div(E × H) ≡ H ⸱ curl E − E ⸱ curl H ; this is the identity on which Poynting's theorem is based. But if b in (67d) is uniform, then (67d) reduces to (8c).
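Identities (67c) and (67d) can be spot-checked at a single point with central differences. In this sketch a and b are arbitrary hypothetical smooth fields. (b⸱∇)a is computed as bi ∂i a with b frozen at its value at the test point, which is what the advection operator means there. The step and helper names are illustrative.

```python
import math

H = 1e-5  # central-difference step (an assumption)
E = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def partial(q, r, i, h=H):
    """Central-difference estimate of ∂_i q at r for a vector field q."""
    rp, rm = list(r), list(r)
    rp[i] += h
    rm[i] -= h
    return tuple((u - v) / (2 * h) for u, v in zip(q(rp), q(rm)))

def curl(q, r):  # e_i × ∂_i q
    terms = [cross(E[i], partial(q, r, i)) for i in range(3)]
    return tuple(sum(t[k] for t in terms) for k in range(3))

def div(q, r):  # e_i · ∂_i q
    return sum(partial(q, r, i)[i] for i in range(3))

def advect(v, q, r):  # (v·∇)q = v_i ∂_i q, with v the value of a field at r
    parts = [partial(q, r, i) for i in range(3)]
    return tuple(sum(v[i] * parts[i][k] for i in range(3)) for k in range(3))

def a(r):  # hypothetical test fields
    x, y, z = r
    return (y * z, x * x, math.sin(z) + x)

def b(r):
    x, y, z = r
    return (math.cos(y), x * z, y * y)

def a_cross_b(r):
    return cross(a(r), b(r))

p = (0.3, 0.8, -0.5)
ap, bp = a(p), b(p)
# (67d): div(a × b) ≡ b · curl a − a · curl b
lhs_d = div(a_cross_b, p)
rhs_d = dot(bp, curl(a, p)) - dot(ap, curl(b, p))
# (67c): curl(a × b) ≡ (b·∇)a − (a·∇)b + a div b − b div a
lhs_c = curl(a_cross_b, p)
rhs_c = tuple(u - v + ap[k] * div(b, p) - bp[k] * div(a, p)
              for k, (u, v) in enumerate(zip(advect(bp, a, p), advect(ap, b, p))))
print(lhs_d, rhs_d)
print(lhs_c, rhs_c)
```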
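The gradient of a dot-product, by comparison, is surprisingly messy:
$$\nabla(\mathbf a\cdot\mathbf b) = \mathbf e_i\,\partial_i(\mathbf a\cdot\mathbf b) = \mathbf e_i\,(\mathbf a\cdot\partial_i\mathbf b) + \mathbf e_i\,(\mathbf b\cdot\partial_i\mathbf a).$$
Now the first term on the right can be recognized as a × (ei × ∂i b) + a⸱ei ∂i b; that is, a × (ei × ∂i b) + ai ∂i b; that is, a × curl b + (a⸱∇)b. Similarly, the second term is b × curl a + (b⸱∇)a. Thus we have
$$\nabla(\mathbf a\cdot\mathbf b) \equiv (\mathbf a\cdot\nabla)\mathbf b + (\mathbf b\cdot\nabla)\mathbf a + \mathbf a\times\operatorname{curl}\mathbf b + \mathbf b\times\operatorname{curl}\mathbf a . \tag{68}$$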
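For uniform b, the first and third terms on the right vanish, and we can solve for the first of the remaining terms, obtaining
$$(\mathbf b\cdot\nabla)\mathbf a = \nabla(\mathbf a\cdot\mathbf b) - \mathbf b\times\operatorname{curl}\mathbf a \qquad[\text{for uniform }\mathbf b],$$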
so that we can now drop the question mark after (8q). If we write the curl operator as ∇ × , the last equation [or (8q)] looks like the expansion of a vector triple product; but the identity is valid only for uniform b.
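The same approach checks (68) and its uniform-b special case. The fields a and b and the uniform vector b0 are hypothetical test data, and the step H is an assumption.

```python
import math

H = 1e-5  # central-difference step (an assumption)
E = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def partial(q, r, i, h=H):
    """Central-difference estimate of ∂_i q at r for a vector field q."""
    rp, rm = list(r), list(r)
    rp[i] += h
    rm[i] -= h
    return tuple((u - v) / (2 * h) for u, v in zip(q(rp), q(rm)))

def grad(f, r, h=H):
    """e_i ∂_i f for a scalar field f, as in (58g)."""
    out = []
    for i in range(3):
        rp, rm = list(r), list(r)
        rp[i] += h
        rm[i] -= h
        out.append((f(rp) - f(rm)) / (2 * h))
    return tuple(out)

def curl(q, r):  # e_i × ∂_i q
    terms = [cross(E[i], partial(q, r, i)) for i in range(3)]
    return tuple(sum(t[k] for t in terms) for k in range(3))

def advect(v, q, r):  # (v·∇)q = v_i ∂_i q, with v a vector value at r
    parts = [partial(q, r, i) for i in range(3)]
    return tuple(sum(v[i] * parts[i][k] for i in range(3)) for k in range(3))

def a(r):  # hypothetical test fields
    x, y, z = r
    return (x * y * z, math.cos(x), y + z * z)

def b(r):
    x, y, z = r
    return (z, math.exp(x), x * y)

p = (0.6, -0.4, 0.7)
ap, bp = a(p), b(p)
# (68): ∇(a·b) ≡ (a·∇)b + (b·∇)a + a × curl b + b × curl a
lhs = grad(lambda s: dot(a(s), b(s)), p)
terms = (advect(ap, b, p), advect(bp, a, p), cross(ap, curl(b, p)), cross(bp, curl(a, p)))
rhs = tuple(sum(t[k] for t in terms) for k in range(3))
# Uniform b0: (b0·∇)a = ∇(a·b0) − b0 × curl a
b0 = (0.4, -1.3, 0.9)
lhs_u = advect(b0, a, p)
rhs_u = tuple(g - c for g, c in zip(grad(lambda s: dot(a(s), b0), p), cross(b0, curl(a, p))))
print(lhs, rhs)
print(lhs_u, rhs_u)
```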
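The gradient of a product of scalars, unlike that of a dot-product, is as simple as the product rule for ordinary differentiation:
$$\nabla(pu) = \mathbf e_i\,\partial_i(pu) = \mathbf e_i\left(u\,\partial_i p + p\,\partial_i u\right) = u\,\mathbf e_i\,\partial_i p + p\,\mathbf e_i\,\partial_i u ,$$
that is,
$$\nabla(pu) \equiv u\,\nabla p + p\,\nabla u . \tag{69}$$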
The advection of a product is equally simple, regardless of the type of product, except that the order of a cross-product matters. Let ψ and χ be scalar or vector fields, and let ψ ∗χ denote any meaningful product of the two. Then, by (64),
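$$(\mathbf q\cdot\nabla)(\psi\ast\chi) = q_i\,\partial_i(\psi\ast\chi) = q_i\left(\partial_i\psi\ast\chi + \psi\ast\partial_i\chi\right) = (q_i\,\partial_i\psi)\ast\chi + \psi\ast(q_i\,\partial_i\chi),$$
that is,
$$(\mathbf q\cdot\nabla)(\psi\ast\chi) \equiv \big((\mathbf q\cdot\nabla)\psi\big)\ast\chi + \psi\ast(\mathbf q\cdot\nabla)\chi . \tag{70}$$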
The q⸱∇ operator is a scalar operator in the sense that it maps the operand field to a field of the same order—a scalar field to a scalar field, a vector field to a vector field, a matrix field to a matrix field, etc.—as if it were multiplication by a scalar or differentiation w.r.t. a scalar; and indeed a differentiation w.r.t. path length appears in the coordinate-free definition (11) of the operator. Moreover, we did not need coordinates to obtain rule (70); as the reader may verify, the same rule can be obtained directly from the definition (11) in a similar manner. From these points of view, the simplicity of the rule is unsurprising.
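The curl of the product of a scalar and a vector is
$$\operatorname{curl}(p\mathbf b) = \mathbf e_i\times\partial_i(p\mathbf b) = \mathbf e_i\times\left(\partial_i p\,\mathbf b + p\,\partial_i\mathbf b\right) = (\mathbf e_i\,\partial_i p)\times\mathbf b + p\,\mathbf e_i\times\partial_i\mathbf b ,$$
that is,
$$\operatorname{curl}(p\mathbf b) \equiv \nabla p\times\mathbf b + p\operatorname{curl}\mathbf b . \tag{71c}$$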
For uniform b , this reduces to (8p), which was used to derive the Cartesian form of the curl (59c).
For the divergence of the product of a scalar and a vector, we proceed likewise except that we use a dot instead of a cross. The result is
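$$\operatorname{div}(p\mathbf b) \equiv \nabla p\cdot\mathbf b + p\operatorname{div}\mathbf b , \tag{71d}$$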
which has the same form as (17), delivering the promised confirmation that (17) is an identity. For uniform b , (71d) reduces to (8g), which was used to derive the Cartesian form of the divergence (60d).
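The first-order product rules (69), (70), (71c) and (71d) can be checked together. In this sketch p and u are hypothetical scalar fields, and b and c are hypothetical vector fields. (70) is tested with ∗ taken to be a cross product, where the order of the factors matters. The advecting vector qv and the step H are illustrative assumptions.

```python
import math

H = 1e-5  # central-difference step (an assumption)
E = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def partial(f, r, i, h=H):
    """Central-difference ∂_i f at r; f may return a float or a tuple."""
    rp, rm = list(r), list(r)
    rp[i] += h
    rm[i] -= h
    fp, fm = f(rp), f(rm)
    if isinstance(fp, tuple):
        return tuple((u - v) / (2 * h) for u, v in zip(fp, fm))
    return (fp - fm) / (2 * h)

def grad(f, r):  # e_i ∂_i f
    return tuple(partial(f, r, i) for i in range(3))

def curl(q, r):  # e_i × ∂_i q
    terms = [cross(E[i], partial(q, r, i)) for i in range(3)]
    return tuple(sum(t[k] for t in terms) for k in range(3))

def div(q, r):  # e_i · ∂_i q
    return sum(partial(q, r, i)[i] for i in range(3))

def advect(v, f, r):  # (v·∇)f = v_i ∂_i f for tuple-valued f
    parts = [partial(f, r, i) for i in range(3)]
    return tuple(sum(v[i] * parts[i][k] for i in range(3)) for k in range(len(parts[0])))

def p(r):  # hypothetical scalar fields p, u and vector fields b, c
    x, y, z = r
    return x * x + math.sin(y * z)

def u(r):
    x, y, z = r
    return math.exp(x) * y

def b(r):
    x, y, z = r
    return (y, z * x, math.cos(x))

def c(r):
    x, y, z = r
    return (z, x * y, y * y)

def pb(r):  # the field p b
    return tuple(p(r) * bk for bk in b(r))

P = (0.5, 0.9, -0.3)
gp = grad(p, P)
# (69): ∇(p u) ≡ u ∇p + p ∇u
lhs69 = grad(lambda s: p(s) * u(s), P)
rhs69 = tuple(u(P) * g1 + p(P) * g2 for g1, g2 in zip(gp, grad(u, P)))
# (71c): curl(p b) ≡ ∇p × b + p curl b
lhs71c = curl(pb, P)
rhs71c = tuple(x + p(P) * y for x, y in zip(cross(gp, b(P)), curl(b, P)))
# (71d): div(p b) ≡ ∇p · b + p div b
lhs71d = div(pb, P)
rhs71d = dot(gp, b(P)) + p(P) * div(b, P)
# (70) with ∗ a cross: (q·∇)(b × c) ≡ ((q·∇)b) × c + b × ((q·∇)c)
qv = (1.2, -0.5, 0.8)
lhs70 = advect(qv, lambda s: cross(b(s), c(s)), P)
rhs70 = tuple(x + y for x, y in zip(cross(advect(qv, b, P), c(P)), cross(b(P), advect(qv, c, P))))
```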
That exhausts the first-order product rules. For curiosity's sake, we shall also derive one second-order rule.
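The Laplacian of the product of a scalar field and a generic field, by (63L), is
$$\triangle(p\psi) = \partial_i\partial_i(p\psi) = \partial_i\left(\partial_i p\,\psi + p\,\partial_i\psi\right) = (\partial_i\partial_i p)\,\psi + 2\,\partial_i p\,\partial_i\psi + p\,\partial_i\partial_i\psi .$$
In the middle term, by (58g), ∂i p is the ith component of ∇p so that, by (64o), ∂i p ∂i is the q⸱∇ operator for q = ∇p. So we have
$$\triangle(p\psi) \equiv (\triangle p)\,\psi + 2\,(\nabla p\cdot\nabla)\psi + p\,\triangle\psi . \tag{72}$$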
The argument assumes a scalar p but is indifferent to whether ψ is a scalar or a vector or a higher-order tensor.
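(72) mixes second and first derivatives. This sketch therefore uses three-point second differences for the Laplacians and central differences for (∇p⸱∇)ψ. The scalar p, the vector ψ and both step sizes are assumptions made for illustration.

```python
import math

H1 = 1e-5  # step for first derivatives (an assumption)
H2 = 1e-3  # step for second differences (an assumption)

def shifted(r, i, h):
    s = list(r)
    s[i] += h
    return s

def first(f, r, i):
    """Central-difference ∂_i f for a tuple-valued field f."""
    return tuple((u - v) / (2 * H1) for u, v in zip(f(shifted(r, i, H1)), f(shifted(r, i, -H1))))

def laplacian(f, r):
    """(63L), ∂_i ∂_i f, by three-point second differences, for a tuple-valued field f."""
    f0 = f(r)
    total = [0.0] * len(f0)
    for i in range(3):
        fp, fm = f(shifted(r, i, H2)), f(shifted(r, i, -H2))
        total = [t + (u - 2 * w + v) / H2 ** 2 for t, u, w, v in zip(total, fp, f0, fm)]
    return tuple(total)

def p(r):
    """Hypothetical scalar field, returned as a 1-tuple."""
    x, y, z = r
    return (x * y + math.exp(z),)

def psi(r):
    """Hypothetical vector field."""
    x, y, z = r
    return (math.sin(x) * y, z * z * x, math.cos(y * z))

P = (0.3, -0.6, 0.4)
lhs = laplacian(lambda s: tuple(p(s)[0] * c for c in psi(s)), P)

grad_p = tuple(first(p, P, i)[0] for i in range(3))
d_psi = [first(psi, P, i) for i in range(3)]
adv = tuple(sum(grad_p[i] * d_psi[i][k] for i in range(3)) for k in range(3))  # (∇p·∇)ψ
lap_p = laplacian(p, P)[0]
lap_psi = laplacian(psi, P)
p0, psi0 = p(P)[0], psi(P)
# (72): △(pψ) ≡ ψ △p + 2 (∇p·∇)ψ + p △ψ
rhs = tuple(psi0[k] * lap_p + 2 * adv[k] + p0 * lap_psi[k] for k in range(3))
print(lhs, rhs)
```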
Finally we turn to chain rules — especially the simple cases of the gradient, curl, divergence, advection, and Laplacian of a function of a scalar field u. As usual, let p denote a scalar field, q a vector field, and ψ a generic field.
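Gradient ⧸ curl ⧸ divergence of a function of a scalar: By the general Cartesian formula (60s) and the chain rule for ∂i,
$$\nabla\ast\psi(u) = \mathbf e_i\ast\partial_i\psi(u) = \mathbf e_i\ast\psi'(u)\,\partial_i u = (\mathbf e_i\,\partial_i u)\ast\psi'(u),$$
i.e., by (58g),
$$\nabla\ast\psi(u) \equiv \nabla u\ast\psi'(u). \tag{73}$$
In particular, if ∗ is a null,
$$\nabla p(u) \equiv \nabla u\;p'(u), \tag{73g}$$
and if ∗ is a cross,
$$\operatorname{curl}\mathbf q(u) \equiv \nabla u\times\mathbf q'(u), \tag{73c}$$
and if ∗ is a dot,
$$\operatorname{div}\mathbf q(u) \equiv \nabla u\cdot\mathbf q'(u). \tag{73d}$$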
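Advection of a function of a scalar: By (64) and the chain rule for ∂i,
$$(\mathbf q\cdot\nabla)\psi(u) = q_i\,\partial_i\psi(u) = q_i\,\partial_i u\;\psi'(u),$$
i.e.,
$$(\mathbf q\cdot\nabla)\psi(u) \equiv (\mathbf q\cdot\nabla)u\;\psi'(u).$$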
This fits into the pattern set by (73) in that the gradient operator in (73g) is replaced by an advection operator.
Of the last four results, only (73c) is dependent on the order of the ∗ product; the others could equally well be written
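$$\nabla p(u) \equiv p'(u)\,\nabla u ,\qquad \operatorname{div}\mathbf q(u) \equiv \mathbf q'(u)\cdot\nabla u ,\qquad (\mathbf q\cdot\nabla)\psi(u) \equiv \psi'(u)\,(\mathbf q\cdot\nabla)u .$$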
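The Laplacian of a function of a scalar departs from the above pattern. By (63L),
$$\begin{aligned}\triangle\psi(u) &= \partial_i\partial_i\psi(u)\\ &= \partial_i\big(\psi'(u)\,\partial_i u\big)\\ &= \psi'(u)\,\partial_i\partial_i u + \psi''(u)\,\partial_i u\,\partial_i u ,\end{aligned}$$
where the last line follows from the product rule for ∂i and, in the second term, the chain rule for ∂i. In that second term, the implicit sum ∂i u ∂i u can be recognized as |∇u|² by (58s). So we have
$$\triangle\psi(u) \equiv \psi'(u)\,\triangle u + \psi''(u)\,|\nabla u|^2 .$$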
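The chain rules (73c), (73d) and the Laplacian of a function of a scalar can be checked with a hand-picked intermediate field u = xy + z², for which ∇u = (y, x, 2z) and △u = 2. The vector function q(u) = (sin u, u², e^(−u)), the scalar function sin u and the step sizes are hypothetical test choices.

```python
import math

H1 = 1e-5  # step for first derivatives (an assumption)
H2 = 5e-4  # step for second differences (an assumption)
E = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def shifted(r, i, h):
    s = list(r)
    s[i] += h
    return s

def first(f, r, i):
    """Central-difference ∂_i f for a tuple-valued field f."""
    return tuple((u - v) / (2 * H1) for u, v in zip(f(shifted(r, i, H1)), f(shifted(r, i, -H1))))

def curl(q, r):  # e_i × ∂_i q
    terms = [cross(E[i], first(q, r, i)) for i in range(3)]
    return tuple(sum(t[k] for t in terms) for k in range(3))

def div(q, r):  # e_i · ∂_i q
    return sum(first(q, r, i)[i] for i in range(3))

def laplacian_scalar(f, r):
    """(63L) for a float-valued field f, by three-point second differences."""
    f0 = f(r)
    return sum((f(shifted(r, i, H2)) - 2 * f0 + f(shifted(r, i, -H2))) / H2 ** 2 for i in range(3))

def u(r):  # hypothetical intermediate scalar field: ∇u = (y, x, 2z), △u = 2
    x, y, z = r
    return x * y + z * z

def qf(t):  # a vector-valued function of one scalar
    return (math.sin(t), t * t, math.exp(-t))

def qf_prime(t):
    return (math.cos(t), 2 * t, -math.exp(-t))

P = (0.4, -0.7, 0.5)
x, y, z = P
grad_u, lap_u, u0 = (y, x, 2 * z), 2.0, u(P)
# (73c): curl q(u) ≡ ∇u × q'(u);  (73d): div q(u) ≡ ∇u · q'(u)
lhs_c, rhs_c = curl(lambda s: qf(u(s)), P), cross(grad_u, qf_prime(u0))
lhs_d, rhs_d = div(lambda s: qf(u(s)), P), dot(grad_u, qf_prime(u0))
# Laplacian of f(u) = sin u:  △f(u) ≡ f'(u) △u + f''(u) |∇u|²
lhs_L = laplacian_scalar(lambda s: math.sin(u(s)), P)
rhs_L = math.cos(u0) * lap_u - math.sin(u0) * dot(grad_u, grad_u)
print(lhs_c, rhs_c, lhs_d, rhs_d, lhs_L, rhs_L)
```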
Multivariate chain rule: The foregoing chain rules involve one intermediate function of one scalar variable. It will be useful to have an elementary chain rule that can handle more than one of each. Let p(r) be a smooth scalar field, and let r in turn be a smooth function of several variables, one of which, say t, is allowed to vary while the others are held constant, so that r changes by dr when t changes by dt. Then dividing (26g) by dt gives
$$\partial_t p = \partial_t\mathbf r\cdot\nabla p$$
or, in indicial Cartesian coordinates with implicit summation,
$$\partial_t p = \partial_t x_i\,\partial_i p$$
or, in traditional longhand,
$$\frac{\partial p}{\partial t} = \frac{\partial p}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial p}{\partial y}\frac{\partial y}{\partial t} + \frac{\partial p}{\partial z}\frac{\partial z}{\partial t}.$$
This is the desired multivariate chain rule for a scalar function of three intermediate real variables. The assumption that these variables are Cartesian coordinates is not a loss of generality, because any three real quantities can be suitably scaled and represented by perpendicular axes, so that any scalar function of them becomes a function of position, to which (26g) applies; and then the scaling can be reversed without changing the products in the last equation. Moreover, by the linearity of ∂t, the scalar field p may be replaced by any field expressible in terms of a uniform basis. For example, for a vector field q,
$$\begin{aligned}\partial_t\mathbf q &= \partial_t\left(q_j\,\mathbf e_j\right)\\ &= \mathbf e_j\,\partial_t q_j\\ &= \mathbf e_j\,\partial_t x_i\,\partial_i q_j\\ &= \partial_t x_i\,\partial_i\left(q_j\,\mathbf e_j\right) = \partial_t x_i\,\partial_i\mathbf q ,\end{aligned}$$
where the third line is obtained by applying the multivariate chain rule for a scalar field. Thus, for a generic field ψ,
$$\partial_t\psi = \partial_t x_i\,\partial_i\psi \qquad[\text{for generic }\psi\text{ and }x_i].$$
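The multivariate chain rule can be tested by moving along a hypothetical curve r(t) = (cos t, sin t, t²) through a hypothetical vector field ψ. Differentiating ψ(r(t)) directly with respect to t should match ∂t xi ∂i ψ. The curve, the field, t0 and the step are illustrative assumptions.

```python
import math

H = 1e-5  # central-difference step (an assumption)

def path(t):
    """Hypothetical curve r(t) = (cos t, sin t, t²): the x_i as functions of one parameter t."""
    return (math.cos(t), math.sin(t), t * t)

def path_rate(t):
    """∂_t x_i for the curve above, worked out by hand."""
    return (-math.sin(t), math.cos(t), 2 * t)

def psi(r):
    """Hypothetical vector field ψ(x, y, z)."""
    x, y, z = r
    return (x * y * math.exp(z), x + z, y * y)

def partial(f, r, i, h=H):
    """Central-difference ∂_i f at r for a tuple-valued field f."""
    rp, rm = list(r), list(r)
    rp[i] += h
    rm[i] -= h
    return tuple((u - v) / (2 * h) for u, v in zip(f(rp), f(rm)))

t0 = 0.8
# Left side: ∂_t ψ(r(t)), differentiated directly along the curve
lhs = tuple((u - v) / (2 * H) for u, v in zip(psi(path(t0 + H)), psi(path(t0 - H))))
# Right side: ∂_t x_i ∂_i ψ, with implicit summation over i
rates = path_rate(t0)
grads = [partial(psi, path(t0), i) for i in range(3)]
rhs = tuple(sum(rates[i] * grads[i][k] for i in range(3)) for k in range(3))
print(lhs, rhs)
```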
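Gradient ⧸ curl ⧸ divergence of a function of a scaled position vector: We end this subsection by deriving a lemma for use in the next subsection. If k is a uniform scalar multiplier and r is the position vector,
$$\nabla\ast\psi(k\mathbf r) = \mathbf e_i\ast\frac{\partial\psi(k\mathbf r)}{\partial x_i} = k\,\mathbf e_i\ast\frac{\partial\psi(k\mathbf r)}{\partial(kx_i)},$$
where the third expression is obtained from the second by multiplying each denominator (change in xi) by k and compensating. But now we have
$$\nabla\ast\psi(k\mathbf r) = k\,\nabla\ast\psi\,\Big|_{k\mathbf r} , \tag{76}$$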
where the vertical bar and subscript indicate that the gradient, curl, or divergence is evaluated at k r. We shall be interested in the curl (for which ∗ is a cross).
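For the curl (∗ a cross), lemma (76) says that the curl of the field r ↦ q(kr) equals k times curl q evaluated at kr. Here is a minimal numerical sketch, with a hypothetical field q and the illustrative value k = 2.5.

```python
import math

H = 1e-5  # central-difference step (an assumption)
E = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def partial(q, r, i, h=H):
    """Central-difference estimate of ∂_i q at r for a vector field q."""
    rp, rm = list(r), list(r)
    rp[i] += h
    rm[i] -= h
    return tuple((u - v) / (2 * h) for u, v in zip(q(rp), q(rm)))

def curl(q, r):  # e_i × ∂_i q
    terms = [cross(E[i], partial(q, r, i)) for i in range(3)]
    return tuple(sum(t[k] for t in terms) for k in range(3))

def q(r):
    """Hypothetical vector field (y z², sin x, x y)."""
    x, y, z = r
    return (y * z * z, math.sin(x), x * y)

k = 2.5
P = (0.3, -0.4, 0.6)
kP = tuple(k * c for c in P)
lhs = curl(lambda s: q(tuple(k * c for c in s)), P)  # curl of the field r ↦ q(k r)
rhs = tuple(k * c for c in curl(q, kP))             # k times (curl q) evaluated at k r
print(lhs, rhs)
```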
Field with given curl
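Consider the vector field
$$\mathbf v = \mathbf q\times\mathbf r , \tag{77}$$
where q is a solenoidal vector field and r is the position vector. By identity (67c),
$$\operatorname{curl}\mathbf v = (\mathbf r\cdot\nabla)\mathbf q - (\mathbf q\cdot\nabla)\mathbf r + \mathbf q\operatorname{div}\mathbf r - \mathbf r\operatorname{div}\mathbf q ,$$
where, by hypothesis, div q is zero. Applying identities (62r) and (64r) then yields
$$\operatorname{curl}\mathbf v = (\mathbf r\cdot\nabla)\mathbf q - \mathbf q + 3\mathbf q = 2\mathbf q + (\mathbf r\cdot\nabla)\mathbf q .$$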
In the special case in which q is the angular velocity ω of a rigid body about an axis through the origin, v is the velocity field (ω × r) and ω is uniform, so that the last result reduces to curl v = 2ω ; that is, the vorticity is twice the angular velocity. As the vorticity in this case is uniform and therefore independent of position relative to the axis, it does not change if the axis is shifted, provided that the angular velocity has the same magnitude and direction. And because a uniform velocity field has zero curl, the vorticity is also unchanged if a translational motion is superposed on the rotation. This is the most direct connection that we have seen between curl and rotation. But again I digress.
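The rigid-body remark can be checked numerically. For ω × (r − r0) + V, with an arbitrary axis point r0 and a uniform translation V, the curl should be 2ω everywhere. The values of ω, r0, V and the test point are illustrative assumptions.

```python
H = 1e-5  # central-difference step (an assumption)
E = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def partial(q, r, i, h=H):
    """Central-difference estimate of ∂_i q at r for a vector field q."""
    rp, rm = list(r), list(r)
    rp[i] += h
    rm[i] -= h
    return tuple((u - v) / (2 * h) for u, v in zip(q(rp), q(rm)))

def curl(q, r):  # e_i × ∂_i q
    terms = [cross(E[i], partial(q, r, i)) for i in range(3)]
    return tuple(sum(t[k] for t in terms) for k in range(3))

omega = (0.3, -1.2, 0.7)  # hypothetical uniform angular velocity

def rigid_velocity(axis_point, translation):
    """Velocity field ω × (r − r0) + V: rotation about an axis through r0 plus a uniform V."""
    def v(r):
        rel = tuple(x - x0 for x, x0 in zip(r, axis_point))
        return tuple(w + t for w, t in zip(cross(omega, rel), translation))
    return v

P = (0.5, 2.0, -1.0)
v1 = rigid_velocity((0.0, 0.0, 0.0), (0.0, 0.0, 0.0))   # the case v = ω × r in the text
v2 = rigid_velocity((1.0, -3.0, 2.0), (4.0, 0.5, -2.0))  # shifted axis plus a translation
print(curl(v1, P), curl(v2, P))  # both approximately 2ω
```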
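Returning to the more general case in which q is not necessarily uniform, but merely solenoidal,[37] we have
$$\operatorname{curl}\mathbf v = 2\mathbf q + (\mathbf r\cdot\nabla)\mathbf q ,$$
to which we can apply our lemma (76) with a uniform real factor t, obtaining
$$\operatorname{curl}\big[\mathbf v(t\mathbf r)\big] = t\,\big[2\mathbf q + (\mathbf r\cdot\nabla)\mathbf q\big]_{t\mathbf r} .$$
On the left we can recall (77); and on the right we can apply (11), noting that the magnitude of r is r, which measures distance in the direction of r. Thus we obtain
$$\operatorname{curl}\big[\mathbf q(t\mathbf r)\times t\mathbf r\big] = 2t\,\mathbf q(t\mathbf r) + t\,r\,\partial_r\mathbf q(t\mathbf r).$$
Now if the direction of r is held constant, q(tr) is a function of tr; and in general r ∂r f(tr) = t ∂t f(tr). So we have
$$\operatorname{curl}\big[\mathbf q(t\mathbf r)\times t\mathbf r\big] = 2t\,\mathbf q(t\mathbf r) + t^2\,\partial_t\mathbf q(t\mathbf r) = \partial_t\big[t^2\,\mathbf q(t\mathbf r)\big].$$
Integrating w.r.t. t from 0 to 1 gives
$$\operatorname{curl}\int_0^1\mathbf q(t\mathbf r)\times t\mathbf r\,dt = \mathbf q(\mathbf r),$$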
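that is,
$$\mathbf q = \operatorname{curl}\int_0^1\mathbf q(t\mathbf r)\times t\mathbf r\,dt \qquad[\text{for solenoidal }\mathbf q]. \tag{78}$$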
Thus for any solenoidal vector field q we can construct a vector potential—that is, a field whose curl is q ; such a field is given by the integral on the right. This is the long-promised proof of the "converse" of identity (24d). Of course the vector potential is not unique, because any conservative field—but only a conservative field—can be added to it without changing its curl. Hence the existence of one vector potential implies the existence of infinitely many. The above integral gives us one.
The proof of (78) assumes that q is solenoidal not only at position r, but also at t r where 0 ≤ t ≤ 1, i.e. at every point on the line-segment from the origin to r. A star-shaped region is one that contains a point O such that for every point P in the region, the line-segment OP is entirely contained in the region. We may choose any such O as the origin in the proof of (78). So the proof tells us that if a vector field is solenoidal within a star-shaped region, it has a vector potential in that region. As a special case, a vector field that is solenoidal everywhere has a vector potential everywhere.
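Formula (78) can be exercised numerically. The sketch approximates the t-integral with composite Simpson's rule, takes the curl of the result by central differences, and compares it with q. It assumes a hypothetical field q = (sin y + z, xz, x² + y), which is solenoidal because no component depends on its own coordinate. The field is defined everywhere, so all of space serves as the star-shaped region. The quadrature rule and step sizes are choices made here, not part of the text.

```python
import math

H = 1e-4  # step for the outer curl (an assumption)
N = 200   # even number of Simpson sub-intervals on 0 ≤ t ≤ 1 (an assumption)
E = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def q(r):
    """Hypothetical solenoidal field: no component depends on its own coordinate, so div q = 0."""
    x, y, z = r
    return (math.sin(y) + z, x * z, x * x + y)

def vector_potential(r):
    """The integral in (78), ∫_0^1 q(t r) × t r dt, by composite Simpson's rule."""
    total = [0.0, 0.0, 0.0]
    for j in range(N + 1):
        t = j / N
        weight = 1 if j in (0, N) else (4 if j % 2 else 2)
        tr = tuple(t * c for c in r)
        total = [s + weight * g for s, g in zip(total, cross(q(tr), tr))]
    return tuple(s / (3 * N) for s in total)

def curl(f, r, h=H):
    """Cartesian curl e_i × ∂_i f by central differences."""
    out = [0.0, 0.0, 0.0]
    for i in range(3):
        rp, rm = list(r), list(r)
        rp[i] += h
        rm[i] -= h
        d = tuple((u - v) / (2 * h) for u, v in zip(f(rp), f(rm)))
        out = [o + c for o, c in zip(out, cross(E[i], d))]
    return tuple(out)

P = (0.7, -0.4, 0.5)
print(curl(vector_potential, P), q(P))
```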
Notes on the curl of the curl
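Identity (65), namely
$$\operatorname{curl}\operatorname{curl}\mathbf q \equiv \nabla\operatorname{div}\mathbf q - \triangle\mathbf q$$
("curl curl is grad div minus del squared"), has at least three implications worth noting here.
First, it can be rearranged as
$$\triangle\mathbf q \equiv \nabla\operatorname{div}\mathbf q - \operatorname{curl}\operatorname{curl}\mathbf q \tag{79}$$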
("del squared is grad div minus curl curl"). This would serve as a coordinate-free definition of the Laplacian of a vector, if we did not already have one.[38] But we do: we started with a coordinate-free definition (4L) for a generic field, established its unambiguity via (9L), and found its Cartesian form (63L), which we used in the derivation of (79). Wherever we start, we may properly assert by way of contrast that the Laplacian of a vector is given by (79), whereas the Laplacian of a scalar is given by the divergence of the gradient. But we should not conclude, as Moon & Spencer do, that representing the scalar and vector Laplacians by the same symbol is "poor practice… since the two are basically quite different",[39] because in fact the two have a common definition which is succinct, unambiguous, and coordinate-free: the Laplacian (of anything) is the closed-surface integral of the outward normal derivative, per unit volume.[o]
Second, by reason of identity (38) and the remarks thereunder, a given vector field v can be written
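where the integral is over all space, or at least all of the space in which v may be non-zero. So, subject to the convergence of the integral, there exists a vector field q such that
$$\mathbf v = \triangle\mathbf q ,$$
that is, by (79), there exists q such that
$$\mathbf v = \nabla\operatorname{div}\mathbf q - \operatorname{curl}\operatorname{curl}\mathbf q ,$$
which implies the existence of a scalar field, say Φ, and a vector field, say Ψ, such that
$$\mathbf v = -\nabla\Phi + \operatorname{curl}\boldsymbol\Psi$$
(namely Φ = −div q and Ψ = − curl q). In short, subject to the convergence of the said integral,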
- a given vector field can be resolved into [minus] a gradient plus a curl.
Such a resolution is called a Helmholtz decomposition, and the proposition that it exists is the Helmholtz decomposition theorem. Of course the gradient is irrotational and the curl is solenoidal so that, subject to the same convergence,
- a given vector field can be resolved into an irrotational field plus a solenoidal field.
This is a second statement of the theorem, and follows from the first. And the first follows from the second because an irrotational field has a scalar potential by (29) and a solenoidal field has a vector potential by (78).
Third, if q is solenoidal, the term ∇ div q in (65) or (79) vanishes. Hence for a solenoidal field, the curl of the curl is minus the Laplacian. For example, in the dynamic case, in a vacuum, the Maxwell–Ampère law says that curl H = Ḋ = ϵ0 Ė. Multiplying this by the physical constant μ0 (called the vacuum permeability or simply the magnetic constant) gives curl B = μ0 ϵ0 Ė, whence
$$\operatorname{curl}\operatorname{curl}\mathbf B = \mu_0\epsilon_0\operatorname{curl}\dot{\mathbf E}.$$
But, by Gauss's law for magnetism, B is solenoidal so that, by (65), the left-hand side of the above is −△B. And by Faraday's law, curl E = −Ḃ, so that curl Ė = −B̈. Making these substitutions, we get
$$-\triangle\mathbf B = -\mu_0\epsilon_0\,\ddot{\mathbf B},$$
i.e.
$$\triangle\mathbf B = \mu_0\epsilon_0\,\ddot{\mathbf B}.$$
By comparison with (45), this is the wave equation with wave speed
$$c = \frac{1}{\sqrt{\mu_0\epsilon_0}} .$$
Thus the Maxwell–Ampère law, Gauss's law for magnetism, and Faraday's law, with the aid of (65), predict the existence of electromagnetic waves together with their speed.
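The predicted speed can be evaluated directly. This sketch uses the CODATA 2018 recommended values of μ0 and ϵ0; which tabulated values to use is a choice made here. The result agrees with the defined speed of light, 299 792 458 m/s, to the precision of those values.

```python
import math

MU_0 = 1.25663706212e-6       # vacuum permeability in H/m (CODATA 2018 recommended value)
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity in F/m (CODATA 2018 recommended value)

# Wave speed predicted by △B = μ0 ϵ0 B̈
c = 1.0 / math.sqrt(MU_0 * EPSILON_0)
print(f"{c:.9e} m/s")
```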
For these reasons, especially the last, one could hardly overstate the importance of identity (65).
Digression: Proofs from formal products
We have seen that Wilson (1901, pp. 150, 152) interprets the divergence and curl as "formal" or "symbolic" scalar and vector products with the ∇ operator. C.-T. Tai, in his 1995 report (pp. 26–9), alleges that this interpretation began with Wilson and not with Gibbs. Here I shall submit, on the contrary, that while the terminology may not be attributable to Gibbs, the concept certainly is.
Later in the same report, Tai confuses the picture by citing the first volume of Heaviside's Electromagnetic Theory (1893), where Heaviside, although his notations for the scalar and vector products differ from those of Gibbs, nevertheless considers the ∇ operator as a factor in such products. Tai continues:
At the time of his writing he [Heaviside] was already aware of Gibbs' pamphlets on vector analysis but Wilson's book was not yet published. It seems, therefore, that Heaviside and Wilson independently introduced the misleading concept for the scalar and vector products between ∇ and a vector function. Both were, perhaps, induced by Gibbs' notations for the divergence and the curl. Heaviside did not even include the word 'formal' in his description of the products.[40]
Whereas it was quite in character for Heaviside to treat an operator that way, the word "independently" would have surprised Wilson and is contradicted by Tai himself, who observes that Wilson's preface acknowledges Heaviside.[41] In Wilson's own words:
By far the greater part of the material used in the following pages has been taken from the course of lectures on Vector Analysis delivered annually at the University [Yale] by Professor Gibbs. Some use, however, has been made of the chapters on Vector Analysis in Mr. Oliver Heaviside's Electromagnetic Theory (Electrician Series, 1893) and in Professor Föppl's lectures on Die Maxwell'sche Theorie der Electricität (Teubner, 1894). ....
Notwithstanding the efforts which have been made during more than half a century to introduce Quaternions into physics the fact remains that they have not found wide favor.[p] On the other hand there has been a growing tendency especially in the last decade toward the adoption of some form of Vector Analysis. The works of Heaviside and Föppl referred to before may be cited in evidence. As yet however no system of Vector Analysis which makes any claim to completeness has been published. In fact Heaviside says: "I am in hopes that the chapter which I now finish may serve as a stopgap till regular vectorial treatises come to be written suitable for physicists, based upon the vectorial treatment of vectors" (Electromagnetic Theory, Vol. I., p. 305). Elsewhere in the same chapter Heaviside has set forth the claims of vector analysis as against Quaternions, and others have expressed similar views.[42]
Most damaging to Tai's thesis, however, is Gibbs's original pamphlet, a copy of which Heaviside received from Gibbs himself in June 1888.[43] Sections 62 to 65 of the pamphlet appear under the heading
∇, ∇⸱ , and ∇ × applied to Functions of Functions of Position.
In § 62, Gibbs says that a constant scalar factor after such an operator may be placed before it (that is, taken outside the operator). In § 63 he states our rule (73g) for the gradient of a function of a scalar field. His next section (in which I have bolded the vector field ω) is worth quoting in full:
64. If u or ω is a function of several scalar or vector variables, which are themselves functions of the position of a single point, the value of ∇u or ∇⸱ ω or ∇ × ω will be equal to the sum of the values obtained by making successively all but each one of these variables constant.
This proposition is a generalized product rule in the sense that the "function of several scalar or vector variables" may be, but is not restricted to, any sort of product of those variables. Gibbs continues:
65. By the use of this principle, we easily derive the following identical equations:
Six "equations" follow. The first says that the gradient operation is distributive over addition, and the second says the same of the divergence and curl (on one line). The last four are our identities (69), (71d), (71c), and (67d), in that order (albeit with different symbols). Gibbs then remarks (with my italics):
The student will observe an analogy between these equations and the formulæ of multiplication. (In the last four equations the analogy appears most distinctly when we regard all the factors but one as constant.) Some of the more curious features of this analogy are due to the fact that the ∇ contains implicitly the vectors i , j , and k , which are to be multiplied into the following quantities.
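Indeed, if the first factor is constant, identities (69), (71d), (71c), and (67d) become
$$\nabla(pu) = p\,\nabla u ,\qquad \nabla\cdot(p\mathbf b) = p\,\nabla\cdot\mathbf b ,\qquad \nabla\times(p\mathbf b) = p\,\nabla\times\mathbf b ,\qquad \nabla\cdot(\mathbf a\times\mathbf b) = -\mathbf a\cdot\nabla\times\mathbf b ,$$
whereas if the second factor is constant, they become respectively
$$\nabla(pu) = \nabla p\;u ,\qquad \nabla\cdot(p\mathbf b) = \nabla p\cdot\mathbf b ,\qquad \nabla\times(p\mathbf b) = \nabla p\times\mathbf b ,\qquad \nabla\cdot(\mathbf a\times\mathbf b) = \mathbf b\cdot\nabla\times\mathbf a .$$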
All eight equations look like rearrangements of products involving a vector ∇. [Concerning the last three equations, we have made that observation before; see (15) above.] But only seven of the eight are explained by taking the constant outside the operator (as in § 62); the exception is the fourth, in which the minus sign is not explained by that step alone, but is explained by the change in the cyclic order of the formal triple product. And if we add the two right-hand sides corresponding to each of the four left-hand sides, we get the identities in which both factors are variable—as claimed in § 64.
If § 65 leaves any doubt that Gibbs approved of formal products with the symbolic vector ∇ (albeit without using those terms), this is dispelled by § 166, where he writes:
166. To the equations in No. 65 may be added many others…
followed by a list of seven identities terminated by "etc." Six of the seven are beyond the scope of the present paper,[q] while the third of the seven is our (67c). After the list comes the smoking gun (§ 166, continued):
The principle in all these cases is that if we have one of the operators ∇, ∇⸱ , ∇ × prefixed to a product of any kind, and we make any transformation of the expression which would be allowable if the ∇ were a vector, (viz: by changes in the order of the factors, in the signs of multiplication, in the parentheses written or implied, etc.,) by which changes the ∇ is brought into connection with one particular factor, the expression thus transformed will represent the part of the value of the original expression which results from the variation of that factor.
The italics are mine, but I have refrained from italicizing those instances of the word "factor" which are not applicable to ∇. In particular, at the stage when "the ∇ is brought into connection with one particular factor," the "part of the value… which results from the variation of that factor" evidently means the term of the sum in § 64, which, as we have noted, amounts to a generalized product rule. But, according to the stated "principle", we reach that stage by treating ∇ as a factor. I rest my case.
Wilson (1901, p. 157) gives a comprehensive list of sum and product rules for the gradient, divergence, and curl, and properly states (p. 158) that the rules may be proven "most naturally" from Gibbs's definitions of the operators—our equations (58g), (60d), and (59c). Understandably, Wilson uses a ∑ sign rather than implicit summation. Less understandably, and less fortunately, he does not sum over a numerical index; e.g., he defines the curl operator as
$$\nabla\times\mathbf V = \sum\mathbf i\times\frac{\partial\mathbf V}{\partial x} \qquad[\textit{sic}]$$
and explains that "The summation extends over x, y, z." With these definitions he proves our identities (71c) and (68) essentially as we have done, but inevitably with greater difficulty, which may explain why he then says "The other formulæ are demonstrated in a similar manner" before reverting to Gibbs's strategy of varying one factor at a time. He announces (p. 159) that the variable held constant will be written as a subscript after the product, and he combines this notation with his ∑ notation in a rigorous proof that varying one factor at a time is valid for our (68), i.e. the gradient of a dot-product. Noting that this result is analogous to
he then jumps to the conclusion that varying one factor at a time is valid for all of his product rules—notwithstanding that a small change in a vector is not related to its divergence or curl as a small change in a scalar is related to its gradient.
That per saltum conclusion is his cue to go formal and symbolic. To obtain the curl of a cross-product [as in our (67c)], he "formally" expands a vector triple product to obtain the curl when the first factor is constant, states the curl when the second factor is held constant, and adds the two partial curls (Wilson, 1901, p. 161). Next he gives various arrangements of our (8q), except that he presents the first vector not as strictly uniform, but as merely held constant for the gradient operation. He states in passing that a proof may be effected by "expanding in terms of i , j, k"; but instead of such a proof, he offers a "method of remembering the result" by expanding the "product" u × (∇ × v) "formally as if ∇, u , v were all real vectors" (pp. 161–2). Concerning the curl of the gradient, and the divergence of the curl (pp. 167, 168), he recommends expanding in terms of i , j, k , but does not elaborate. Concerning the curl of the curl, however, he shows what would happen if it were "expanded formally according to the law of the triple vector product" (p. 169).
In defense of the "formal product" method, we should note that the operators ∂x , ∂y , and ∂z are linear, so that they are distributive over addition and may be permuted with multiplication by a constant, as if the operators themselves were multipliers (like components of vectors). They may be similarly permuted with other like operators—explaining why the formal-product method correctly deals with the curl of the gradient, the divergence of the curl, and the curl of the curl. But such an operator cannot be permuted with multiplication by a variable, because then the product rule of differentiation applies, yielding an extra term. The formal-product system responds to this difficulty by generalizing the product rule as in §§ 64 & 166 of Gibbs (1881–84). As Borisenko & Tarapov put it (1968, p. 169),
the operator ∇ acts on each factor separately with the other held fixed. Thus ∇ should be written after any factor regarded as a constant in a given term and before any factor regarded as variable.
In this they differ inconsequentially from Gibbs, who requires that the operator be "brought into connection" with the factor considered variable.
To illustrate, let us find the gradient of a dot-product, essentially in the manner of Borisenko & Tarapov (1968, p. 180), quoted by Tai (1995, p. 46; the next five equation numbers are Tai's). In this case the generalized product rule gives
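$$\nabla(\mathbf A\cdot\mathbf B) = \nabla(\mathbf A_c\cdot\mathbf B) + \nabla(\mathbf A\cdot\mathbf B_c) \tag{7.26}$$
where the subscript c marks the factor held constant during the differentiation. In Wilson's notation, this equation would be written
$$\nabla(\mathbf A\cdot\mathbf B) = \nabla(\mathbf A\cdot\mathbf B)_{\mathbf A} + \nabla(\mathbf A\cdot\mathbf B)_{\mathbf B} ,$$
where a trailing subscript indicates which factor is held constant. In the Feynman subscript notation, the subscript is attached to the ∇ operator and indicates which factor is allowed to vary, so that the same equation would be written
$$\nabla(\mathbf A\cdot\mathbf B) = \nabla_{\mathbf B}(\mathbf A\cdot\mathbf B) + \nabla_{\mathbf A}(\mathbf A\cdot\mathbf B).$$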
But, as we are discussing Borisenko & Tarapov, we press on with (7.26). By the algebraic identity
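$$\mathbf a\times(\mathbf b\times\mathbf c) = \mathbf b\,(\mathbf a\cdot\mathbf c) - (\mathbf a\cdot\mathbf b)\,\mathbf c , \tag{7.27}$$
i.e.
$$\mathbf b\,(\mathbf a\cdot\mathbf c) = (\mathbf a\cdot\mathbf b)\,\mathbf c + \mathbf a\times(\mathbf b\times\mathbf c),$$
we can say
$$\nabla(\mathbf A_c\cdot\mathbf B) = (\mathbf A_c\cdot\nabla)\mathbf B + \mathbf A_c\times(\nabla\times\mathbf B). \tag{7.28}$$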
Similarly,[44]
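$$\nabla(\mathbf A\cdot\mathbf B_c) = (\mathbf B_c\cdot\nabla)\mathbf A + \mathbf B_c\times(\nabla\times\mathbf A). \tag{7.29}$$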
Substituting (7.28) and (7.29) into (7.26), in which the order of the dot-products is immaterial, and dropping the c subscripts (because they are now outside the differentiations), we get the correct result
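$$\nabla(\mathbf A\cdot\mathbf B) = (\mathbf A\cdot\nabla)\mathbf B + \mathbf A\times(\nabla\times\mathbf B) + (\mathbf B\cdot\nabla)\mathbf A + \mathbf B\times(\nabla\times\mathbf A), \tag{7.30}$$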
corresponding to our (68).
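A numerical experiment cannot justify the formal-product steps, because it presupposes the correct answer. It can, however, confirm which forms are true. This sketch uses hypothetical fields A and B and freezes each factor at its value at a test point, which is the role of the c subscripts. It confirms the split (7.26), confirms (7.28) as written, and shows that dropping the (A_c⸱∇)B term from (7.28) would give a different vector. The fields, the test point and the step are illustrative assumptions.

```python
import math

H = 1e-5  # central-difference step (an assumption)
E = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def partial(q, r, i, h=H):
    """Central-difference estimate of ∂_i q at r for a vector field q."""
    rp, rm = list(r), list(r)
    rp[i] += h
    rm[i] -= h
    return tuple((u - v) / (2 * h) for u, v in zip(q(rp), q(rm)))

def grad(f, r, h=H):
    """e_i ∂_i f for a scalar field f."""
    out = []
    for i in range(3):
        rp, rm = list(r), list(r)
        rp[i] += h
        rm[i] -= h
        out.append((f(rp) - f(rm)) / (2 * h))
    return tuple(out)

def curl(q, r):  # e_i × ∂_i q
    terms = [cross(E[i], partial(q, r, i)) for i in range(3)]
    return tuple(sum(t[k] for t in terms) for k in range(3))

def advect(v, q, r):  # (v·∇)q = v_i ∂_i q, with v a vector value
    parts = [partial(q, r, i) for i in range(3)]
    return tuple(sum(v[i] * parts[i][k] for i in range(3)) for k in range(3))

def A(r):  # hypothetical test fields
    x, y, z = r
    return (1 + y * y, x * z, y)

def B(r):
    x, y, z = r
    return (x * y, z, math.sin(x))

P = (0.4, 1.2, -0.6)
Ac, Bc = A(P), B(P)  # factors frozen at their values at P (the "c" subscripts)

full = grad(lambda s: dot(A(s), B(s)), P)
vary_B = grad(lambda s: dot(Ac, B(s)), P)  # ∇(A_c · B): only B varies
vary_A = grad(lambda s: dot(A(s), Bc), P)  # ∇(A · B_c): only A varies
summed = tuple(u + v for u, v in zip(vary_B, vary_A))  # right side of (7.26)

curl_B = curl(B, P)
form_728 = tuple(u + v for u, v in zip(advect(Ac, B, P), cross(Ac, curl_B)))  # (A_c·∇)B + A_c × (∇ × B)
form_without_advection = cross(Ac, curl_B)  # (7.28) with the (A_c·∇)B term dropped
print(full, summed)
print(vary_B, form_728, form_without_advection)
```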
Tai (1995, p. 47) is unimpressed, asking why we cannot apply (7.27) directly to the left side of (7.26). The answer to that is obvious: on the left side, the ∇ operator is applied to a product of two variables, and the variations of both must be taken into account. But there is a harder question which Tai does not ask: in (7.28), why can't we have ∇⸱Ac instead of Ac⸱∇ ? (Or, in terms of Feynman subscripts, why can't we have ∇B ⸱ A instead of A⸱∇B?) Because that would make the term vanish? Yes, it would; but, as there is only one variable factor on the left side, why do we need two terms on the right? Because the rule says ∇ should be written after the constant but before the variable? Yes, but that rule serves the purpose of varying each variable, whereas there is only one variable to vary on the left of (7.28). The same issue arises in (7.29). We cannot settle the question even by appealing to symmetry. Obviously the right side of (7.30), like the left, must be unchanged if we switch A and B; and indeed it is. But if the first term on the right of (7.28) and of (7.29) were to vanish, the necessary symmetry of (7.30) would be maintained. And unless I'm missing something, Tai's "symbolic vector" method does not circumvent the problem; Tai's "Lemma 2" (1995, p. 53) is the Gibbs⧸Wilson method of "varying one factor at a time", written with Feynman subscripts attached to the symbolic vector instead of the del operator.[r]
For another example of the same issue, consider the following two-liner offered by Panofsky & Phillips (1962, pp. 470–71) and rightly pilloried by Tai (1995, pp. 47–8):
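If the first line were right, the authors would hardly bother to continue; but evidently it isn't, because it doesn't begin by "varying one factor at a time". The second line does not follow from the first and includes divergences of constants, which ought to vanish but somehow apparently do not. Let's try again, this time sticking to the rules:
$$\begin{aligned}\nabla\times(\mathbf A\times\mathbf B) &= \nabla\times(\mathbf A_c\times\mathbf B) + \nabla\times(\mathbf A\times\mathbf B_c)\\ &= \mathbf A_c(\nabla\cdot\mathbf B) - (\mathbf A_c\cdot\nabla)\mathbf B + (\mathbf B_c\cdot\nabla)\mathbf A - \mathbf B_c(\nabla\cdot\mathbf A)\\ &= (\mathbf B\cdot\nabla)\mathbf A - (\mathbf A\cdot\nabla)\mathbf B + \mathbf A(\nabla\cdot\mathbf B) - \mathbf B(\nabla\cdot\mathbf A),\end{aligned}$$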
in agreement with our (67c). Here the first line comes from the generalized product rule, and the third is obtained from the second by rearranging terms and dropping the (now redundant) subscripts. The interesting line is the second, which is obtained from the first by expanding the formal vector triple products. But again, why must we have Ac⸱∇ and Bc⸱∇, instead of ∇⸱Ac and ∇⸱Bc , which would make the middle two terms vanish? Again symmetry does not give an answer. The right-hand side, like the left, must change sign if we switch A and B ; but the disappearance of the Ac⸱∇ and Bc⸱∇ terms would maintain the required (anti)symmetry. Funnily enough, the result would then agree with the incorrect first line given by Panofsky & Phillips (above). But then how would we know that it is incorrect?
The foregoing examples show that "formal product" arguments can be tenuous, even on their own terms. Before these examples, we might have been troubled by the omission of a general proof of the "generalized" product rule. After them, we might wonder whether the rule is even well defined.
I submit, however, that none of this matters. I submit that the popularity of using "formal products" with the del operator, in derivations of vector-analytic identities, is a reaction to the failure of early writers to use indicial notation in the Cartesian definitions of differential operators.[s] The ensuing proliferation of terms in coordinate-based derivations led authors to seek shortcuts through "formal products" when more rigorous but no-less convenient shortcuts could have been taken through indicial notation, especially in combination with implicit summation. Our derivation of the gradient of a dot-product (68) is shorter than that of Borisenko & Tarapov, and even uses the right-hand sides of their identities (7.28) and (7.29), but obtains them rigorously with no ambiguity and no c subscripts. Our derivation of the curl of a cross-product (67c) takes six lines with a single column of "=" signs. Our subsequent formal-product derivation (not to be confused with the attempt of Panofsky & Phillips) seems to take only three lines; but it is only through our earlier indicial derivation that we have any confidence in our result (not to be confused with the result of Panofsky & Phillips). Our other indicial derivations of identities are mostly shorter than the two just mentioned. Having amassed so comprehensive a collection of identities so rigorously with so little effort, I submit that the use of formal products, Wilson subscripts, c subscripts, and Feynman subscripts for this purpose is a historical aberration, to be deciphered in other people's writings but avoided in one's own.
That being said, it is one thing to conclude, as Tai duly does, that the del-cross and del-dot notations should not be interpreted as products in derivations and proofs, and another thing to allege, as Tai also does (1995, p. 22), that ∇⸱ and ∇ × are "not compound operators" but only "assemblies", or in other words that " ∇ is not a constituent of the divergence operator nor of the curl operator." Against the latter proposition, our equations (14), (61o), and (62o) have been derived, not merely defined, and our derivation of (14) is as general as we could wish. Moreover, whereas (61o) and (62o) are for Cartesian coordinates, we shall see that they have counterparts in more general coordinates.
General coordinates
From our initial definitions of the differential operators, we derived certain identities, from which we derived expressions for the operators in Cartesian coordinates, from which we derived a comprehensive collection of identities, two of which (the multivariate chain rule, and the curl of the product of a scalar and a vector) will now be useful for expressing the operators in other coordinate systems. Cartesian coordinates are traditionally called x, y, z, which we renamed $x_i$ where $i = 1, 2, 3$, respectively. The best-known 3D non-Cartesian coordinate systems are the cylindrical coordinates (ρ, φ, z) and the spherical coordinates (r, θ, φ); we have already seen r in the guise of the magnitude of the position vector r. But now we want our coordinate system to be as general as possible, with the Cartesian, cylindrical, and spherical systems and many others, and even classes of systems, as special cases.
Natural and dual basis vectors
We shall call our general coordinates $u^i$ where $i = 1, 2, 3$; yes, for reasons which will emerge, we shall write the coordinate index as a superscript. But we shall write $\partial_i$ for $\partial/\partial u^i$, relying on context to distinguish it from the special case $\partial/\partial x_i$. By describing the $u^i$ as coordinates we mean two things. First, for some domain of interest, the position vector is a smooth function

$$\mathbf{r} = \mathbf{r}(u^1, u^2, u^3)\,,$$

which possesses partial derivatives w.r.t. its arguments. Second, for every position vector in the resulting range, there is only one ordered triplet $(u^i) = (u^1, u^2, u^3)$, so that we can think of each coordinate as

$$u^i = u^i(\mathbf{r})$$
—that is, we can think of each ui as a scalar field, which possesses a gradient.[t] (I say "think of" because ui, being obviously dependent on a coordinate system, would not normally be considered a true scalar; but sometimes we need to treat the coordinate system itself as an object under study.)
These two properties of coordinates respectively suggest two simple ways of choosing basis vectors related to the coordinates: we shall define the natural basis vectors as
$$\mathbf{h}_i = \partial_i\mathbf{r} \tag{80a}$$
and the dual basis vectors as
$$\mathbf{h}^i = \nabla u^i\,. \tag{80b}$$
(We could normalize the natural basis vectors by dividing them by their magnitudes to obtain unit vectors; but, for the moment, we won't bother.) Just as we may think of each $u^i$ as a scalar field and inquire after its directional derivative or its gradient or its Laplacian, so we may think of each $\mathbf{h}_i$ or $\mathbf{h}^i$ as a vector field and inquire after its directional derivative or its curl or its divergence or its Laplacian. (That the curl of $\mathbf{h}^i$ is zero, because $\mathbf{h}^i$ is a gradient, will be especially useful.)
In Cartesian coordinates, $\mathbf{h}_i$ and $\mathbf{h}^i$ are both equal to the unit vector $\mathbf{e}_i$; thus, in Cartesian coordinates, the natural basis vectors are their own duals. In general coordinates, $\mathbf{h}_i$ and $\mathbf{h}^i$ may differ in both direction and magnitude and are not generally unit vectors. Nevertheless, even in general coordinates, there is a simple relation between the natural and dual basis vectors. Consider the dot-product

$$\mathbf{h}_i\cdot\mathbf{h}^j = \partial_i\mathbf{r}\cdot\nabla u^j\,.$$
If $i \neq j$, then $\partial_i\mathbf{r}$, being in a direction in which $u^i$ varies while each other $u^j$ does not, is tangential to a surface of constant $u^j$ and therefore normal to $\nabla u^j$, so that the dot-product is zero. But by (26g),

$$du^i = d\mathbf{r}\cdot\nabla u^i\,,$$

and if we vary $\mathbf{r}$ by varying $u^i$ while holding each other $u^j$ constant, we can divide by $du^i$ and obtain
$$\partial_i\mathbf{r}\cdot\nabla u^i = \mathbf{h}_i\cdot\mathbf{h}^i = 1 \qquad \text{[with no summation]}. \tag{81i}$$
Putting the two cases together, we have
$$\mathbf{h}_i\cdot\mathbf{h}^j = \delta_i^j\,, \tag{81}$$
where the right-hand function, known as the Kronecker delta function, is defined by
$$\delta_i^j = \begin{cases} 1 & \text{if } i = j,\\ 0 & \text{if } i \neq j. \end{cases} \tag{82}$$
Obviously the function is symmetric: the indices $i$ and $j$ can be interchanged. If two lists of vectors are related so that the dot-product of the $i$th vector in one list and the $j$th in the other is $\delta_i^j$, the two lists are described as reciprocal. Thus the triplets $(\mathbf{h}_i)$ and $(\mathbf{h}^i)$ are reciprocal bases: the dual basis is the reciprocal of the natural basis and vice versa. Hence, taking the natural basis as a reference, the dual basis is sometimes called "the" reciprocal basis.
In Cartesian coordinates, (81) becomes

$$\mathbf{e}_i\cdot\mathbf{e}_j = \delta_{ij}\,.$$
So we have a relation for general coordinates (81) which is just as simple as its special case for Cartesian coordinates, provided that we use the natural basis for one factor and the dual basis for the other. This will be a recurring pattern.
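A quick numerical check of the reciprocity relation (81) may help here. The Python sketch below is an illustration added for this purpose and is not part of the original argument. It uses spherical coordinates (r, θ, φ) as an assumed test case. The natural basis (80a) is estimated by central differences of r(u), and the dual basis (80b) by central differences of the inverse map u(r). The code then prints the table of dot-products $\mathbf{h}_i\cdot\mathbf{h}^j$. The function names, sample point, and step size are arbitrary choices.

```python
import math

# Assumed test case: spherical coordinates (u^1, u^2, u^3) = (r, theta, phi).
def position(u):
    """Cartesian components of the position vector r(u^1, u^2, u^3)."""
    r, th, ph = u
    return (r * math.sin(th) * math.cos(ph),
            r * math.sin(th) * math.sin(ph),
            r * math.cos(th))


def coordinates(x):
    """Inverse map u^i(r): each coordinate treated as a scalar field."""
    r = math.sqrt(x[0] ** 2 + x[1] ** 2 + x[2] ** 2)
    return (r, math.acos(x[2] / r), math.atan2(x[1], x[0]))


def natural_basis(u, eps=1e-6):
    """h_i = d(r)/d(u^i), eq. (80a), estimated by central differences."""
    basis = []
    for i in range(3):
        up, dn = list(u), list(u)
        up[i] += eps
        dn[i] -= eps
        rp, rm = position(up), position(dn)
        basis.append(tuple((a - b) / (2 * eps) for a, b in zip(rp, rm)))
    return basis


def dual_basis(u, eps=1e-6):
    """h^i = grad(u^i), eq. (80b), estimated by central differences in x."""
    x = position(u)
    grads = [[0.0] * 3 for _ in range(3)]
    for c in range(3):
        xp, xm = list(x), list(x)
        xp[c] += eps
        xm[c] -= eps
        up, um = coordinates(xp), coordinates(xm)
        for i in range(3):
            grads[i][c] = (up[i] - um[i]) / (2 * eps)
    return [tuple(g) for g in grads]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


u = (2.0, 0.7, 0.4)  # sample point off the polar axis and the phi branch cut
h_lo, h_hi = natural_basis(u), dual_basis(u)
table = [[dot(h_lo[i], h_hi[j]) for j in range(3)] for i in range(3)]
for row in table:
    print(" ".join(f"{v:+.6f}" for v in row))
```

The two bases are obtained independently, one from r(u) and the other from u(r). For that reason, a table close to the identity matrix (within finite-difference error) is a genuine check of (81) and not a restatement of it.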
We have deduced the reciprocity relation (81) from prior definitions of the natural basis $(\mathbf{h}_i)$ and the dual basis $(\mathbf{h}^i)$. This result has a partial converse, in that a reciprocity relation between bases is enough to define either basis in terms of the other—as we shall see later. But first we proceed to components of vector fields.
Contravariant and covariant components
A coordinate grid is a set of intersecting curves such that on each curve, one coordinate varies while the others are constant. If we could embed such a grid in an elastic medium, and then stretch and rotate the medium, the natural basis vectors $\mathbf{h}_i$ given by (80a) would stretch and rotate with the medium and with the grid. Accordingly, the natural basis is also called the covariant basis. But according to (81), the dot-product of a natural basis vector and a dual basis vector is invariant (independent of the coordinate system), so that the variation of one factor compensates for the variation of the other. So, as the natural basis is "covariant" with the coordinate grid, we say that the dual basis is contravariant. Notice that the covariant factor has a subscript index (easily remembered because "co" rhymes with "low") whereas the contravariant factor has a superscript index, and that one kind of variation must combine with the other in order to produce an invariant result; these will be recurring patterns.
A vector field q may be expressed in components w.r.t. the natural (covariant) basis as
$$\mathbf{q} = q^i\,\mathbf{h}_i \tag{83a}$$
with summation, or in components w.r.t. the dual (contravariant) basis as
$$\mathbf{q} = q_i\,\mathbf{h}^i \tag{83b}$$
with summation. If q is to be invariant (a true vector, existing independently of the coordinate system), the components must be contravariant in the former case and covariant in the latter, and accordingly are written with superscripts and subscripts respectively. In Cartesian coordinates, the two bases are the same, so that the components w.r.t. the two bases are also the same; that's why, in the above section on Cartesian coordinates, we got away with writing component indices as subscripts. In general coordinates, however, the basis vectors have subscripts and the components have superscripts or vice versa, so that the index of implicit summation appears once as a superscript and once as a subscript.
Taking dot-products of (83a) with $\mathbf{h}^j$, applying (81), and noting that only one term on the right is non-zero, we obtain
$$q^j = \mathbf{q}\cdot\mathbf{h}^j\,. \tag{83c}$$
Similarly, taking dot-products of (83b) with $\mathbf{h}_j$ yields
$$q_j = \mathbf{q}\cdot\mathbf{h}_j\,. \tag{83d}$$
These results depend on the reciprocity relation (81) but not on the earlier definitions of the bases to which that relation applies. They say:
- to find the contravariant components of a vector, take its dot-products with the contravariant basis vectors, and
- to find the covariant components of a vector, take its dot-products with the covariant basis vectors;
or, in terms of the bases themselves:
- to find the components of a vector w.r.t. either basis, take dot-products of that vector with the other basis.
If a particular ui has a particular name, such as θ or φ, then, if we're not using indexed summation, we may find it convenient to write that name in place of the index i in the superscript or subscript.
At the present level of generality, the basis vectors $\mathbf{h}_i$, unlike their Cartesian counterparts $\mathbf{e}_i$, are not assumed to be uniform (i.e., homogeneous). One consequence of this general non-uniformity (inhomogeneity) is that, although we can say $\mathbf{r} = x_i\,\mathbf{e}_i$ in Cartesian coordinates and $\mathbf{q} = q^i\,\mathbf{h}_i$ in general coordinates, we cannot say

$$\mathbf{r} = u^i\,\mathbf{h}_i \qquad \text{[sic!]}$$

in general coordinates. For example, we have seen that in spherical coordinates the position vector $\mathbf{r}$ is simply $r\,\hat{\mathbf{r}} = r\,\mathbf{h}_r$; it is not $r\,\mathbf{h}_r + \theta\,\mathbf{h}_\theta + \varphi\,\mathbf{h}_\varphi$, because θ and φ are encoded in the direction of $\mathbf{h}_r$. Similarly, in cylindrical coordinates the position vector $\mathbf{r}$ is $\rho\,\mathbf{h}_\rho + z\,\mathbf{h}_z$; it is not $\rho\,\mathbf{h}_\rho + \varphi\,\mathbf{h}_\varphi + z\,\mathbf{h}_z$, because φ is encoded in the direction of $\mathbf{h}_\rho$. In both examples, encoding one coordinate in the direction of another coordinate's unit vector is circular in that the said direction depends on the position vector, which is the very thing that we want to represent.
A non-uniform basis is not a global basis. It cannot give a uniform representation of a uniform vector field, because the standard of representation changes; it is like having a compass whose orientation varies from place to place and⧸or a measuring stick whose length varies from place to place. But it can serve as a local basis —as in (83a) and (83b), each of which expresses a vector field at a given location in terms of a basis at that location, notwithstanding that the basis may be different at other locations. And although a local basis (as we have just seen) cannot generally represent the position vector in a non-circular manner, it can represent a change in the position vector. By the generality of the multivariate chain rule (75),
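$$\frac{d\mathbf{r}}{dt} = \partial_i\mathbf{r}\,\frac{du^i}{dt}\,.$$

Multiplying by $dt$ we get

$$d\mathbf{r} = \partial_i\mathbf{r}\;du^i \tag{84}$$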
or, substituting from (80a),
$$d\mathbf{r} = \mathbf{h}_i\,du^i\,. \tag{85}$$
Thus the small changes in the coordinates ui are the components of the true vector dr w.r.t. the covariant basis. That means the changes in the coordinates must be contravariant. Here at last is the explanation why we write general coordinates with superscript indices. And again the point is moot for Cartesian coordinates, for which the covariant basis is also contravariant.
Since $du^i$ is contravariant, $\partial_i\mathbf{r}$ in (84) must be covariant in order to yield the true vector $d\mathbf{r}$. This vindicates our decision to write $\partial_i$ with a subscript. Recall, however, that $\partial_i$ means $\partial/\partial u^i$. Thus the derivative w.r.t. the contravariant quantity is covariant—wherefore it is said that a superscript in the denominator of a derivative counts as a subscript in the derivative as a whole.
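In (85), the general term $\mathbf{h}_i\,du^i$ (not the sum) is the displacement of $\mathbf{r}$ due to the small change $du^i$ in the coordinate $u^i$. The three such displacements of $\mathbf{r}$ make concurrent edges of a parallelepiped whose signed volume is

$$\mathbf{h}_1 du^1\times\mathbf{h}_2 du^2\cdot\mathbf{h}_3 du^3\,;$$

that is,

$$dV = J\,du^1\,du^2\,du^3\,, \tag{86}$$

where

$$J = \mathbf{h}_1\times\mathbf{h}_2\cdot\mathbf{h}_3$$

or, to use a standard abbreviation for the scalar triple product,

$$J = [\mathbf{h}_1, \mathbf{h}_2, \mathbf{h}_3]\,. \tag{87}$$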
J is called the Jacobian of the natural (covariant) basis. We describe the basis and the associated coordinate system as right-handed if this Jacobian is positive, and left-handed if this Jacobian is negative. Thus the handedness depends on the standard order in which we write the vectors; e.g., the standard Cartesian basis is right-handed because we write it as (i, j, k) but would be left-handed if we wrote it as (i, k, j).
If the covariant basis is indeed a basis, its member vectors must not be coplanar; that is, J must not be zero. Hence, if the covariant basis is to be a local basis in some region of interest, J must not vanish anywhere in that region, and therefore must have the same sign throughout the region; that is, the handedness of the coordinate system must be the same throughout the region.
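For a concrete instance, here is a worked example added for illustration (it is not part of the original derivation). Take spherical coordinates $(u^1, u^2, u^3) = (r, \theta, \varphi)$, with Cartesian components $\mathbf{r} = (r\sin\theta\cos\varphi,\ r\sin\theta\sin\varphi,\ r\cos\theta)$. Differentiating as in (80a) and then applying (87) gives

$$\begin{aligned}
\mathbf{h}_r &= (\sin\theta\cos\varphi,\ \sin\theta\sin\varphi,\ \cos\theta),\\
\mathbf{h}_\theta &= r\,(\cos\theta\cos\varphi,\ \cos\theta\sin\varphi,\ -\sin\theta),\\
\mathbf{h}_\varphi &= r\,(-\sin\theta\sin\varphi,\ \sin\theta\cos\varphi,\ 0),\\
\mathbf{h}_\theta\times\mathbf{h}_\varphi &= r^2\sin\theta\,(\sin\theta\cos\varphi,\ \sin\theta\sin\varphi,\ \cos\theta),\\
J = \mathbf{h}_r\cdot\mathbf{h}_\theta\times\mathbf{h}_\varphi &= r^2\sin\theta\,.
\end{aligned}$$

So the ordering (r, θ, φ) is right-handed wherever $r > 0$ and $0 < \theta < \pi$. By contrast, $J = 0$ at the origin and on the polar axis, where $\mathbf{h}_\varphi = \mathbf{0}$. Those points must therefore be excluded from any region in which these coordinates serve as a local basis.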
Properties of reciprocal bases
We have noted that formulae (83c) and (83d), for the components of a vector w.r.t. the covariant and contravariant bases, depend only on the reciprocity relation (81) between the bases. Now, retaining the designations "covariant" and "contravariant" for convenience, let us see what else we can deduce from that relation.
Most obviously, the reciprocity relation leads to a simple component-based expression for the dot-product of two vector fields, say v and q , provided that we use the contravariant components and covariant basis (83a) for one vector, and the covariant components and contravariant basis (83b) for the other:
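$$\mathbf{v}\cdot\mathbf{q} = v^i\,\mathbf{h}_i\cdot q_j\,\mathbf{h}^j = v^i q_j\,\delta_i^j\,,$$

whence selecting the non-zero terms gives

$$\mathbf{v}\cdot\mathbf{q} = v^i q_i\,. \tag{88a}$$

And the two vectors, being general, can swap roles in (83a) and (83b):

$$\mathbf{v}\cdot\mathbf{q} = v_i\,q^i\,. \tag{88b}$$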
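The cross-product needs a bit more preparation. First we define the permutation symbol $\epsilon_{ijk}$ or $\epsilon^{ijk}$ (also called the Levi-Civita symbol) as having the value +1 if (i, j, k) is a permutation of (1, 2, 3) in the same cyclic order, −1 if (i, j, k) is a permutation of (1, 2, 3) in the reverse cyclic order, and 0 if (i, j, k) is not a permutation, i.e. if there is at least one repeated index. To put it more formally,

$$\epsilon_{ijk} = \epsilon^{ijk} = \begin{cases} +1 & \text{if } (i,j,k) \in \{(1,2,3),\,(2,3,1),\,(3,1,2)\},\\ -1 & \text{if } (i,j,k) \in \{(3,2,1),\,(1,3,2),\,(2,1,3)\},\\ 0 & \text{otherwise.} \end{cases} \tag{89}$$

Note that because switching any two indices changes the cyclic order, switching any two indices changes the sign of the permutation symbol. Now by (81), $\mathbf{h}^1$ is perpendicular to both $\mathbf{h}_2$ and $\mathbf{h}_3$. So we can say

$$\mathbf{h}_2\times\mathbf{h}_3 = \alpha_1\,\mathbf{h}^1\,,$$

where $\alpha_1$ is a real variable to be determined. Taking dot-products with $\mathbf{h}_1$ and applying (81) and (87), we find that $\alpha_1 = J$, so that

$$\mathbf{h}_2\times\mathbf{h}_3 = J\,\mathbf{h}^1\,. \tag{90.1}$$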
By the generality of the vectors we can rotate the three indices, but the sign of the left-hand side changes if we swap the two indices on the left. All six cases are covered by
$$\mathbf{h}_i\times\mathbf{h}_j = J\,\epsilon_{ijk}\,\mathbf{h}^k\,. \tag{90a}$$
Here we want only one term; but we need not specify "no sum", because for given i and j the permutation symbol leaves only one non-zero term in the sum over k. In words, this result says that the cross-product of two covariant basis vectors, with their indices in the standard cyclic order, is the Jacobian times the contravariant basis vector with the omitted index. Similarly, or rather reciprocally,
$$\mathbf{h}^i\times\mathbf{h}^j = J'\,\epsilon^{ijk}\,\mathbf{h}_k\,, \tag{90b}$$
where J′ is the Jacobian of the contravariant basis.
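Equations (90a) and (90b), which we have obtained from the reciprocity relation (81), can be solved for $\mathbf{h}^k$ and $\mathbf{h}_k$ respectively; but now we do suppress the implicit sum, because $k$ is "given" instead of $i$ and $j$:

$$\mathbf{h}^k = \frac{\mathbf{h}_i\times\mathbf{h}_j}{J} \qquad \text{[distinct } i, j, k \text{ in cyclic order]}; \tag{90c}$$

$$\mathbf{h}_k = \frac{\mathbf{h}^i\times\mathbf{h}^j}{J'} \qquad \text{[distinct } i, j, k \text{ in cyclic order]}. \tag{90d}$$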
Thus a reciprocity relation between bases is enough to define either basis in terms of the other—as claimed above. If it is not convenient to suppress an implicit sum, the last two results can instead be written
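$$\mathbf{h}^k = \frac{\epsilon^{ijk}\,\mathbf{h}_i\times\mathbf{h}_j}{2J} \tag{90e}$$

and

$$\mathbf{h}_k = \frac{\epsilon_{ijk}\,\mathbf{h}^i\times\mathbf{h}^j}{2J'}\,, \tag{90f}$$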
where the factor 2 in each denominator is needed because the right-hand side has two equal non-zero terms—the sign of the permutation symbol compensating for the order of the cross-product.
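The recipe "reciprocal basis from cross-products, then components from dot-products" can be checked numerically. The following sketch is illustrative only. It assumes a hypothetical oblique, non-normalized covariant basis, such as an affine coordinate system might have, and builds the reciprocal basis with (90c). It then confirms (81), rebuilds q from (83a) and (83b) using the components (83c) and (83d), and tests the dot-product formula (88a). The helper names are ad hoc.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def scale(s, a):
    return tuple(s * c for c in a)


def vsum(vectors):
    """Component-wise sum of an iterable of 3-tuples."""
    return tuple(sum(cs) for cs in zip(*vectors))


def reciprocal_basis(h):
    """Return J = [h_1, h_2, h_3] (87) and h^k = (h_i x h_j)/J (90c)."""
    J = dot(h[0], cross(h[1], h[2]))
    rec = [None] * 3
    for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:  # cyclic (i, j, k)
        rec[k] = scale(1.0 / J, cross(h[i], h[j]))
    return J, rec


# Hypothetical oblique, non-normalized covariant basis (affine coordinates).
h_lo = [(1.0, 0.0, 0.0), (0.5, 2.0, 0.0), (0.3, -0.4, 1.5)]
J, h_hi = reciprocal_basis(h_lo)

# Reciprocity (81): h_i . h^j should reproduce the Kronecker delta.
delta = [[dot(h_lo[i], h_hi[j]) for j in range(3)] for i in range(3)]

q = (0.7, -1.2, 2.5)
q_contra = [dot(q, b) for b in h_hi]  # (83c): q^i = q . h^i
q_co = [dot(q, b) for b in h_lo]      # (83d): q_i = q . h_i

# Rebuilding q via (83a) and via (83b) should give q back both times.
q_from_83a = vsum(scale(q_contra[i], h_lo[i]) for i in range(3))
q_from_83b = vsum(scale(q_co[i], h_hi[i]) for i in range(3))

# Dot-product formula (88a): v . q = v^i q_i.
v = (2.0, 0.1, -0.6)
v_contra = [dot(v, b) for b in h_hi]
print(J, q_from_83a, q_from_83b)
print(dot(v, q), sum(v_contra[i] * q_co[i] for i in range(3)))
```

For this particular basis $J = 3$, so the basis is right-handed. The reciprocal vectors are neither unit vectors nor parallel to their covariant partners, yet the component recipes recover q exactly (up to rounding).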
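Now we're ready to consider the cross-product of two vector fields. In terms of the covariant basis, using (90a),

$$\mathbf{v}\times\mathbf{q} = v^i\,\mathbf{h}_i\times q^j\,\mathbf{h}_j = v^i q^j\,\mathbf{h}_i\times\mathbf{h}_j = v^i q^j J\,\epsilon_{ijk}\,\mathbf{h}^k\,,$$

i.e.,

$$\mathbf{v}\times\mathbf{q} = J\,\epsilon_{ijk}\,v^i q^j\,\mathbf{h}^k\,. \tag{91a}$$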
On the right, the two components and the basis vector are contravariant, but invariance is achieved by multiplying by the covariant Jacobian (which has three covariant factors). Similarly,
$$\mathbf{v}\times\mathbf{q} = J'\,\epsilon^{ijk}\,v_i\,q_j\,\mathbf{h}_k\,. \tag{91b}$$
On the right of (91a) or (91b), the implicit triple summation has 27 terms, of which only six (corresponding to the six possible permutations of the three possible indices) can be non-zero. Thus the factor following the Jacobian can be recognized as the familiar determinant whose columns (or rows), in cyclic order, are the components of $\mathbf{v}$, the components of $\mathbf{q}$, and the three basis vectors. In (right-handed) Cartesian coordinates, in which the Jacobians are equal to 1 and we don't need the co⧸contra distinction, both equations reduce to

$$\mathbf{v}\times\mathbf{q} = \epsilon_{ijk}\,v_i\,q_j\,\mathbf{e}_k$$

—a familiar result written in a possibly unfamiliar way.
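The Jacobian of the contravariant basis is

$$J' = [\mathbf{h}^1, \mathbf{h}^2, \mathbf{h}^3] = \mathbf{h}^1\times\mathbf{h}^2\cdot\mathbf{h}^3$$

or, if we substitute from (90c),

$$J' = \frac{(\mathbf{h}_2\times\mathbf{h}_3)\times(\mathbf{h}_3\times\mathbf{h}_1)\cdot(\mathbf{h}_1\times\mathbf{h}_2)}{J^3}\,.$$

In the numerator, the cross-product of cross-products can be read as a vector triple product in which the first factor is a cross-product. Expanding that triple product and noting that one term is a scalar triple product with a repeated factor, we get

$$J' = \frac{[\mathbf{h}_2, \mathbf{h}_3, \mathbf{h}_1]\;\mathbf{h}_3\cdot(\mathbf{h}_1\times\mathbf{h}_2)}{J^3} = \frac{J^2}{J^3}\,,$$

so that we may write

$$J' = \frac{1}{J} \tag{92}$$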
in (90b), (90d), (90f), and (91b). In words, the Jacobian of the reciprocal basis is the reciprocal of the Jacobian of the original basis. Therefore the two Jacobians have the same sign. Therefore a basis is right-handed if and only if its reciprocal is right-handed. Thus the natural and dual bases of a coordinate system have the same handedness, and the handedness of either may be identified with the handedness of the coordinate system.
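A short sketch can check (91a), (91b), and (92) against the ordinary Cartesian cross-product. It assumes the same kind of hypothetical oblique basis as before and arbitrary vectors v and q. The permutation symbol (89) is implemented with the closed form (i − j)(j − k)(k − i)/2. That is a convenient identity for three indices, not something the text prescribes.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def levi_civita(i, j, k):
    """Permutation symbol (89) for 0-based indices: +1, -1 or 0."""
    return (i - j) * (j - k) * (k - i) // 2


def reciprocal_basis(h):
    """Return J = [h_1, h_2, h_3] and the reciprocal basis from (90c)."""
    J = dot(h[0], cross(h[1], h[2]))
    rec = [None] * 3
    for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
        rec[k] = tuple(c / J for c in cross(h[i], h[j]))
    return J, rec


# Hypothetical oblique covariant basis and two arbitrary vectors.
h_lo = [(1.0, 0.0, 0.0), (0.5, 2.0, 0.0), (0.3, -0.4, 1.5)]
J, h_hi = reciprocal_basis(h_lo)
J_dual = dot(h_hi[0], cross(h_hi[1], h_hi[2]))  # J' = [h^1, h^2, h^3]

v, q = (2.0, 0.1, -0.6), (0.7, -1.2, 2.5)
v_up = [dot(v, b) for b in h_hi]  # contravariant components (83c)
q_up = [dot(q, b) for b in h_hi]
v_dn = [dot(v, b) for b in h_lo]  # covariant components (83d)
q_dn = [dot(q, b) for b in h_lo]

cross_91a = [0.0, 0.0, 0.0]  # J  eps_ijk v^i q^j h^k
cross_91b = [0.0, 0.0, 0.0]  # J' eps^ijk v_i q_j h_k
for i in range(3):
    for j in range(3):
        for k in range(3):
            e = levi_civita(i, j, k)
            for c in range(3):
                cross_91a[c] += J * e * v_up[i] * q_up[j] * h_hi[k][c]
                cross_91b[c] += J_dual * e * v_dn[i] * q_dn[j] * h_lo[k][c]

print(J * J_dual)  # (92): should be 1 up to rounding
print(cross_91a, cross_91b, cross(v, q))
```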
The gradient, del, and advection operators
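Let $p$ be a scalar field, and let $s$ be arc length in the direction of the unit vector $\hat{\mathbf{s}}$. By the multivariate chain rule (75), using $du^i/ds = \hat{\mathbf{s}}\cdot\nabla u^i$ in the second step,

$$\frac{dp}{ds} = \partial_i p\,\frac{du^i}{ds} = \partial_i p\;\hat{\mathbf{s}}\cdot\nabla u^i = \hat{\mathbf{s}}\cdot\mathbf{h}^i\,\partial_i p\,.$$

So $\mathbf{h}^i\partial_i p$ is the vector whose (invariant) scalar component in the direction of any $\hat{\mathbf{s}}$ is the directional derivative of $p$ in that direction; that is,

$$\nabla p = \mathbf{h}^i\,\partial_i p\,, \tag{93g}$$

or, in operational terms,

$$\nabla = \mathbf{h}^i\,\partial_i\,. \tag{93o}$$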
Apart from the need to pair a superscript with a subscript, these two results look as simple as their Cartesian special cases (58g) and (58o).
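If ψ is a generic field and $\mathbf{q}$ is a general vector in the direction of the same $s$, then by definition (11),

$$\mathbf{q}\cdot\nabla\psi = |\mathbf{q}|\,\frac{d\psi}{ds} = |\mathbf{q}|\,\frac{du^i}{ds}\,\partial_i\psi = |\mathbf{q}|\,\hat{\mathbf{s}}\cdot\mathbf{h}^i\,\partial_i\psi = \mathbf{q}\cdot\mathbf{h}^i\,\partial_i\psi\,;$$

that is,

$$\mathbf{q}\cdot\nabla\psi = q^i\,\partial_i\psi\,, \tag{94}$$

or, in operational terms,

$$\mathbf{q}\cdot\nabla = q^i\,\partial_i\,. \tag{94o}$$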
These results likewise look as simple as their Cartesian special cases (64) and (64o). And by (88a), the q⸱∇ operator again turns out to be the formal dot-product of q and ∇.
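As a sanity check of (93g) and (94), the sketch below works in spherical coordinates under the following assumptions. The natural basis is written out analytically, the dual basis comes from (90c), and $\partial_i p$ is estimated by central differences. The code compares $\mathbf{h}^i\partial_i p$ with the Cartesian gradient of the hypothetical field p = xy + z², and $q^i\partial_i p$ with $\mathbf{q}\cdot\nabla p$ for an arbitrary vector q.

```python
import math


def position(u):
    r, th, ph = u
    return (r * math.sin(th) * math.cos(ph),
            r * math.sin(th) * math.sin(ph),
            r * math.cos(th))


def natural_basis(u):
    """Spherical natural basis h_i = d(r)/d(u^i), written out analytically."""
    r, th, ph = u
    st, ct, sp, cp = math.sin(th), math.cos(th), math.sin(ph), math.cos(ph)
    return [(st * cp, st * sp, ct),
            (r * ct * cp, r * ct * sp, -r * st),
            (-r * st * sp, r * st * cp, 0.0)]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dual_basis(h):
    """h^k = (h_i x h_j)/J for cyclic (i, j, k), as in (90c)."""
    J = dot(h[0], cross(h[1], h[2]))
    return [tuple(c / J for c in cross(h[(k + 1) % 3], h[(k + 2) % 3]))
            for k in range(3)]


def p_field(x):
    """Hypothetical scalar field p = x*y + z**2 (Cartesian form)."""
    return x[0] * x[1] + x[2] ** 2


def coordinate_partials(f, u, eps=1e-6):
    """Central-difference estimates of d f / d u^i."""
    out = []
    for i in range(3):
        up, dn = list(u), list(u)
        up[i] += eps
        dn[i] -= eps
        out.append((f(up) - f(dn)) / (2 * eps))
    return out


u = (1.5, 0.8, 2.1)
h_hi = dual_basis(natural_basis(u))
dp = coordinate_partials(lambda w: p_field(position(w)), u)  # d_i p
grad_93g = [sum(h_hi[i][c] * dp[i] for i in range(3)) for c in range(3)]

x = position(u)
grad_exact = (x[1], x[0], 2 * x[2])  # Cartesian gradient, for comparison

q = (0.3, -1.0, 0.8)                 # arbitrary vector
q_up = [dot(q, b) for b in h_hi]     # contravariant components (83c)
q_dot_grad_94 = sum(q_up[i] * dp[i] for i in range(3))
print(grad_93g, grad_exact, q_dot_grad_94, dot(q, grad_exact))
```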
The curl and divergence operators
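To express the curl of a vector field $\mathbf{q}$, we choose the contravariant basis (83b) and apply identity (71c):

$$\nabla\times\mathbf{q} = \nabla\times\big(q_j\,\mathbf{h}^j\big) = q_j\,\nabla\times\mathbf{h}^j + \nabla q_j\times\mathbf{h}^j\,.$$

On the right, the first term vanishes because $\mathbf{h}^j$ is $\nabla u^j$ (and the curl of a gradient is zero). Substituting from (93o) in the second term, we obtain

$$\nabla\times\mathbf{q} = \mathbf{h}^i\,\partial_i q_j\times\mathbf{h}^j = \partial_i q_j\;\mathbf{h}^i\times\mathbf{h}^j\,,$$

or, using (90b),

$$\nabla\times\mathbf{q} = J'\,\epsilon^{ijk}\,\partial_i q_j\,\mathbf{h}_k\,, \tag{95c}$$

or, in a more familiar form,

$$\nabla\times\mathbf{q} = J'\begin{vmatrix} \mathbf{h}_1 & \mathbf{h}_2 & \mathbf{h}_3\\ \partial_1 & \partial_2 & \partial_3\\ q_1 & q_2 & q_3 \end{vmatrix}.$$

Formula (95c) agrees with a result obtained by Tai with his "symbolic vector" method.[45] It is also what we would get by naively using (91b) to evaluate $\nabla\times\mathbf{q}$; it comes out so simply because each contravariant basis vector $\mathbf{h}^j$ is the actual gradient of $u^j$ and not (e.g.) merely a unit vector in the same direction (remember that $q_j$ is the component w.r.t. $\mathbf{h}^j$, not $\mathbf{h}_j$).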
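The following sketch checks (95c) numerically, again assuming spherical coordinates and central differences. The hypothetical field q = (−y, x, x²) (Cartesian components) has curl (0, −2x, 2). The code forms the covariant components $q_j = \mathbf{q}\cdot\mathbf{h}_j$ of (83d), differentiates them with respect to the coordinates, and assembles $J'\epsilon^{ijk}\partial_i q_j\mathbf{h}_k$ with $J' = 1/J$ from (92). No derivative of a basis vector is needed, which is the simplification just described.

```python
import math


def position(u):
    r, th, ph = u
    return (r * math.sin(th) * math.cos(ph),
            r * math.sin(th) * math.sin(ph),
            r * math.cos(th))


def natural_basis(u):
    """Spherical natural basis h_i = d(r)/d(u^i), written out analytically."""
    r, th, ph = u
    st, ct, sp, cp = math.sin(th), math.cos(th), math.sin(ph), math.cos(ph)
    return [(st * cp, st * sp, ct),
            (r * ct * cp, r * ct * sp, -r * st),
            (-r * st * sp, r * st * cp, 0.0)]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def levi_civita(i, j, k):
    return (i - j) * (j - k) * (k - i) // 2


def q_field(x):
    """Hypothetical field q = (-y, x, x**2); its curl is (0, -2x, 2)."""
    return (-x[1], x[0], x[0] ** 2)


def covariant_components(u):
    """q_j = q . h_j, eq. (83d), as functions of the coordinates."""
    qx = q_field(position(u))
    return [dot(qx, h) for h in natural_basis(u)]


u, eps = (1.5, 0.8, 2.1), 1e-6
dq = []  # dq[i][j] = d(q_j)/d(u^i), by central differences
for i in range(3):
    up, dn = list(u), list(u)
    up[i] += eps
    dn[i] -= eps
    dq.append([(a - b) / (2 * eps)
               for a, b in zip(covariant_components(up), covariant_components(dn))])

h_lo = natural_basis(u)
J = dot(h_lo[0], cross(h_lo[1], h_lo[2]))
curl_95c = [0.0, 0.0, 0.0]  # (1/J) eps^{ijk} d_i q_j h_k, using J' = 1/J
for i in range(3):
    for j in range(3):
        for k in range(3):
            e = levi_civita(i, j, k)
            for c in range(3):
                curl_95c[c] += e * dq[i][j] * h_lo[k][c] / J

x = position(u)
print(curl_95c, (0.0, -2 * x[0], 2.0))
```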
But (95c) does not end in a subexpression for the operand q and therefore does not directly yield an expression for the curl operator. To find this operator and the divergence operator, we return to the original definitions (4g), (4c), and (4d), noting that they can be combined as
$$\nabla * \psi = \frac{1}{dV}\oint d\mathbf{S} * \psi\,, \tag{96}$$
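where ∗ may be a null for the gradient, a cross for the curl, or a dot for the divergence.[u] Recalling that the value of this expression does not depend on the shape of dS, let dS be the parallelepiped defined by the six equicoordinate surfaces at $u^i$ and $u^i + du^i$, so that dV is given by (86). Then the contribution to the integral from the face at $u^1 + du^1$ is

$$\Big[\big(\mathbf{h}_2\,du^2\times\mathbf{h}_3\,du^3\big) * \psi\Big]_{u^1+du^1}\,,$$

where the square brackets and subscripting mean "evaluated at". This can be written

$$\Big[\big(\mathbf{h}_2\times\mathbf{h}_3\big) * \psi\Big]_{u^1+du^1}\,du^2\,du^3$$

or, by (90.1),

$$\Big[J\,\mathbf{h}^1 * \psi\Big]_{u^1+du^1}\,du^2\,du^3\,.$$

Similarly, the contribution from the face at $u^1$ (where $\mathbf{h}^1$ points inward instead of outward) is

$$-\Big[J\,\mathbf{h}^1 * \psi\Big]_{u^1}\,du^2\,du^3\,.$$

The sum of the contributions from the two opposite faces can then be written

$$\partial_1\big(J\,\mathbf{h}^1 * \psi\big)\,du^1\,du^2\,du^3\,,$$

so that when we add in the contributions from the other two pairs of opposite faces, the entire integral becomes

$$\partial_i\big(J\,\mathbf{h}^i * \psi\big)\,du^1\,du^2\,du^3$$

(with implicit summation over $i$). Substituting this and (86) into (96), we get

$$\nabla * \psi = \frac{1}{J}\,\partial_i\big(J\,\mathbf{h}^i * \psi\big)\,. \tag{97}$$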
Now applying the product rule gives
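$$\nabla * \psi = \mathbf{h}^i * \partial_i\psi + \frac{1}{J}\,\partial_i\big(J\,\mathbf{h}^i\big) * \psi\,. \tag{98}$$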
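Here the left-hand side is $\nabla * \psi$ according to our original volume-based definition (4g) of the ∇ operator—which is known to yield the curl or the divergence if ∗ is a cross or a dot, respectively—whereas the first term on the right is what we would get for $\nabla * \psi$ by using our latest definition (93o) of the ∇ operator and allowing $\partial_i$ to "pass by" the star in the Wilsonian manner. So, if we can show that the second term on the right is zero, we shall have established the precise sense in which the del-cross and del-dot notations are valid in general coordinates. In that second term, by (90e),

$$\begin{aligned}
\partial_i\big(J\,\mathbf{h}^i\big) &= \partial_i\Big(\tfrac{1}{2}\,\epsilon^{ijk}\,\mathbf{h}_j\times\mathbf{h}_k\Big)\\
&= \tfrac{1}{2}\,\epsilon^{ijk}\big(\partial_i\mathbf{h}_j\times\mathbf{h}_k + \mathbf{h}_j\times\partial_i\mathbf{h}_k\big)\\
&= \tfrac{1}{2}\,\epsilon^{ijk}\big(\partial_i\partial_j\mathbf{r}\times\mathbf{h}_k + \mathbf{h}_j\times\partial_i\partial_k\mathbf{r}\big)\,,
\end{aligned}$$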
where the last line follows by (80a). But the order of partial differentiation can be switched. So, in the sum over the permutations, for each term in ∂i ∂j r there is an equal term in ∂j ∂i r to which the permutation symbol attaches the opposite sign, so that the terms in ∂i ∂j r cancel. Similarly the terms in ∂i ∂k r cancel. Thus, as anticipated, the second term in (98) is zero and we have
$$\nabla * \psi = \mathbf{h}^i * \partial_i\psi\,. \tag{99}$$
If ∗ is a null and ψ is a scalar field p , then (99) becomes (93g) and thus (fortunately!) confirms (93o) as the form of the del operator in general coordinates.
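The cancellation that makes the second term of (98) vanish can also be seen numerically. This sketch assumes spherical coordinates and central differences, as before. It computes $\partial_i(J\mathbf{h}^i)$ summed over $i$, which should vanish. For contrast, it also computes $\partial_i\mathbf{h}^i$ without the Jacobian factor, which does not vanish.

```python
import math


def natural_basis(u):
    """Spherical natural basis h_i = d(r)/d(u^i), written out analytically."""
    r, th, ph = u
    st, ct, sp, cp = math.sin(th), math.cos(th), math.sin(ph), math.cos(ph)
    return [(st * cp, st * sp, ct),
            (r * ct * cp, r * ct * sp, -r * st),
            (-r * st * sp, r * st * cp, 0.0)]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def J_times_dual(u):
    """J h^i for i = 1, 2, 3: cyclic cross-products of the natural basis, cf. (90c)."""
    h = natural_basis(u)
    return [cross(h[(i + 1) % 3], h[(i + 2) % 3]) for i in range(3)]


def dual(u):
    """h^i itself, i.e. J h^i divided by J."""
    h = natural_basis(u)
    J = dot(h[0], cross(h[1], h[2]))
    return [tuple(c / J for c in v) for v in J_times_dual(u)]


def coordinate_divergence(field, u, eps=1e-6):
    """Sum over i of d(field(u)[i])/d(u^i), where each field(u)[i] is a vector."""
    total = [0.0, 0.0, 0.0]
    for i in range(3):
        up, dn = list(u), list(u)
        up[i] += eps
        dn[i] -= eps
        fp, fm = field(up)[i], field(dn)[i]
        for c in range(3):
            total[c] += (fp[c] - fm[c]) / (2 * eps)
    return total


u = (1.5, 0.8, 2.1)
with_J = coordinate_divergence(J_times_dual, u)  # d_i (J h^i): should be ~0
without_J = coordinate_divergence(dual, u)       # d_i h^i: not zero here
print(with_J, without_J)
```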
Now let ψ be a vector field q . If ∗ is a cross, then (99) becomes
$$\nabla\times\mathbf{q} = \mathbf{h}^i\times\partial_i\mathbf{q}\,, \tag{100c}$$
or, in operational terms,
$$\nabla\times\; = \mathbf{h}^i\times\partial_i\,. \tag{100o}$$
If instead ∗ is a dot, (99) becomes
$$\nabla\cdot\mathbf{q} = \mathbf{h}^i\cdot\partial_i\mathbf{q}\,, \tag{101d}$$
or, in operational terms,
$$\nabla\cdot\; = \mathbf{h}^i\cdot\partial_i\,. \tag{101o}$$
But if we take the ∇ operator as given by (93o) and try to construct the curl and divergence operators (in the same coordinates) as ∇× and ∇⸱ respectively, we get $\mathbf{h}^i\partial_i\times$ and $\mathbf{h}^i\partial_i\cdot$ respectively [compare (61o) and (62o)]; and if we then let $\partial_i$ "pass by" the cross and the dot, we get (100o) and (101o), or (100c) and (101d) if we include the operand $\mathbf{q}$. Thus the del-cross and del-dot notations work in general coordinates.
Equations (100c) to (101o) are apparently due to Tai (1995, eqs. 9.39, 9.40, 9.34, & 9.35, and text on p. 66), who derives them, along with the corresponding form of the del operator (his eq. 9.33), from volume-based definitions expressed in his "symbolic vector" notation. But he does not point out that the curl and divergence operators are obtainable from that del operator, as del-cross and del-dot, via the same "pass by" step that he condemns in the Cartesian context. Speaking of which, we should note that our equations (100c) to (101o), apart from the need to pair a superscript with a subscript, are as simple as their Cartesian special cases (59c), (59o), (60d), and (60o).
In (100c) and (101d), it goes without saying that $\partial_i\mathbf{q}$ must be evaluated correctly—in particular, that if the operand is expressed in terms of non-uniform basis vectors, the non-uniformity must be taken into account. Formula (95c), for the curl, does not suffer from this complication, because it is already expressed in components w.r.t. the contravariant basis (whose non-uniformity has already been taken into account). To obtain a similarly convenient formula for the divergence, we use components w.r.t. the covariant basis (i.e., contravariant components): in (97), if ∗ is a dot and ψ is a vector field $\mathbf{q}$, we have

$$\nabla\cdot\mathbf{q} = \frac{1}{J}\,\partial_i\big(J\,\mathbf{h}^i\cdot\mathbf{q}\big)$$

or, by (83c),

$$\nabla\cdot\mathbf{q} = \frac{1}{J}\,\partial_i\big(J\,q^i\big)\,. \tag{102d}$$
This too agrees with Tai (1995, p. 65, eq. 9.37).
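A numerical sketch of (102d), under the same assumptions (spherical coordinates, analytic natural basis, central differences). The hypothetical field q = (x, y², xz) has divergence 1 + 2y + x. Since $J q^i = \mathbf{q}\cdot(J\mathbf{h}^i)$ and $J\mathbf{h}^i$ is a plain cross-product of natural basis vectors by (90c), the code never needs to divide by J until the last step.

```python
import math


def position(u):
    r, th, ph = u
    return (r * math.sin(th) * math.cos(ph),
            r * math.sin(th) * math.sin(ph),
            r * math.cos(th))


def natural_basis(u):
    """Spherical natural basis h_i = d(r)/d(u^i), written out analytically."""
    r, th, ph = u
    st, ct, sp, cp = math.sin(th), math.cos(th), math.sin(ph), math.cos(ph)
    return [(st * cp, st * sp, ct),
            (r * ct * cp, r * ct * sp, -r * st),
            (-r * st * sp, r * st * cp, 0.0)]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def jacobian(u):
    h = natural_basis(u)
    return dot(h[0], cross(h[1], h[2]))


def q_field(x):
    """Hypothetical field q = (x, y**2, x*z); its divergence is 1 + 2y + x."""
    return (x[0], x[1] ** 2, x[0] * x[2])


def J_q_contra(u):
    """J q^i = q . (J h^i), with J h^i from the cyclic cross-products (90c)."""
    h = natural_basis(u)
    qx = q_field(position(u))
    return [dot(qx, cross(h[(i + 1) % 3], h[(i + 2) % 3])) for i in range(3)]


u, eps = (1.5, 0.8, 2.1), 1e-6
total = 0.0
for i in range(3):
    up, dn = list(u), list(u)
    up[i] += eps
    dn[i] -= eps
    total += (J_q_contra(up)[i] - J_q_contra(dn)[i]) / (2 * eps)
div_102d = total / jacobian(u)

x = position(u)
print(div_102d, 1 + 2 * x[1] + x[0])
```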
The Laplacian
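For a scalar operand, applying (101o) and reversing the "pass by", we find that the Laplacian operator is

$$\nabla^2 = \mathbf{h}^i\partial_i\cdot\mathbf{h}^j\partial_j = \nabla\cdot\nabla\,,$$

where each $\partial_i$ acts on everything to its right.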
And by the linearity of the Laplacian, the ∇2 formulation remains valid if the operand is a fixed linear combination of scalars—including a vector field, because that is expressible (even if not actually expressed) w.r.t. a uniform basis. (And if it is expressed in terms of a non-uniform basis, the non-uniformity must be taken into account in differentiations.)
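In what follows, however, we shall find it convenient to take a different approach. If ψ is a scalar field, its gradient as given by (93g) is $\mathbf{h}^j\partial_j\psi$, of which the $i$th contravariant component is $\mathbf{h}^i\cdot\mathbf{h}^j\partial_j\psi$, which takes the place of $q^i$ in (102d), so that the divergence of the gradient of ψ is

$$\nabla^2\psi = \frac{1}{J}\,\partial_i\big(J\,\mathbf{h}^i\cdot\mathbf{h}^j\,\partial_j\psi\big)\,. \tag{103L}$$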
This remains well-defined if ψ is a generic field (although we still need to deal with any non-uniformity of the basis in which ψ might be expressed).
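Formula (103L) can be tested the same way, at the cost of nested finite differences. This sketch again assumes spherical coordinates. The inner differences estimate $\partial_j p$, the dual basis comes from (90c), and the outer differences take $\partial_i$ of $J\,\mathbf{h}^i\cdot\mathbf{h}^j\partial_j p$. The hypothetical field p = x²y + z³ has Laplacian 2y + 6z. The looser tolerance reflects the nested differencing.

```python
import math


def position(u):
    r, th, ph = u
    return (r * math.sin(th) * math.cos(ph),
            r * math.sin(th) * math.sin(ph),
            r * math.cos(th))


def natural_basis(u):
    """Spherical natural basis h_i = d(r)/d(u^i), written out analytically."""
    r, th, ph = u
    st, ct, sp, cp = math.sin(th), math.cos(th), math.sin(ph), math.cos(ph)
    return [(st * cp, st * sp, ct),
            (r * ct * cp, r * ct * sp, -r * st),
            (-r * st * sp, r * st * cp, 0.0)]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def jacobian(u):
    h = natural_basis(u)
    return dot(h[0], cross(h[1], h[2]))


def p_field(x):
    """Hypothetical scalar field p = x**2 * y + z**3; its Laplacian is 2y + 6z."""
    return x[0] ** 2 * x[1] + x[2] ** 3


def coordinate_partials(f, u, eps):
    """Central-difference estimates of d f / d u^i."""
    out = []
    for i in range(3):
        up, dn = list(u), list(u)
        up[i] += eps
        dn[i] -= eps
        out.append((f(up) - f(dn)) / (2 * eps))
    return out


def flux(u):
    """The quantity J h^i . h^j d_j p inside the derivative in (103L), per i."""
    h = natural_basis(u)
    J = dot(h[0], cross(h[1], h[2]))
    dual = [tuple(c / J for c in cross(h[(k + 1) % 3], h[(k + 2) % 3]))
            for k in range(3)]  # (90c)
    dp = coordinate_partials(lambda w: p_field(position(w)), u, 1e-5)
    return [J * sum(dot(dual[i], dual[j]) * dp[j] for j in range(3))
            for i in range(3)]


u, eps = (1.5, 0.8, 2.1), 1e-4
total = 0.0
for i in range(3):
    up, dn = list(u), list(u)
    up[i] += eps
    dn[i] -= eps
    total += (flux(up)[i] - flux(dn)[i]) / (2 * eps)
lap_103L = total / jacobian(u)

x = position(u)
print(lap_103L, 2 * x[1] + 6 * x[2])
```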
Affine coordinates
If a basis is uniform (homogeneous), so is its Jacobian. Hence, by (90c) and (90d), the dual (contravariant) basis is uniform if and only if the natural (covariant) basis is uniform. A coordinate system in which these bases are uniform is described as affine. In affine coordinates,
- by (80a), ∂i r is uniform, so that the curves on which only one coordinate varies are straight parallel lines; and
- by (80b), ∇ui is uniform, so that the level surfaces of each coordinate (being perpendicular to ∇ui) are parallel planes.
Obviously Cartesian coordinates are affine; but one can also construct affine coordinate systems in which the three vectors of each basis are not mutually perpendicular and⧸or the coordinates have different scales or different units.
We have noted above that the correct application of the del-cross, del-dot, and del-squared notations must allow for non-uniformity of the basis vectors. Obviously this issue does not arise in affine coordinates, including Cartesian coordinates. Hence, while these notations are not (as is sometimes alleged) invalid in other coordinate systems, it would be fair to say that they are safer and more convenient in affine coordinates, including Cartesian coordinates.
Orthogonal coordinates
We know, e.g. from (90a) and (90b), that if two bases are reciprocal, the cross-product of the i th and j th members of one basis is collinear with the k th member of the other, if i, j, k are distinct. But if the first basis is orthogonal (that is, if its three member vectors are mutually orthogonal), the same cross-product is also collinear with the k th member of the same basis, so that corresponding members of the two bases are collinear. It follows that the natural basis of a coordinate system is orthogonal if and only if the dual basis is orthogonal. And if the bases are orthogonal, the coordinate system itself is said to be orthogonal.
Cartesian coordinates are obviously both affine and orthogonal, and we have already implied that there is a class of coordinate systems that are affine but not orthogonal. The most widely-used class of non-Cartesian systems, however, contains the systems that are orthogonal but not affine; this class, of which the cylindrical and spherical systems are the best-known members, is the class of curvilinear orthogonal coordinates. But we shall drop the word curvilinear in order to include Cartesian coordinates as a special case.
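In orthogonal coordinates, expressing a member of one basis in terms of its reciprocal basis is especially simple because corresponding members of the two bases are collinear, wherefore we can say

$$\mathbf{h}^i = \beta_i\,\mathbf{h}_i\,,$$

where $\beta_i$ is a real variable to be determined (and the single index on the left-hand side means no summation). Substituting this into (81i) gives

$$\beta_i\,\mathbf{h}_i\cdot\mathbf{h}_i = \beta_i\,h_i^2 = 1\,,$$

where

$$h_i = |\mathbf{h}_i|\,, \tag{104}$$

so that

$$\mathbf{h}^i = \frac{\mathbf{h}_i}{h_i^2}\,. \tag{105}$$

And substituting that into (83c), and comparing the result with (83d), we get

$$q^i = \frac{q_i}{h_i^2}\,. \tag{106}$$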
Comparing (104) with definition (80a), we see that $h_i$ is the magnitude of $\partial_i\mathbf{r}$. Accordingly $h_i$ is called the scale factor associated with the coordinate $u^i$; it is the factor by which we multiply a small change in $u^i$ to obtain the magnitude of the consequent change in position.[v]
If we now define
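$$\varsigma = \begin{cases} +1 & \text{if } J > 0 \text{ (right-handed coordinates)},\\ -1 & \text{if } J < 0 \text{ (left-handed coordinates)}, \end{cases} \tag{107}$$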
then, due to the orthogonality, (87) and (92) are respectively reduced to
$$J = \varsigma\,h_1 h_2 h_3 \tag{108}$$
and
$$J' = \frac{\varsigma}{h_1 h_2 h_3} \tag{109}$$
—although, for brevity, we shall sometimes leave things in terms of J.
At this point, we could substitute (105) and (106) into earlier equations and obtain a suite of formulae for the differential operators in terms of the covariant basis and covariant components! But we can avoid this confusing breach of convention by normalizing the basis vectors.
An orthonormal basis is one whose members are mutually orthogonal unit vectors. The assumption of unit vectors is introduced so late because it is more useful with orthogonality than without. If one basis consisted of unit vectors that were not all orthogonal, then the reciprocal basis vectors given by (90c) or (90d) would not all be unit vectors.[w] But if the basis $(\mathbf{h}_i)$ consists of orthogonal unit vectors, equation (105) implies that the reciprocal basis consists of the same vectors; and the converse is also true, by the symmetry of the reciprocity relations. Thus an orthonormal basis is its own reciprocal. Hence, if we choose an orthonormal basis, we do not need superscripts to distinguish the reciprocal basis from the original, or to distinguish components w.r.t. the latter basis from those w.r.t. the former.
An orthonormal basis is not generally covariant, because it doesn't stretch with the coordinate grid (although it does rotate with the grid). Neither is it generally contravariant, because its reciprocal (i.e. itself) is not generally covariant. Hence, if a non-orthonormal natural or dual basis of an orthogonal coordinate system is normalized (replaced by unit vectors in the same directions), the resulting orthonormal basis is not covariant or contravariant, and components with respect thereto are not contravariant or covariant, and the new basis vectors are not given in terms of the coordinates by (80a) or (80b); the basis is therefore described as a non-coordinate basis. By default, the indices of the orthonormal basis vectors and associated components are written as subscripts, but these are not indicative of covariance. The coordinates themselves remain contravariant (e.g., if the grid dilates, the same movement in space corresponds to smaller changes in the coordinates); but, for want of covariant basis vectors to pair them with, we tend to write the coordinates with subscripts when the basis is orthonormal.
Nevertheless, it is convenient to have one basis instead of two. Moreover, the components of a vector w.r.t. an orthonormal basis are physical components: they have the same dimension (same units) as the represented vector, and they are the components that we would have in mind if we wanted to measure the "components" in the directions of the basis vectors. Hence an orthonormal basis is called a physical basis. Accordingly, it is indeed common practice to normalize the basis vectors of orthogonal coordinate systems. This together with the prevalence of such coordinate systems helps to account for the familiarity of subscripts as indices, and for the jarring unfamiliarity of superscript indices when general (possibly non-orthogonal) coordinates are encountered for the first time.
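To normalize the covariant basis, let $\hat{\mathbf{h}}_i$ (as usual) be the unit vector in the direction of $\mathbf{h}_i$. Then, by (104),

$$\mathbf{h}_i = h_i\,\hat{\mathbf{h}}_i \tag{110}$$

(again with no summation, due to the single index on the left). Hence (105) becomes:

$$\mathbf{h}^i = \frac{\hat{\mathbf{h}}_i}{h_i}\,. \tag{111}$$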
A vector field q is expressed in components w.r.t. the basis (ĥi) as
$$\mathbf{q} = \hat{q}_i\,\hat{\mathbf{h}}_i \tag{112}$$
(with summation), where
$$\hat{q}_i = \mathbf{q}\cdot\hat{\mathbf{h}}_i\,. \tag{113}$$
(Here the hat on $\hat{q}_i$ is needed to distinguish the coefficient of $\hat{\mathbf{h}}_i$ from the coefficient of $\mathbf{h}^i$, and indicates that $\hat{q}_i$ is the coefficient of a unit vector—not that $\hat{q}_i$ has unit magnitude.) Taking $q_i$ as given by (83d) and applying (110) and (113), we get

$$q_i = h_i\,\hat{q}_i\,, \tag{114}$$

whence (106) gives

$$q^i = \frac{\hat{q}_i}{h_i}\,. \tag{115}$$
Equation (110) quantifies the non-covariance of the orthonormal basis; substituting (110) into (85), we find that the components of $d\mathbf{r}$ with respect to $\hat{\mathbf{h}}_i$ are not simply $du^i$, but $h_i\,du^i$ [no sum]. So, as the coordinates $u^i$ are still contravariant, the orthonormal basis vectors $\hat{\mathbf{h}}_i$ are not covariant unless the scale factors $h_i$ are equal to 1 — that is, unless the coordinates are Cartesian (except possibly for the handedness). And in Cartesian coordinates we can use subscripts throughout. This is another reason why, when using an orthonormal basis, we might as well write the coordinates as $u_i$.
We can now re-express dot- and cross-products w.r.t. the orthonormal basis (ĥi). If we apply (114) and (115) in (88a) or (88b), the scale factors cancel and we are left with
$$\mathbf{v}\cdot\mathbf{q} = \hat{v}_i\,\hat{q}_i\,, \tag{116}$$
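as if the coordinates were Cartesian. And if we apply (108), (115), and (111) in (91a), the product of the scale factors cancels and we are left with

$$\mathbf{v}\times\mathbf{q} = \varsigma\,\epsilon_{ijk}\,\hat{v}_i\,\hat{q}_j\,\hat{\mathbf{h}}_k\,,$$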
again as if the coordinates were Cartesian, except that the handedness symbol ς gives a change of sign for left-handed coordinates. The last result is confirmed by applying (109), (114), and (110) in (91b). It can also be written
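$$\mathbf{v}\times\mathbf{q} = \varsigma\begin{vmatrix} \hat{\mathbf{h}}_1 & \hat{\mathbf{h}}_2 & \hat{\mathbf{h}}_3\\ \hat{v}_1 & \hat{v}_2 & \hat{v}_3\\ \hat{q}_1 & \hat{q}_2 & \hat{q}_3 \end{vmatrix}. \tag{117}$$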
We can similarly re-express the first-order differential operators. Applying (111) in (93o), (101o), and (100o) gives respectively
$$\nabla = \sum_i \hat{\mathbf{h}}_i\,\frac{1}{h_i}\,\partial_i\,, \tag{118}$$
$$\nabla\cdot\; = \sum_i \hat{\mathbf{h}}_i\cdot\frac{1}{h_i}\,\partial_i\,, \tag{119}$$
and
$$\nabla\times\; = \sum_i \hat{\mathbf{h}}_i\times\frac{1}{h_i}\,\partial_i\,. \tag{120}$$
And applying (115) in (94o) and (102d) gives
$$\mathbf{q}\cdot\nabla = \sum_i \frac{\hat{q}_i}{h_i}\,\partial_i \tag{121}$$
and
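$$\nabla\cdot\mathbf{q} = \frac{1}{J}\sum_i \partial_i\!\left(\frac{J\,\hat{q}_i}{h_i}\right). \tag{122}$$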
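And applying (109), (110), and (114) in (95c) gives

$$\nabla\times\mathbf{q} = \frac{\varsigma}{h_1h_2h_3}\sum_{i,j,k}\epsilon_{ijk}\,h_k\hat{\mathbf{h}}_k\,\partial_i\big(h_j\,\hat{q}_j\big)$$

or, in determinant form,

$$\nabla\times\mathbf{q} = \frac{\varsigma}{h_1h_2h_3}\begin{vmatrix} h_1\hat{\mathbf{h}}_1 & h_2\hat{\mathbf{h}}_2 & h_3\hat{\mathbf{h}}_3\\ \partial_1 & \partial_2 & \partial_3\\ h_1\hat{q}_1 & h_2\hat{q}_2 & h_3\hat{q}_3 \end{vmatrix}. \tag{123}$$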
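For the Laplacian, applying (111) twice in (103L) gives

$$\nabla^2\psi = \frac{1}{J}\sum_{i,j}\partial_i\!\left(J\,\big(\hat{\mathbf{h}}_i\cdot\hat{\mathbf{h}}_j\big)\,\frac{1}{h_i h_j}\,\partial_j\psi\right),$$

where the parenthesized dot-product is simply $\delta_{ij}$. Selecting the non-zero terms, we are left with

$$\nabla^2\psi = \frac{1}{J}\sum_i\partial_i\!\left(\frac{J}{h_i^2}\,\partial_i\psi\right). \tag{124}$$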
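As a worked check (added here for illustration), take right-handed spherical coordinates $(u_1, u_2, u_3) = (r, \theta, \varphi)$. Their standard scale factors are $h_r = 1$, $h_\theta = r$, $h_\varphi = r\sin\theta$, so $J = r^2\sin\theta$ by (108) with $\varsigma = 1$. Then $J/h_r^2 = r^2\sin\theta$, $J/h_\theta^2 = \sin\theta$, and $J/h_\varphi^2 = 1/\sin\theta$, so that (124) becomes

$$\begin{aligned}
\nabla^2\psi &= \frac{1}{r^2\sin\theta}\left[\partial_r\!\left(r^2\sin\theta\,\partial_r\psi\right) + \partial_\theta\!\left(\sin\theta\,\partial_\theta\psi\right) + \partial_\varphi\!\left(\frac{1}{\sin\theta}\,\partial_\varphi\psi\right)\right]\\
&= \frac{1}{r^2}\,\partial_r\!\left(r^2\,\partial_r\psi\right) + \frac{1}{r^2\sin\theta}\,\partial_\theta\!\left(\sin\theta\,\partial_\theta\psi\right) + \frac{1}{r^2\sin^2\theta}\,\partial_\varphi^2\psi\,,
\end{aligned}$$

which is the familiar spherical form of the Laplacian.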
Working entirely within the coordinates ui, we can use equations (118), (121) to (123), and (124) for scalar ψ, provided that we know the scale factors in terms of ui.[x] And we can find the scale factors in terms of ui if we know the Cartesian coordinates xi in terms of ui. For then the position vector can be written
$$\mathbf r = x_j\,\mathbf e_j ,$$
whence
$$\mathbf h_i = \partial_{u_i}\mathbf r = \partial_{u_i}x_j\;\mathbf e_j ,$$
so that the scale factors can be found from
$$h_i^{\,2} = \sum_j \left(\partial_{u_i}x_j\right)^2 .$$
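The recipe above is easy to try numerically. The following Python sketch uses only the standard library and takes spherical coordinates u = (r, θ, φ) as an assumed example. It estimates the scale factors from h_i² = Σ_j (∂x_j/∂u_i)² by central differences. It then uses them in the standard orthogonal-coordinate Laplacian of a scalar, (1/(h₁h₂h₃)) Σ_i ∂/∂u_i[(h₁h₂h₃/h_i²) ∂ψ/∂u_i], which we assume is equivalent to (124) for scalar ψ. The test field, the point, and the step sizes are arbitrary choices.

```python
import math

def to_cartesian(u):
    """Spherical coordinates u = (r, theta, phi) -> Cartesian (x1, x2, x3)."""
    r, th, ph = u
    return (r * math.sin(th) * math.cos(ph),
            r * math.sin(th) * math.sin(ph),
            r * math.cos(th))

def scale_factors(x_of_u, u, step=1e-6):
    """h_i = sqrt(sum_j (dx_j/du_i)^2), with dx_j/du_i by central differences."""
    h = []
    for i in range(3):
        up, um = list(u), list(u)
        up[i] += step
        um[i] -= step
        xp, xm = x_of_u(up), x_of_u(um)
        h.append(math.sqrt(sum(((a - b) / (2 * step)) ** 2 for a, b in zip(xp, xm))))
    return h

def laplacian_orthogonal(psi_of_u, x_of_u, u, delta=1e-3):
    """(1/H) * sum_i d/du_i[(H/h_i^2) dpsi/du_i], with H = h1*h2*h3,
    discretized by central differences (coefficients taken at half-steps)."""
    def coeff(v, i):
        h = scale_factors(x_of_u, v)
        return h[0] * h[1] * h[2] / h[i] ** 2
    h = scale_factors(x_of_u, u)
    p0 = psi_of_u(u)
    total = 0.0
    for i in range(3):
        up, um, uph, umh = list(u), list(u), list(u), list(u)
        up[i] += delta
        um[i] -= delta
        uph[i] += delta / 2
        umh[i] -= delta / 2
        flux_plus = coeff(uph, i) * (psi_of_u(up) - p0) / delta
        flux_minus = coeff(umh, i) * (p0 - psi_of_u(um)) / delta
        total += (flux_plus - flux_minus) / delta
    return total / (h[0] * h[1] * h[2])

u = (1.3, 0.8, 0.4)
print(scale_factors(to_cartesian, u))           # close to (1, r, r sin(theta))

# Test field psi = x^2 y + z^3, whose Cartesian Laplacian is 2y + 6z.
psi_cart = lambda x: x[0] ** 2 * x[1] + x[2] ** 3
psi_u = lambda v: psi_cart(to_cartesian(v))
x = to_cartesian(u)
print(laplacian_orthogonal(psi_u, to_cartesian, u), 2 * x[1] + 6 * x[2])
```

Only the map from u to x is supplied. Everything else, including the absence of any basis-vector derivatives for a scalar field, follows from the scale factors.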
In (121) to (123), the hat on q̂i was needed because we treated the orthonormal basis as a special case, having used a hatless qi in less special cases; the hat would not have been needed if we had assumed an orthonormal basis at the outset. In more elementary introductions to curvilinear orthogonal coordinates, the basis vectors are indeed chosen as unit vectors and consequently as orthonormal vectors. Hence, if the coordinates are called and the respective basis vectors are called (understood to be unit vectors), the components of the vector q w.r.t. that basis are called with no hats. In this notation, in which sums are written out longhand without numerical indices, it is convenient also to write out the Jacobian in full, in order to exploit cancellations of scale factors. If the Jacobian appears in both a numerator inside parentheses and a denominator outside, the handedness symbol ς also cancels. Thus the equations numbered (116) to (124) can be rewritten as, respectively,
[Equation group (125)]
Only in the cross-product and the curl does the handedness factor ς make any difference. If the system is right-handed—as is also often assumed at the outset—this factor is replaced by 1.
For some readers, equation group (125) will announce a return to familiar territory. For the writer, it offers a convenient place to stop.
Appendix: Mathematizing Huygens' principle
If a wavelike disturbance originating outside a region V, bounded by a surface S, enters the region, it must do so through the surface S. Unless we believe in "action at a distance", we must conclude that the behavior of the wave function throughout the region is fully determined by its behavior on the bounding surface. That reasoning, being qualitative, does not tell us precisely what aspects of the behavior at the boundary determine the behavior throughout the region, or how. In this appendix, we shall answer these questions using tools of vector analysis—some of which we shall find ready-made, and some of which require assembly. Our aim is to express the wave function in the region V as a surface integral, over the bounding surface S, of an integrand related to the wave function incident at a general point on that surface.[46]
Huygens' principle asserts not only that the behavior of the wave function throughout the region (containing no sources) is determined by the behavior at the boundary, but also that the behavior at the boundary is equivalent to a distribution of sources over the boundary, so that the wave function throughout the region is as if the original sources outside the region (primary sources) were replaced by sources distributed over the boundary (secondary sources).[y] We shall find that by appropriately arranging the integrand for the wave function inside the region, we can indeed recognize the distribution of boundary sources that would generate the wave function. We shall also find that the distribution of secondary sources has an alternative description which is especially convenient in the case of a single monopole primary source.
Hints
If the surface integrand represents a secondary source density, it will not only be related to the primary wave function at a general point on S, but will also be delayed by the propagation time from that point to the observation point, and attenuated in accordance with the propagation distance. Hence, if ψ(r, t) is the primary wave function and r′ is the position of the observation point (field point) at a distance s from position r, then the integrand (or at least the dominant term thereof) should be proportional to
$$\frac{1}{s}\,[\psi] , \tag{126}$$
where the square brackets indicate that the contents are to be delayed (or, in older literature, "retarded") by s⧸c relative to the default arguments (r, t), so that
$$[\psi] = \psi(\mathbf r,\, t - s/c) . \tag{127}$$
Of course we would like our distribution of secondary sources to be valid for an arbitrarily shaped boundary S. This preference will be easier to satisfy if the distribution of secondary sources, by itself, produces a zero wave function outside V (in other words, no backward secondary waves), because in that case, even if S is concave outward, the wave function inside V will not be complicated by "backward" waves generated at one point on S and entering V through another point on S. Accordingly, we would like our surface integral to be equal to the volume integral over V of ψ(r′, t) δ(r − r′), because that volume integral will be ψ(r′, t) if r′ is inside V, but zero if it is outside. In the integrand, the first factor can be replaced by [ψ] because the delta function selects the position r′, where the propagation distance s is zero. Moreover, by (33L), that delta function is the Laplacian of −1/(4πs). So the desired integrand, multiplied by 4π, becomes
$$-[\psi]\;\triangle\frac{1}{s} . \tag{128}$$
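The selection property used here can be checked without handling the delta function directly. By the divergence theorem, the volume integral of △(−1/(4πs)) over a ball equals −(1/4π)∮∂n(1/s) dS over its surface, with n̂ pointing out of the ball. This should be 1 if r′ is inside and 0 if it is outside. The Python sketch below evaluates that surface integral with a midpoint rule on a sphere. The sphere, the test points, and the grid sizes are arbitrary assumptions.

```python
import math

def selection_integral(r_prime, centre=(0.0, 0.0, 0.0), radius=1.0,
                       n_theta=120, n_phi=240):
    """-(1/4pi) * closed-surface integral of d(1/s)/dn over a sphere, n outward.
    By the divergence theorem this equals the volume integral over the ball of
    the Laplacian of -1/(4 pi s), with s = |r - r_prime|."""
    d_th = math.pi / n_theta
    d_ph = 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        th = (i + 0.5) * d_th
        for j in range(n_phi):
            ph = (j + 0.5) * d_ph
            n_hat = (math.sin(th) * math.cos(ph),
                     math.sin(th) * math.sin(ph),
                     math.cos(th))
            r = [c + radius * n for c, n in zip(centre, n_hat)]
            svec = [a - b for a, b in zip(r, r_prime)]
            s = math.sqrt(sum(x * x for x in svec))
            # d(1/s)/dn = -(svec . n_hat) / s^3
            dn_inv_s = -sum(a * b for a, b in zip(svec, n_hat)) / s ** 3
            total += dn_inv_s * radius ** 2 * math.sin(th) * d_th * d_ph
    return -total / (4 * math.pi)

print(selection_integral((0.2, -0.1, 0.3)))   # r' inside the ball: close to 1
print(selection_integral((2.0, 0.5, 0.0)))    # r' outside the ball: close to 0
```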
Relating the volume integral of (128) to the surface integral of something like (126) would seem to require a surface-to-volume integral identity involving two different fields. Three such identities, all named for George Green, come to mind.
Green's identities
If and are scalar fields, then by identity (71d),
Integrating both sides over a volume V enclosed by a surface S , and applying the divergence theorem on the left, we get
where n̂ is the unit normal to S pointing out of V. This integral equation is called Green's first identity. Switching the roles of and yields a second integral equation, which can be subtracted from the first to obtain
this is Green's second identity. If n is the normal distance from S (positive outside V, negative inside), then, by relation (9g) between the gradient and the directional derivative, we can rewrite Green's second identity in the alternative form
[Equation (129)]
which remains meaningful if one of the two operands is a generic field. And indeed, by the linearity of the various operators, the identity remains valid in that case (which is not always pointed out).
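Green's second identity can be confirmed exactly on a simple domain. The symbols for the two scalar fields are missing from this copy, so the Python sketch below calls them phi and psi purely for illustration. It takes V as the unit cube and uses hypothetical polynomial fields with hand-coded gradients and Laplacians. It evaluates both sides with a three-point Gauss–Legendre rule, which is exact for these polynomial degrees, so the two sides agree to rounding error (both are 1.25 for this choice).

```python
import math

# Three-point Gauss-Legendre rule on [0, 1]: exact for polynomials of degree <= 5.
G = math.sqrt(3.0 / 5.0)
NODES = ((1 - G) / 2, 0.5, (1 + G) / 2)
WEIGHTS = (5.0 / 18, 8.0 / 18, 5.0 / 18)

# Hypothetical polynomial test fields, with hand-coded gradients and Laplacians.
def phi(x, y, z):
    return x * x + y * z

def grad_phi(x, y, z):
    return (2 * x, z, y)

def lap_phi(x, y, z):
    return 2.0

def psi(x, y, z):
    return x * y * z + z ** 3

def grad_psi(x, y, z):
    return (y * z, x * z, x * y + 3 * z * z)

def lap_psi(x, y, z):
    return 6.0 * z

def volume_side():
    """Integral over the unit cube of (phi lap psi - psi lap phi)."""
    total = 0.0
    for x, wx in zip(NODES, WEIGHTS):
        for y, wy in zip(NODES, WEIGHTS):
            for z, wz in zip(NODES, WEIGHTS):
                total += wx * wy * wz * (phi(x, y, z) * lap_psi(x, y, z)
                                         - psi(x, y, z) * lap_phi(x, y, z))
    return total

def surface_side():
    """Closed-surface integral of (phi dpsi/dn - psi dphi/dn), n outward."""
    total = 0.0
    for axis in range(3):
        for face, sign in ((0.0, -1.0), (1.0, 1.0)):   # outward normal = sign * e_axis
            for a, wa in zip(NODES, WEIGHTS):
                for b, wb in zip(NODES, WEIGHTS):
                    point = [a, b]
                    point.insert(axis, face)
                    x, y, z = point
                    dn_psi = sign * grad_psi(x, y, z)[axis]
                    dn_phi = sign * grad_phi(x, y, z)[axis]
                    total += wa * wb * (phi(x, y, z) * dn_psi - psi(x, y, z) * dn_phi)
    return total

print(volume_side(), surface_side())   # both 1.25, up to rounding
```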
If, taking the above "hints", we put and in (129), we get
which is a particular case of what is sometimes called Green's third identity (although that label has also been attached to other things). Applying the chain rule to the second term in the surface integrand, and recalling the purpose of (128), we can rewrite this result as
[Equation (130)]
Kirchhoff's integral theorem
In (130), the first term in the surface integrand is dominant for sufficiently large s and, as hoped, this dominant term is proportional to (126), because ∂n [ψ] is proportional to [ψ]. The second volume integral, as hoped, is ψ(r′, t) if r′ is inside V, but zero if it is outside. If we can somehow convert the first volume integral to a surface integral, a trivial rearrangement will express the second volume integral as a surface integral, as hoped. And from the terms that we know so far, the surface integral looks promising.
In that first term, however, the factor ∂n [ψ] is not to be confused with [∂n ψ]; the latter holds s constant while n varies, whereas the former also accounts for the variation of s with n (and consequently r) at a fixed r′. From (127), the contribution to ∂n [ψ] through the variation of s is
i.e.
Putting the pieces together, we get
$$\partial_n[\psi] = [\partial_n\psi] - \frac{\partial_n s}{c}\,[\dot\psi] . \tag{131}$$
Similarly, if ∂i denotes ∂/∂xi in Cartesian coordinates,
$$\partial_i[\psi] = [\partial_i\psi] - \frac{\partial_i s}{c}\,[\dot\psi] . \tag{132}$$
The last result can be applied three times to the volume integrand in (130); its applications are shown in red in the following sequence, which uses Cartesian coordinates with implicit summation:
i.e.,
[Equation (133)]
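Both (131) and (132) are chain-rule statements about [ψ] = ψ(r, t − s/c) with r′ held fixed. Besides the delayed local derivative, the derivative picks up a term through s. The Python sketch below is a minimal finite-difference check. It assumes an arbitrary smooth test field ψ, wave speed, and observation point, all hypothetical. With s measured from r′ to r, it uses ∇s = (r − r′)/s and compares ∂i[ψ] with [∂iψ] − (∂is/c)[ψ̇]. A normal derivative ∂n is the same computation taken along n̂.

```python
import math

C = 2.0                     # wave speed (arbitrary choice for the check)
R_OBS = (0.3, -0.4, 1.1)    # observation point r' (arbitrary)

def psi(r, t):
    """Hypothetical smooth field psi(r, t), chosen only for the check."""
    x, y, z = r
    return math.sin(x + 2 * y - t) * math.exp(-0.1 * z * z)

def psi_t(r, t):
    """Partial time derivative of psi."""
    x, y, z = r
    return -math.cos(x + 2 * y - t) * math.exp(-0.1 * z * z)

def grad_psi(r, t):
    """Local (Cartesian) gradient of psi at fixed t."""
    x, y, z = r
    e = math.exp(-0.1 * z * z)
    c = math.cos(x + 2 * y - t)
    return (c * e, 2 * c * e, -0.2 * z * math.sin(x + 2 * y - t) * e)

def dist(r):
    """s = |r - r'|."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r, R_OBS)))

def retarded(func, r, t):
    """[func] = func(r, t - s/c)."""
    return func(r, t - dist(r) / C)

def compare(r, t, i, step=1e-5):
    """Return (d_i[psi] by finite differences, [d_i psi] - (d_i s / c)[psi_t])."""
    rp, rm = list(r), list(r)
    rp[i] += step
    rm[i] -= step
    numeric = (retarded(psi, rp, t) - retarded(psi, rm, t)) / (2 * step)
    ds_dxi = (r[i] - R_OBS[i]) / dist(r)           # i-th component of grad s
    local = retarded(lambda q, tt: grad_psi(q, tt)[i], r, t)
    return numeric, local - ds_dxi / C * retarded(psi_t, r, t)

for i in range(3):
    print(compare((1.0, 0.5, -0.2), 0.7, i))       # each pair nearly equal
```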
To proceed further, we need to assign a direction to s, which we have defined as the propagation distance from r to r′. Being a distance, s is positive whether we measure it forwards or backwards w.r.t. the direction of propagation. But because r′, for present purposes, is fixed while r varies, we shall measure s "backwards", i.e. from r′ to r, with $\mathbf s = s\,\hat{\mathbf s}$ as the displacement vector from r′ to r (so that we can think of both r and s as coordinates of position r, and think of s as a scalar field). Under that convention,
$$\nabla s = \hat{\mathbf s} = \frac{\mathbf s}{s} = \frac{\mathbf r - \mathbf r'}{s} , \tag{134}$$
so that |∇s| = 1. And because s , like r , is the distance from an origin, can be evaluated from (49):
With these substitutions, (133) becomes
[Equation (135)]
If ψ is a scalar field, we can put this in a more convenient form. By identity (71d),
i.e., by (134),
i.e., if we apply identity (73g) in the first term on the right, and use components in the second,
(with summation over i), where x′i is the i th component of r′. But div ∇ ln s is △ ln s, which can be evaluated from (49) as
Applying this and (132) to the previous result gives
i.e., since ∂i s is the i th component of ∇s and therefore of s⧸s ,
i.e., since ∂i s ∂i s = |∇s|² = 1,
Multiplying this equation by 2s⧸c and adding it to equation (135) gives
As we are proposing to integrate over a volume containing no sources, we can invoke the (homogeneous) wave equation (45), so that the right-hand side is zero, leaving
[Equation (136)]
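This derivation relies on a few geometric facts about s. From (134), ∇s = (r − r′)/s and |∇s| = 1. It also uses the Laplacians of s and of ln s. For a distance measured from a fixed point these are 2/s and 1/s², standard results that we assume are what (49) supplies. All of these can be spot-checked numerically. The Python sketch below uses central differences at one point; the point, r′, and the step sizes are arbitrary assumptions.

```python
import math

R_OBS = (0.3, -0.4, 1.1)    # fixed point r' (arbitrary)

def s_of(r):
    """s = |r - r'|, regarded as a scalar field of r."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r, R_OBS)))

def grad(func, r, h=1e-5):
    """Central-difference gradient."""
    g = []
    for i in range(3):
        rp, rm = list(r), list(r)
        rp[i] += h
        rm[i] -= h
        g.append((func(rp) - func(rm)) / (2 * h))
    return g

def laplacian(func, r, h=1e-3):
    """Seven-point central-difference Laplacian."""
    total = 0.0
    for i in range(3):
        rp, rm = list(r), list(r)
        rp[i] += h
        rm[i] -= h
        total += (func(rp) - 2 * func(r) + func(rm)) / (h * h)
    return total

r = (1.2, 0.7, -0.5)
s = s_of(r)
print(grad(s_of, r), [(a - b) / s for a, b in zip(r, R_OBS)])   # grad s = (r - r')/s
print(laplacian(s_of, r), 2 / s)                                # Laplacian of s
print(laplacian(lambda q: math.log(s_of(q)), r), 1 / s ** 2)   # Laplacian of ln s
```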
Now at last we can substitute (136) and (131) into (130) and get
in which we can rewrite the first volume integral as
and then transpose it to the left side, obtaining
[Equation (137)]
where the last term in the integrand on the left has been reverted to its original form (reversing the chain rule), and the remaining integral on the right is ψ(r′, t) if position r′ is inside V, but zero if it is outside.
Although we derived (137) by supposing, just after (135), that ψ is scalar, that restriction can now be dropped on account of the linearity of the various operators in (137).
Although we derived (137) by supposing that V is a finite region, we can extend the result to an infinite region by adding another sheet to the bounding surface S in such a way that (i) the region becomes finite, but (ii) the additional sheet makes no contribution to the surface integral. The simplest way to do this is to suppose that the additional sheet is at such a large distance that the disturbance has not reached it yet! By this expedient we can apply (137) not only to the region inside a closed surface, but also (e.g.) to the region outside a closed surface, or the region on one side of an infinite open surface.
We derived (137) by supposing, as usual in this paper, that n̂ points out of V and that n is measured out of V. But this has the arguably counterintuitive implication that n̂ is typically against the direction of propagation: directly against it in the simplest case, in which V is the exterior of a sphere with a monopole source at its center.
So, in the following formal statement of our result, let us drop the symbol V and define n as being measured out of the region containing the sources, and consequently into the region that satisfies the homogeneous wave equation, changing the signs in (137).
Kirchhoff's integral theorem: If
- the wave function ψ satisfies the wave equation (with speed c) in a region R bounded by a surface S (with all sources consequently on the other side of S ), and
- s is the distance of the general point at position r from the observation point at position r′, and
- quantities in square brackets are to be delayed by s⧸c , and
- n is the normal coordinate measured from the general point on S into R [contrary to the usual direction for a named region, and contrary to the convention we have used above!],
then the expression
[Equation (138)]
is equal to the wave function at r′ if r′ is inside R , but zero if it is outside.[47]
The above derivation (guided by Baker & Copson, 1939, pp. 38–40) is unusual in that it does not assume sinusoidal time-dependence at any stage—but objectionable in that (i) the quantity whose divergence we seek, after equation (135), is a wild guess, and (ii) there are places where a term with coefficient ±1 found by one path is merged with a term with coefficient ∓2 found by another path, as if the argument contained hidden redundancies that ought to be eliminated. A more traditional derivation (e.g., Baker & Copson, 1939, pp. 36–7; Born & Wolf, 2002, pp. 420–21) would first derive the special case for sinusoidal time-dependence (due to Helmholtz) from Green's identities, and then generalize the time-dependence. The traditional approach has the advantages of being less tortuous and more readily applicable to dispersive media (in which c is frequency-dependent), but the disadvantages of depending on complex numbers and on the premise that a general function of time can be expressed as a sum of sinusoids. Helmholtz's integrand is a sinusoidal version of our expression (139) below. One can derive that expression and thence the Kirchhoff integral in a far more elementary and accessible manner, albeit with some loss of rigor, by assuming (instead of justifying) the form of the wave function due to a monopole source (Putland, 2022). That approach first obtains the required distribution of secondary sources and then expresses the wave function as a surface integral—instead of inferring the secondary sources from the surface integral, as we are about to do here.
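The Python sketch below is an end-to-end numerical sanity check of the theorem, added here and not part of the derivation. It places a hypothetical monopole primary source outside a unit sphere and takes ψ = f(t − d/c)/d, with d the distance from that source and f an assumed Gaussian pulse. It then evaluates (1/4π)∮{[ψ]∂n(1/s) − (∂n s/(cs))[ψ̇] − (1/s)[∂nψ]} dS with n measured into the sphere. That integrand is our reading of the expression in the theorem: it is the monopole-plus-dipole form described in the next subsection, so treat it as an assumption. The result should be close to ψ(r′, t) for r′ inside the sphere and close to zero for r′ outside.

```python
import math

C = 1.0                        # wave speed (arbitrary units)
SOURCE = (3.0, 0.0, 0.0)       # hypothetical primary monopole, outside the unit sphere

def f(tau):
    """Assumed source strength history: a smooth Gaussian pulse."""
    return math.exp(-tau * tau)

def df(tau):
    return -2.0 * tau * math.exp(-tau * tau)

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def primary(r, t):
    """psi = f(t - d/c)/d; returns (psi, dpsi/dt, grad psi) at (r, t)."""
    dvec = sub(r, SOURCE)
    d = math.sqrt(dot(dvec, dvec))
    tau = t - d / C
    dpsi_dd = -df(tau) / (C * d) - f(tau) / (d * d)
    return f(tau) / d, df(tau) / d, tuple(dpsi_dd * x / d for x in dvec)

def kirchhoff(r_obs, t, n_theta=120, n_phi=240):
    """(1/4pi) * integral over the unit sphere of
    [psi] dn(1/s) - (dn s / (c s)) [psi_t] - (1/s) [dn psi],
    with n measured into the sphere and [.] delayed by s/c."""
    d_th, d_ph = math.pi / n_theta, 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        th = (i + 0.5) * d_th
        for j in range(n_phi):
            ph = (j + 0.5) * d_ph
            r = (math.sin(th) * math.cos(ph), math.sin(th) * math.sin(ph), math.cos(th))
            n_hat = tuple(-x for x in r)               # inward normal on the unit sphere
            svec = sub(r, r_obs)
            s = math.sqrt(dot(svec, svec))
            dn_s = dot(svec, n_hat) / s
            psi, psi_t, grad = primary(r, t - s / C)   # retarded values
            integrand = (-psi * dn_s / (s * s)         # [psi] dn(1/s)
                         - dn_s * psi_t / (C * s)      # -(dn s / (c s)) [psi_t]
                         - dot(grad, n_hat) / s)       # -(1/s) [dn psi]
            total += integrand * math.sin(th) * d_th * d_ph
    return total / (4 * math.pi)

inside = (0.2, 0.1, 0.0)
t = math.sqrt(dot(sub(inside, SOURCE), sub(inside, SOURCE))) / C   # pulse peak arrives
print(kirchhoff(inside, t), primary(inside, t)[0])   # should nearly agree
print(kirchhoff((0.0, 0.0, 2.0), t))                 # should be close to 0
```

A midpoint rule on a (θ, φ) grid is used for simplicity. Keeping both r′ and the source well away from the sphere keeps the integrand smooth, so a modest grid suffices.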
Monopole and dipole secondary sources
There is a convention whereby qi, n denotes the derivative w.r.t. n of the i th component of the vector field q , so that the corresponding derivative of the entire vector is written q, n , where the strange leading comma in the subscript announces that we want a partial derivative, not a component. Accordingly, let us write the derivative w.r.t. n of the generic field ψ as ψ, n . And let it be understood that any "comma n" derivative is to be evaluated locally, and not as observed at r′ ; that is, it does not account for the variation of s through n.
In that notation, the integrand inside the big parentheses in (138) can be written out in full as
i.e.
[Equation (139)]
where ∂n does account for the variation of s through n. If h is a small change in n , from n = −h to n = 0 , the integrand can be written
[Equation (140)]
The second term (including the minus sign) is recognizable as the contribution to the wave function from a monopole source with strength −ψ, n .[48] Similarly, in the first term, the expression in the big parentheses is the contribution from a monopole source with strength ψ⧸h ; and the operator h ∂n gives the change in that contribution due to n increasing from −h to 0 , i.e. the change in that contribution due to moving the said monopole from n = −h to n = 0 , i.e. the whole contribution due to the combination of a monopole with strength −ψ⧸h at n = −h and a monopole with strength ψ⧸h at n = 0. This combination is called a dipole (or doublet)[49] with strength ψ in the normal (n) direction.
According to (138), the expression (140) is to be scaled by 1/4π and integrated over the surface S. Thus the secondary source distribution can be described as a monopole distribution of strength density −ψ, n/4π plus a normal dipole distribution of strength density ψ/4π , where "strength density" means strength per unit area. This description is well known.[50]
The implication is not that the specified secondary sources really exist, or even that they could exist, but only that the wave function in the region R is as if it had been generated by the specified secondary sources (which would also give a null wave function outside the region). We should note, however, that a monopole contribution of the form (48) can really exist, even for a vector wave function, notwithstanding that it requires not only the magnitude but also the direction of the vector to be independent of the direction of propagation. That requirement might seem to exclude electromagnetic waves, for which the electric and magnetic fields are transverse to the direction of propagation and therefore not independent of it. But it is possible to describe such waves in terms of an electric scalar potential and a magnetic vector potential, such that the contribution to the latter from a current element has the same direction as the current element for all directions of propagation.[51]
Generalized spatiotemporal-dipole (GSTD) secondary sources
The "dipole" discussed so far is a spatial dipole, in which the constituent monopoles differ only in sign and by a small spatial displacement. If we add a small time shift between the monopoles, equal to the propagation time from one to the other, we get what David A.B. Miller (1991) calls a spatiotemporal dipole. If we allow the time shift to be smaller than that propagation time, and allow one monopole to be attenuated by a small fraction, we get what I call a generalized spatiotemporal dipole, the usefulness of which will now be demonstrated.
In the Helmholtz–Kirchhoff integrand (139), the second term (including the sign) represents a monopole strength density −ψ, n and the first term represents a dipole strength density ψ in the n direction; the dipole source per unit area of S is a spatial dipole comprising a monopole with strength −ψ⧸h at n = −h (the inverted monopole), and a monopole with strength ψ⧸h at n = 0 (the uninverted monopole), where h is small (and the indicated strength densities are eventually to be divided by 4π). The idea is that by suitably modifying the spatial dipole, we might eliminate the need for the separate monopole and thus effectively reduce a tripole to a dipole. Let us try delaying the strength function of the inverted monopole by τh , and reducing its magnitude by a fraction αh (e.g., no reduction if αh = 0), where τh and αh are small quantities to be determined. Then, compared with the uninverted monopole, the inverted monopole is recessed by the distance h , delayed by the time τh , and attenuated by the fraction αh. At the field point r′, according to (140), the wave function due to the uninverted monopole is
[Equation (141)]
So the wave function due to the modified dipole is the total change in expression (141) due to n increasing by h , and t increasing by τh , and the magnitude increasing by αh times its final value. Since h and τh are small, that total change is
i.e.
which will agree with (139) if and only if, on S ,
[Equation (142)]
This is the sufficient and necessary condition for the modified dipoles to give the same secondary waves as the original dipoles and monopoles. And it does not look helpful in the general case, due to the independence of the wave function and its partial derivatives w.r.t. r and t.
But now suppose that we have a single monopole primary source, so that the wave function ψ is given by (48). It is readily confirmed that this wave function and its partial derivatives are related by
$$\frac{\partial\psi}{\partial r} = -\frac{\dot\psi}{c} - \frac{\psi}{r} . \tag{143}$$
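We assume that (48) is the outgoing monopole form ψ = f(t − r/c)/r. Under that assumption, the following Python sketch checks two things by central differences. First, ψ satisfies the homogeneous wave equation △ψ − ψ̈/c² = 0 away from the source; the sign convention for ☐ does not matter for the homogeneous equation. Second, ∂ψ/∂r = −ψ̇/c − ψ/r, which is the relation we take (143) to express. The strength history f, the wave speed, and the sample point are hypothetical.

```python
import math

C = 1.5                      # wave speed (arbitrary)

def f(tau):
    """Hypothetical source strength history; any smooth function would do."""
    return math.exp(-tau * tau) * math.cos(3.0 * tau)

def psi(r, t):
    """Assumed monopole form, source at the origin: psi = f(t - r/c) / r."""
    d = math.sqrt(sum(x * x for x in r))
    return f(t - d / C) / d

def wave_residual(r, t, h=1e-3):
    """Laplacian(psi) - psi_tt / c^2 by central differences (should be ~0)."""
    lap = 0.0
    for i in range(3):
        rp, rm = list(r), list(r)
        rp[i] += h
        rm[i] -= h
        lap += (psi(rp, t) - 2 * psi(r, t) + psi(rm, t)) / h ** 2
    psi_tt = (psi(r, t + h) - 2 * psi(r, t) + psi(r, t - h)) / h ** 2
    return lap - psi_tt / C ** 2

def radial_relation(r, t, h=1e-6):
    """Return (dpsi/dr, -psi_t/c - psi/r), both from central differences."""
    d = math.sqrt(sum(x * x for x in r))
    out = [x / d for x in r]                  # unit vector away from the source
    rp = [x + h * e for x, e in zip(r, out)]
    rm = [x - h * e for x, e in zip(r, out)]
    dpsi_dr = (psi(rp, t) - psi(rm, t)) / (2 * h)
    psi_t = (psi(r, t + h) - psi(r, t - h)) / (2 * h)
    return dpsi_dr, -psi_t / C - psi(r, t) / d

point, t = (0.8, -0.6, 1.0), 1.2
print(wave_residual(point, t))                # small (finite-difference error only)
print(radial_relation(point, t))              # two nearly equal numbers
```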
Applying the chain rule to the right side of (142) gives
So, substituting from (143) and noting that ∂n r is cos(n, r) , i.e. the cosine of the angle between n̂ and r̂ , we have
[Equation (144)]
To satisfy this for all ψ̇ and ψ , we equate the coefficients of ψ̇ , obtaining
$$\tau_h = \frac{h}{c}\cos(n, r) , \tag{145}$$
and equate the coefficients of ψ , obtaining
$$\alpha_h = \frac{h}{r}\cos(n, r) , \tag{146}$$
so that the parameters of our "modified" dipole are uniquely determined. This "modified" dipole is what I have called a "generalized spatiotemporal dipole" (GSTD). The integrand in (139) may then be understood as a distribution of GSTDs on S , oriented normal to S , the first term in the braces representing the spatial aspect (equal and opposite monopoles) and the second term (in ψ, n) representing the modifications (delay and attenuation of the inverted monopole). According to (145), the delay of the inverted monopole is such that the waves from the two monopoles are synchronized (with opposing amplitudes) in the direction of the primary source, and in the cone of directions which make the same angle with the normal n̂ as the primary source; this cone includes the direction of specular reflection off S. And according to (146), the attenuation of the inverted monopole is such that the waves from the two monopoles cancel at a distance r in any of these directions (including at the primary source); at that distance, the greater proximity of the inverted monopole compensates for the reduced strength. Thus there are two ways in which the GSTDs suppress "backward" secondary waves: collectively, they produce a wave function which is null on the primary source's side of the surface S ; individually, they suppress secondary waves at particular distances in particular directions, including the direction of specular reflection off S.
If S coincides with a primary wavefront, so that its normal n̂ is parallel to r̂, then we have cos(n, r) = 1 in (145), so that the delay τh becomes h⧸c, which is simply the time taken for the waves emitted by the uninverted monopole to reach the inverted monopole. The latter is in the −n̂ direction, which is therefore the direction in which the waves from the two monopoles are synchronized (and cancel at distance r); the "cone of directions" collapses to its axis.
If the primary wavefront is plane, we have r → ∞ in (146), so that αh = 0 ; the inverted monopole is not attenuated, and the cancellation of the waves from the two monopoles (in the cone with its axis in the −n̂ direction) becomes a far-field effect.
So if S coincides with a primary wavefront and is plane, the inverted monopole is delayed by h⧸c and unattenuated, so that the waves from the two monopoles cancel in the −n̂ direction in the far field. This case is what Miller (1991) calls a "spatiotemporal dipole". We have "generalized" it in two ways: by allowing the delay of the inverted monopole to be less than h⧸c , so that the direction of cancellation need not be normal to S ; and by allowing the inverted monopole to be attenuated, so that the cancellation may occur at a finite distance. Together, these modifications allow the surface of integration S to be of a general shape and orientation and at a general distance from the primary source.
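The coefficient matching can also be checked numerically. We read (142) as requiring −ψ,n = (τh/h)ψ̇ + (αh/h)ψ on S; this is our own reading. For a monopole, ψ,n = cos(n, r) ∂ψ/∂r. Matching coefficients then gives τh = (h/c) cos(n, r) and αh = (h/r) cos(n, r). These reduce to τh = h⧸c when n̂ is parallel to r̂, and to αh → 0 as r → ∞, in line with the special cases above. The Python sketch below assumes ψ = f(t − r/c)/r with a hypothetical Gaussian f and checks the condition by finite differences at one point of S. It also checks the cancellation property: seen from the primary source, the inverted and uninverted monopoles arrive simultaneously and with equal amplitude, up to O(h²).

```python
import math

C = 1.0                               # wave speed (arbitrary units)

def f(tau):
    """Hypothetical strength history of the primary monopole at the origin."""
    return math.exp(-tau * tau)

def psi(r, t):
    """Assumed monopole wave function psi = f(t - r/c) / r."""
    d = math.sqrt(sum(x * x for x in r))
    return f(t - d / C) / d

def gstd_parameters(point, n_hat, h):
    """(tau_h, alpha_h) = (h/c, h/r) * cos(n, r), as assumed in the lead-in."""
    r = math.sqrt(sum(x * x for x in point))
    cos_nr = sum(a * b for a, b in zip(n_hat, point)) / r
    return h / C * cos_nr, h / r * cos_nr

def condition_check(point, n_hat, t, h=0.01, step=1e-6):
    """Return (-psi_n, (tau_h/h) psi_t + (alpha_h/h) psi) at a point of S,
    with the local derivatives taken by central differences."""
    tau_h, alpha_h = gstd_parameters(point, n_hat, h)
    plus = [p + step * n for p, n in zip(point, n_hat)]
    minus = [p - step * n for p, n in zip(point, n_hat)]
    psi_n = (psi(plus, t) - psi(minus, t)) / (2 * step)
    psi_t = (psi(point, t + step) - psi(point, t - step)) / (2 * step)
    return -psi_n, tau_h / h * psi_t + alpha_h / h * psi(point, t)

def mismatch_at_source(point, n_hat, h):
    """Arrival-time difference and amplitude ratio (inverted / uninverted),
    observed at the primary source, for monopoles at point and point - h*n_hat."""
    tau_h, alpha_h = gstd_parameters(point, n_hat, h)
    r_un = math.sqrt(sum(x * x for x in point))
    r_inv = math.sqrt(sum((p - h * n) ** 2 for p, n in zip(point, n_hat)))
    dt = (tau_h + r_inv / C) - r_un / C
    ratio = (1 - alpha_h) * r_un / r_inv
    return dt, ratio

point = (1.5, 0.4, -0.3)
length = math.sqrt(0.2 ** 2 + 0.9 ** 2 + 0.3 ** 2)
n_hat = (0.2 / length, 0.9 / length, 0.3 / length)
print(condition_check(point, n_hat, t=1.0))       # two nearly equal numbers
print(mismatch_at_source(point, n_hat, h=1e-3))   # about (0, 1); deviations O(h^2)
```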
Although Miller applied his spatiotemporal-dipole theory to "uniform spherical or plane wave fronts" (Miller, 1991, after eq. 5), it is in fact a plane-wave (far-field) approximation in that it neglects the 1/r decay in the magnitude of the primary wave, with the result that his equation (4), which corresponds to our (143), lacks the second term on the right.[52] To make the theory exact for all r, we need the inverted monopoles to be attenuated in accordance with (146).
Application to diffraction by an aperture
Suppose that the primary sources are partly obstructed by an opaque baffle with an aperture in it, and that we are interested in the wave function that propagates beyond the baffle. Let us choose a surface S consisting of two segments, namely Sa spanning the aperture, and Sb on the side of the baffle facing away from the sources (the dark side or quiet side of the baffle). The obvious way to proceed is to suppose that the baffle simply eliminates the secondary sources on Sb while leaving the secondary sources on Sa unchanged (as if the baffle were not there). The result, as far as the wave function in R (beyond the baffle) is concerned, is simply that the integral of (140) is taken over Sa only.
Integrating over the aperture alone is indeed the standard answer, but there are various other ways of explaining it. Some explanations, including the famously inconsistent one offered by Kirchhoff himself, are discussed in Putland, 2022 (§ 2.2 and Appendices A & B), with references to other works.
Additional information
Acknowledgments
Professor Chen-To Tai, FIEEE, died in 2004. He first came to my attention in 2018 through his paper "On the presentation of Maxwell's theory" (Proc. IEEE, 60(8): 936–45, 1972). In nearly every place where I mention him here, even if I do not accept his conclusion, I am entirely indebted to his works for drawing my attention to the issue raised. In particular, it was he who alerted me to Gibbs's original definitions of the divergence and curl and their suitability for expression in indicial notation (Tai, 1995, pp. 17, 21). And although he might not have been pleased, it was through him that I first knew with certainty that the del-dot and del-cross notations work in general coordinates (ibid., pp. 64–5).
This article uses images from Wikimedia Commons.
The essence of the above derivation of "generalized" spatiotemporal-dipole secondary sources has been previously published (Putland, 2022, § 3.7), but not previously peer-reviewed.
Competing interests
None.
Ethics statement
This article does not concern research on human or animal subjects.
TO DO:
- More illustrations?
Notes
- Even if we claim that "particles" of matter are wave functions and therefore continuous, this still implies that matter is lumpy in a manner not normally contemplated by continuum mechanics.
- If r is the position of a particle and p is its momentum, the last term vanishes. If the force is toward the origin, the previous term also vanishes, and we are left with conservation of angular momentum about the origin.
- Here we use the broad triangle symbol (△) rather than the narrower Greek Delta (Δ); the latter would more likely be misinterpreted as "change in…"
- There is no need for parentheses around ρv , because div ρv cannot mean (div ρ)v , because the divergence of a scalar field is not defined.
- The material derivative d/dt is also called the substantive derivative, and is sometimes written D/Dt if the result is meant to be understood as a field rather than simply a function of time (Kemmer, 1977, pp. 184–5).
- Or nabla, because it allegedly looks like the ancient Phoenician harp that the Greeks called by that name.
- Stress is a second-order tensor, and the origin of the term "tensor"; but, for present purposes, it's just another possible example of a field called ψ.
- In mathematical jargon, it should be a two-dimensional manifold embedded in 3D Euclidean space.
- If any part of our argument requires Σ or C to be smooth, this is not an impediment, because having approximated Σ or C to any desired accuracy by a polyhedron or polygon, we can then approximate the polyhedron or polygon to any desired higher accuracy by a smooth surface or curve!
- In the general case, there is an extra term ∂ D/∂t on the right; but this term is zero in the magnetostatic case.
- When a gas is compressed, work is done on it, causing its temperature to rise, so that the ratio of dp to dρ is higher than if the compression were isothermal. In sound waves, there is typically not enough time for a significant part of the heat of compression to be conducted away; that is, the compression is near enough to adiabatic. The words "not enough time" may suggest that the adiabatic approximation is a high-frequency approximation. But in fact, in free air, it is a low-frequency approximation, because as the frequency is reduced, the equalization of temperature is hindered more by the longer wavelength than it is helped by the longer period. Only in a confined space, which limits the required distance of conduction, does the adiabatic assumption require the frequency to be above some lower limit. In a musical wind instrument, that lower limit tends to be far below the audible range. Meanwhile the upper limit, due to easier heat conduction within a shorter wavelength, tends to be very far above the audible range. Thus, under typical conditions, for the purpose of calculating c , the adiabatic assumption is reasonable. (See Fletcher, 1974.)
- Or sometimes "quabla", by analogy with "nabla".
- In particular, some authorities change the sign, defining ☐ as 1/c² ∂²/∂t² − △, and some write the operator (however defined) as ☐².
- The symbol c comes from a general-purpose Latin word for speed, but has become the usual symbol for wave speed.
- Tai (1995, pp. 43–4) also disagrees with Moon & Spencer, but for a different reason: he regards the Laplacian as the divergence of the gradient even if the operand is a vector field. For better or worse, we do not consider the gradient of a vector in the present paper—although the reader can probably work out how to modify (26g) if dr is written as a column vector and dp is replaced by a column vector (compare the later footnote on dyadics).
- A quaternion is a mathematical object invented by William Rowan Hamilton in 1843, consisting of two parts which Hamilton later called the scalar part and the vector part. For most purposes the two parts were found to be more useful separately than together. By putting them together, however, Hamilton constructed a set which satisfied all the algebraic field axioms except commutativity of multiplication. This was, and is, considered a triumph.
- They involve dyadics, i.e. 2nd-order tensors written in a vector-friendly notation. The fourth of the seven is
- ∇(τ⸱ ω) = ∇τ ⸱ ω + ∇ω ⸱ τ ,
- (ω ⸱∇)τ + (τ⸱∇)ω ,
- I don't overlook the fact that Tai's symbolic vector, unlike the del operator, is subject to commutative and anticommutative laws. Neither do I see how it helps.
- Indicial notation is standard in higher-order tensor analysis, which however tends not to use unit vectors of coordinate systems, and therefore tends not to encourage the indexing of unit vectors in elementary vector analysis—whereas in the present paper, I have unapologetically indexed the unit vectors.
- Hence we want each ui(r) to be, as far as possible, a smooth function. This may require some tweaking of definitions. E.g., in cylindrical coordinates, the angular coordinate φ must be confined to some 360° range in order to make it unique, and we don't want it jumping from the end of the range to the beginning within the region of interest.
- But not dot-del for the Laplacian, as in (19), because we want to use an elementary product rule inside the integral.
- Hsu (1984, p. 171) implies that the scale factors are also called "metric coefficients", and Tai (1994, 1995) prefers the latter term. This is loose terminology because, in general, the metric coefficient is defined as $g_{ij} = \mathbf h_i \cdot \mathbf h_j$.
- Outline of proof: If the reciprocal vectors were unit vectors, then the angles between the original unit vectors would need to be equal, in order that their cross products have the same magnitude as their scalar triple product (Jacobian); and the latter condition requires the common angle to be 90°.
- In (124), if ψ is a vector field, we also need the derivatives of its basis vectors w.r.t. ui in terms of ui .
- Notice that the desired secondary sources are not segments of the moving wavefronts, but segments of a stationary surface influenced by the passing waves. Compare Huygens' original statement: "that each particle of matter in which a wave spreads, ought not to communicate its motion only to the next particle which is in the straight line drawn from the luminous point, but that it also imparts some of it necessarily to all the others which touch it and which oppose themselves to its movement. So it arises that around each particle there is made a wave of which that particle is the centre" (Huygens, 1690, tr. Thompson, p. 19; my emphasis). Huygens chooses secondary sources on the same primary wavefront at the same time for the purpose of constructing the "continuation" of the wavefront (the same wavefront at a later time) in the same medium (ibid., pp. 19, 50–51), but not for the purpose of constructing a wavefront reflected or refracted at an interface between two media; for the latter purpose, he chooses secondary sources at various points on the reflecting or refracting surface, although the primary wavefront reaches those points at various times (ibid., pp. 23–4, 35–7, etc.).
References
- Axler, 1995, §9. The relegation of determinants was anticipated by C.G. Broyden (1975). But Broyden's approach is less radical: he does not deal with abstract vector spaces or abstract linear transformations, and his eventual definition of the determinant, unlike Axler's, is traditional—not a product of the preceding narrative.
- Axler, 1995, §1. But it is Broyden (1975), not Axler, who discusses numerical methods at length.
- E.g., Feynman (1963, vol. 1, § 11-5), having defined velocity from displacement in Cartesian coordinates, shows that velocity is a vector by showing that its coordinate representation contra-rotates (like that of displacement) if the coordinate system rotates.
- E.g., Feynman (1963, vol. 1, § 11-7), having defined the magnitude and dot-product in Cartesian coordinates, proves that they are scalar functions by showing that the corresponding expressions in rotated ("primed") coordinates give the same values as the original expressions (in "unprimed" coordinates). And Tai (1995, pp. 66–7), having found an expression for the "gradient" operator in a general coordinate system (the "unprimed" system), proves the "invariance" of the operator (its vector character in this case) by showing that the corresponding expression in any other general coordinate system (the "primed" system) has the same effect.
- There are many proofs and interpretations of this identity. My own effort, for what it's worth, is "Trigonometric proof of vector triple product expansion", Mathematics Stack Exchange, t.co/NM2v4DJJGo, 2024. The classic is Gibbs, 1881, §§ 26–7.
- Gibbs, 1881, § 56.
- Katz, 1979, pp. 146–9.
- In Feynman, 1963, −∇p as the "pressure force per unit volume" eventually appears in the 3rd-last lecture of Volume 2 (§40-1).
- A demonstration like the foregoing is outlined by Gibbs (1881, § 55).
- Wilson, 1901, pp. 147–8; Borisenko & Tarapov, 1968, pp. 147–8 (again); Hsu, 1984, p. 92; Kreyszig, 1988, pp. 485–6; Wrede & Spiegel, 2010, p. 198.
- Gibbs (1881, § 50) introduces the gradient with this definition, except that he calls ∇u simply the derivative of u, and u the primitive of ∇u. Use of the term gradient as an alternative to derivative is reported by Wilson (1901, p. 138).
- Cf. Borisenko & Tarapov, 1968, p. 157, eq. (4.43), quoted in Tai, 1995, p. 33, eq. (4.19).
- The first two cases may be compared with Javid & Brown, 1963, cited in Tai, 1994, p. 15.
- The first two cases may be compared with Neff, 1991, cited in Tai, 1994, p. 16.
- ↑ But Gibbs (1881) and Wilson (1901) were content to leave it as ∇⸱∇. And they did not call it the Laplacian; they used that term with a different meaning, which has apparently fallen out of fashion.
- ↑ Tai & Fang, 1991, pp. 168–9.
- ↑ Durney & Johnson, in Introduction to Modern Electromagnetics (1969, p. 45, cited in Tai, 1994, p. 12), make the absurd statement that "a ∇ operator cannot be defined in the other coordinate systems…" In context, they apparently meant to say that div A isn't ∇⸱A in other coordinate systems. Robert S. Elliott, in Electromagnetics (1966, p. 606, cited in Tai, 1994, p. 13), says that "only in Cartesian coordinates… do the gradient and divergence operators turn out to be identical." Apparently he meant to say that only in Cartesian coordinates do the two operators differ by a dot. But even what these authors apparently meant to say is wrong, as Kemmer shows with counterexamples (next reference).
- ↑ The perception that they are restricted to Cartesian coordinates arises partly from failure to allow for the variability of the basis vectors in curvilinear coordinate systems; cf. Kemmer, 1977, pp. 163–5, 172–3 (Exs. 2, 3, 5), 230–33 (sol'ns). From the del operator and the derivatives of the basis vectors w.r.t. the coordinates, Kemmer finds the curl and divergence in cylindrical coordinates, notes that we can do the same "with a little greater effort" in spherical coordinates (p. 230), and finds the Laplacian of a scalar in both coordinate systems (p. 231). He further reports that the method works for the Laplacian of a vector in cylindrical and spherical coordinates and is relatively convenient for the former (p. 232), for which "differentiation of the unit vectors is very simple" (p. 165).
- ↑ Kemmer (1977, p. 98, eq. 4) gives an equivalent result for our first three integral theorems (5g to 5d) only, and calls it the generalized divergence theorem because the divergence theorem is its most familiar special case.
- ↑ E.g., Gibbs, 1884, § 165, eq. (1); Wilson, 1901, p. 255, Ex. 1; Kemmer, 1977, p. 99, eq. (6); Hsu, 1984, p. 146, eq. (7.31).
- ↑ Cf. Katz, 1979, pp. 149–50.
- ↑ Although Hsu (1984, p. 141) applies that name to our theorem (5c).
- ↑ E.g., Gibbs, 1881, § 61; Hsu, 1984, pp. 117–18.
- ↑ Cf. Feynman, 1963, vol. 2, §2-8.
- ↑ Although Hsu (1984, p. 141) applies that name to our theorem (5g).
- ↑ Cf. Gibbs, 1881, §§ 50, 59; presumably this is one reason why Gibbs called the gradient simply the derivative.
- ↑ Cf. Gibbs, 1881, §§ 50, 51; presumably this is another reason why Gibbs called the gradient the derivative.
- ↑ Our definition of strength follows the old convention used by Baker & Copson (1939, p. 42), Born & Wolf (2002, p. 421), and Larmor (1904, p. 5). The newer convention followed by Miller (1991, p. 1371) would use the denominator 4πr instead of our r in (48); this would have the advantage of eliminating the factor 4π from the D'Alembertian of the wave function, and the disadvantage of introducing that factor into the (denominator of the) wave function itself.
- ↑ The latter passage, as it appears in the 5th edition (p. 397), is the one cited by Tai (1994, p. 6).
- ↑ Quoted by Tai (1994), in alphabetical order within each category. For Kovach he could have added p. 308. Potter he misnames as Porter.
- ↑ Quoted by Tai (1994, p. 23).
- ↑ Wilson, 1901, p. 150.
- ↑ Wilson, 1901, pp. 150, 152. Wilson does not announce this idea in his preface (p. xii), although Tai (1995, p. 26) gets the contrary impression by omitting a comma from the relevant quote.
- ↑ Tai, 1995, pp. 26, 38.
- ↑ Tai, 1995, p. 28.
- ↑ The latter observation is made, or at least suggested, by Kemin et al. (2000, p. 605).
- ↑ The following explanation takes some hints from Christopher Ford's note on "Vector Potentials" at maths.tcd.ie/~houghton/231/Notes/ChrisFord/vp.pdf, circa 2004.
- ↑ Cf. Gibbs, 1881, § 71, and Moon & Spencer, 1965, p. 235; quoted in Tai, 1995, pp. 18, 43.
- ↑ Moon & Spencer, 1965, p. 236.
- ↑ Tai, 1995, p. 35.
- ↑ Tai, 1995, pp. 25, 29.
- ↑ Wilson, 1901, pp. ix, xi–xii.
- ↑ Gibbs, 1881–84, privately printed version—of which the scan linked in our bibliography is of the very copy that Gibbs sent to Heaviside, with annotations in Heaviside's hand. On the annotations see Rocci, 2020.
- ↑ In the next equation as printed in Borisenko & Tarapov (1968, p. 180), the first cross should be "="; Tai (1995, p. 46) corrects it.
- ↑ Tai, 1995, p. 66, eq. (9.41).
- ↑ The following derivation is guided by Baker & Copson (1939, pp. 38–40) but, I hope, will be found more heuristic than its model.
- ↑ Born & Wolf, 2002, pp. 420–21, eq. (13). Cf. Baker & Copson (1939, p. 37) and Miller (1991, eq. 2), who use r instead of s (among other notational differences). Baker & Copson, in their last equation on p. 40, give the opposite sign because on this occasion they measure the normal coordinate out of the region.
- ↑ Reminder: there are rival definitions of the "strength" of a monopole source; see the text and footnote under equation (54) above.
- ↑ The term doublet, which seems to be older, is used by Baker & Copson (1939), Born & Wolf (2002, p. 421), and Larmor (1904).
- ↑ E.g., Born & Wolf, 2002, p. 421.
- ↑ Cf. Feynman, 1963, vol. 2, chap. 15, Table 15-1, "A(1, t) = …".
- ↑ Larmor (1921) had likewise neglected the 1/r decay: the equation second from the bottom on his p. 172 agrees with Miller's eq. (4).
Bibliography
- S.J. Axler, 1995, "Down with Determinants!" American Mathematical Monthly, vol. 102, no. 2 (Feb. 1995), pp. 139–54; jstor.org/stable/2975348. (Author's preprint, with different pagination: researchgate.net/publication/265273063_Down_with_Determinants.)
- S.J. Axler, 2023–, Linear Algebra Done Right, 4th Ed., Springer; linear.axler.net (open access).
- B.B. Baker and E.T. Copson, 1939, The Mathematical Theory of Huygens' Principle, Oxford; 3rd Ed. (same pagination, with addenda), New York: Chelsea, 1987, archive.org/details/mathematicaltheo0000bake.
- A.I. Borisenko and I.E. Tarapov (tr. & ed. R.A. Silverman), 1968, Vector and Tensor Analysis with Applications, Prentice-Hall; reprinted New York: Dover, 1979, archive.org/details/vectortensoranal0000bori.
- M. Born and E. Wolf, 2002, Principles of Optics, 7th Ed., Cambridge, 1999 (reprinted with corrections, 2002).
- C.G. Broyden, 1975, Basic Matrices, London: Macmillan.
- R.P. Feynman, R.B. Leighton, & M. Sands, 1963 etc., The Feynman Lectures on Physics, California Institute of Technology; feynmanlectures.caltech.edu.
- N.H. Fletcher, 1974, "Adiabatic assumption for wave propagation", American Journal of Physics, vol. 42, no. 6 (June 1974), pp. 487–9; doi.org/10.1119/1.1987757.
- J.W. Gibbs, 1881–84, "Elements of Vector Analysis", privately printed New Haven: Tuttle, Morehouse & Taylor, 1881 (§§ 1–101), 1884 (§§ 102–189, etc.), archive.org/details/elementsvectora00gibb; published in The Scientific Papers of J. Willard Gibbs (ed. H.A. Bumstead & R.G. Van Name), New York: Longmans, Green, & Co., 1906, vol. 2, archive.org/details/scientificpapers02gibbuoft, pp. 17–90.
- H.P. Hsu, 1984, Applied Vector Analysis, Harcourt Brace Jovanovich; archive.org/details/appliedvectorana00hsuh.
- C. Huygens, 1690, tr. S.P. Thompson, Treatise on Light, University of Chicago Press, 1912 / gutenberg.org/files/14725/14725-h/14725-h.htm, 2005. (See also "Errata in various editions of Huygens' Treatise on Light", www.grputland.com or grputland.blogspot.com, June 2016.)
- V.J. Katz, 1979, "The history of Stokes' theorem", Mathematics Magazine, vol. 52, no. 3 (May 1979), pp. 146–56; jstor.org/stable/2690275.
- S. Kemin, X. Zhenting, T. Jinsheng, & H. Xuemei, 2000, "The comprehension, some problems and suggestions to symbolic vector method and some defenses for Gibbs' symbol", Applied Mathematics and Mechanics (English Ed.), vol. 21, no. 5 (May 2000), pp. 603–6; doi.org/10.1007/BF02459044.
- N. Kemmer, 1977, Vector Analysis: A physicist's guide to the mathematics of fields in three dimensions, Cambridge; archive.org/details/isbn_0521211581.
- E. Kreyszig, 1962 etc., Advanced Engineering Mathematics, New York: Wiley; 5th Ed., 1983; 6th Ed., 1988; 9th Ed., 2006; 10th Ed., 2011.
- J. Larmor, 1904, "On the mathematical expression of the principle of Huygens" (read 8 Jan. 1903), Proceedings of the London Mathematical Society, Ser. 2, vol. 1 (1904), pp. 1–13.
- D.A.B. Miller, 1991, "Huygens's wave propagation principle corrected", Optics Letters, vol. 16, no. 18 (15 Sep. 1991), pp. 1370–72; stanford.edu/~dabm/146.pdf.
- P.H. Moon and D.E. Spencer, 1965, Vectors, Princeton, NJ: Van Nostrand.
- W.K.H. Panofsky and M. Phillips, 1962, Classical Electricity and Magnetism, 2nd Ed., Addison-Wesley; reprinted Mineola, NY: Dover, 2005.
- G.R. Putland, 2022, "Consistent derivation of Kirchhoff's integral theorem and diffraction formula and the Maggi-Rubinowicz transformation using high-school math" (working paper), doi.org/10.5281/zenodo.7205781 (Creative Commons).
- A. Rocci, 2020, "Back to the roots of vector and tensor calculus: Heaviside versus Gibbs" (online 10 Nov. 2020), Archive for History of Exact Sciences, vol. 75, no. 4 (July 2021), pp. 369–413. (Author's preprint, with different pagination: arxiv.org/abs/2010.09679.)
- C.-T. Tai, 1994, "A survey of the improper use of ∇ in vector analysis" (Technical Report RL 909), Dept. of Electrical Engineering & Computer Science, University of Michigan; hdl.handle.net/2027.42/7869.
- C.-T. Tai, 1995, "A historical study of vector analysis" (Technical Report RL 915), Dept. of Electrical Engineering & Computer Science, University of Michigan; hdl.handle.net/2027.42/7868.
- C.-T. Tai and N. Fang, 1991, "A systematic treatment of vector analysis", IEEE Transactions on Education, vol. 34, no. 2 (May 1991), pp. 167–74; doi.org/10.1109/13.81596.
- E.B. Wilson, 1901, Vector Analysis: A text-book for the use of students of mathematics and physics ("Founded upon the lectures of J. Willard Gibbs…"), New York: Charles Scribner's Sons; 12th printing, Yale University Press, 1958, archive.org/details/vectoranalysiste0000gibb.
- R.C. Wrede and M.R. Spiegel, 2010, Advanced Calculus, 3rd Ed., New York: McGraw-Hill (Schaum's Outlines); archive.org/details/schaumsoutlinesa0000wred.
Further reading
M.J. Crowe, "A History of Vector Analysis" (address at the University of Louisville, Autumn term, 2002), researchgate.net/publication/244957729_A_History_of_Vector_Analysis (including much discussion of quaternions).
P. Lynch, "Matthew O'Brien: An inventor of vector analysis", Bulletin of the Irish Mathematical Society, No. 74 (Winter 2014), pp. 81–8; doi.org/10.33232/BIMS.0074.81.88.