One man's look at the design of dictionary definitions

From Wikiversity

This article by Dan Polansky contains considerations concerning the design of dictionary definitions. Dan Polansky studied mathematics and mathematical fields, which are noted for their extensive use of definitions. Moreover, he is an experienced translation lexicographer, in the sense of entering mappings between Czech and English into the English Wiktionary. However, this activity does not cover definition design. As a result, this article is written by someone with almost no experience in dictionary definition design.

Dictionary definitions vs. mathematical definitions

Let us start by demarcating the subject from related fields. Above all, dictionary definitions are very different from mathematical definitions. Mathematical definitions are fairly often much longer than the short dictionary format allows. But not always; some mathematical definitions fit the dictionary format well, e.g. a definition of a prime number. Metric space is an example of a mathematical concept that fits the dictionary format relatively poorly. Moreover, dictionary definitions are descriptive: they describe facts of usage. By contrast, mathematical definitions are a mixture of stipulative and descriptive. An author of a mathematical article can stipulate a definition of a term any way he wants. When that terminological choice becomes conventional, the definition is no longer only the author's stipulation but becomes part of common usage and is thereby descriptive of it. There are also prescriptive definitions, but these would require a separate investigation, including a clarification of whether they are the same thing as stipulative definitions. Coming back to metric space, M-W[1] defines it as "a mathematical set for which a metric is defined for any pair of elements", complemented by M-W:metric.[2] That seems to actually do the job, although this needs a double check. One can compare the definition to that given by Britannica.[3]
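
For comparison with the one-line dictionary format, the following block gives the two mathematical definitions mentioned above in their usual textbook form. The symbols (X, d, p and so on) are the conventional ones and are not taken from any of the cited dictionaries; the block is meant only to show the difference in length and structure.

```latex
% Prime number: fits in one short sentence.
A natural number $p$ is \emph{prime} if $p > 1$ and its only positive
divisors are $1$ and $p$.

% Metric space: needs a carrier set, a function and several axioms.
A \emph{metric space} is a pair $(X, d)$, where $X$ is a set and
$d : X \times X \to \mathbb{R}$ satisfies, for all $x, y, z \in X$:
\begin{align*}
  d(x, y) &\ge 0, \\
  d(x, y) &= 0 \iff x = y, \\
  d(x, y) &= d(y, x), \\
  d(x, z) &\le d(x, y) + d(y, z).
\end{align*}
```

The M-W formulation compresses this by delegating the axioms to its separate entry for metric.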

Genus proximum vs. defining vocabulary

Let us continue with a consideration of two different approaches: one relies nearly exclusively on a defining vocabulary, the other on Aristotelian genus-differentia definitions (genus proximum, differentia specifica). In the Aristotelian approach, one should identify the genus proximum. Such a genus should be proximate or nearby; it should not be unnecessarily distant. For example, the domestic cat can be defined as a certain kind of domestic feline animal. Here, one relies on scientific taxonomy to figure out the genus proximum (genus in the linguistic sense, not the taxonomic sense; the "genus" can be e.g. a family in biological taxonomic terminology). By contrast, one can decide to base all definitions on the defining vocabulary and define cat in a way that does not depend on feline. There, one would not worry about the genus being proximate but rather about the genus being part of the (relatively small) defining vocabulary. But the plan to base all definitions exclusively on defining vocabulary is extremely impractical, especially for trivial derivatives such as adverbs derived by affixing -ly, -ness words, agent nouns, etc., since to define, say, a -ly adverb, one would not in general be able to use the base adjective in the definition. A solution chosen by Merriam-Webster online is not to define trivial derivatives at all and merely to list them next to the base word in the base word entry. But this is probably only one complication of using only defining vocabulary; the plan, while interesting, may prove impractical for other reasons.

Trivial derivatives and use of macros

We touched on the subject of trivial derivatives. For them, what I call a macro is relevant: a definition that is not really a definition proper but rather something like a macro whose expansion provides the definition proper. Let us take the word blueness, defined as the quality or state of being blue. But blue in which sense? In any sense. As soon as one tries to enter a mapping to different languages (as one does in a multilingual dictionary), one sees how this breaks down: different senses of blue generally map to different translations, and therefore, there are also different mappings of blueness to words in other languages. One seems forced to split a -ness word into as many senses as the base adjective has. Merriam-Webster is a monolingual dictionary and does not have this problem; it merely lists -ness words next to the base adjectives, e.g. for blue.[4] By doing so, Merriam-Webster avoids a massive duplication from blue to blueness. In a multilingual dictionary, this duplication creates a considerable maintenance overhead: any change of definition in the base word should theoretically be reflected in the definition of the trivial derivative. A solution is to start with macros and only replace them with definitions proper as the need arises, possibly even allowing a mixed state of affairs.
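
The macro idea can be sketched in a few lines of Python. Everything in the sketch is an assumption made for illustration: the glosses for blue are invented, and the names BASE_SENSES, DERIVATIVES and expand_macro are hypothetical rather than taken from any dictionary software. A derivative is stored as a pattern plus a base word; expansion yields one definition per sense of the base word, and an optional override holds a definition proper where one has already been written, which models the mixed state of affairs.

```python
# Toy data, made up for illustration; not taken from any real dictionary.
BASE_SENSES = {
    "blue": ["of the colour of a clear sky", "sad or melancholy"],
}

# A derivative is stored as a macro: a pattern plus the base word.
# Optional per-sense overrides hold definitions proper written later on.
DERIVATIVES = {
    "blueness": {
        "pattern": "the quality or state of being {base}",
        "base": "blue",
        "overrides": {2: "sadness, melancholy"},
    },
}


def expand_macro(word):
    """Expand a macro entry into one definition per sense of its base word."""
    entry = DERIVATIVES[word]
    base = entry["base"]
    overrides = entry.get("overrides", {})
    definitions = []
    for number, gloss in enumerate(BASE_SENSES[base], start=1):
        if number in overrides:
            # A definition proper has replaced the macro for this sense.
            definitions.append(overrides[number])
        else:
            macro_text = entry["pattern"].format(base=base)
            definitions.append(f"{macro_text} (in sense {number}: {gloss})")
    return definitions


for definition in expand_macro("blueness"):
    print(definition)
```

Because the expansion is computed from the base word's senses, a change to a gloss of blue propagates to blueness without manual duplication, except where an override has been entered.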

Minimum definition vs. encyclopedic extension: chemical elements

The definitions of chemical elements present an interesting question from the point of view of logic and encyclopedicity: to what extent should definitions be extra-logical (extra-minimal?) and encyclopedic? This case suggests that a certain degree of encyclopedicity is unavoidable unless we want the dictionary to be quite peculiar, since a chemical element is identified by its atomic number and once this is stated, the definition is complete. By contrast, Merriam-Webster:gold states the atomic number but also provides some key characteristics.[5] A similar concern pertains to names of chemical compounds: one perhaps wants not only to identify or define them but also to characterize them. From a logical and historical point of view, gold is not defined by its atomic number; rather, gold was discovered to be a chemical element with a certain atomic number. Even gold's being a chemical element may be a discovery rather than part of the historic definition reflecting the cognitive access to gold. One is reminded of Kripke's Naming and Necessity; deeper deliberations could be picked from there.

Circularity of definitions

There is a concern with circularity of definitions. Circularity arises not only within a single entry but also between entries. Circularity cannot be completely removed (the dependency graph needs to have something like a source or a sink, depending on which direction of the edges one considers) but it can be restricted to definitions of the defining vocabulary. The defining vocabulary is a set of words such that all other words can be defined with them without any circularity. One can verify that this is satisfied by taking a candidate definition of a word outside the defining vocabulary and expanding its components until one gets a definition that uses exclusively words from the defining vocabulary.
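
The verification by expansion described above can be sketched as a small recursive check. This is a minimal sketch under stated assumptions: each definition is reduced to the list of content words it uses, the toy entries and the defining vocabulary are invented, and the function name expand_to_defining_vocabulary is hypothetical.

```python
# Toy data, invented for illustration: each word maps to the content words
# that its definition uses.
DEFINING_VOCABULARY = {"animal", "small", "young", "domestic"}

DEFINITIONS = {
    "feline": ["animal"],
    "cat": ["small", "domestic", "feline"],
    "kitten": ["young", "cat"],
}


def expand_to_defining_vocabulary(word, definitions, vocabulary, path=()):
    """Return the set of defining-vocabulary words that `word` reduces to.

    Raises ValueError if the expansion runs into a cycle among words that
    lie outside the defining vocabulary.
    """
    if word in vocabulary:
        return {word}
    if word in path:
        cycle = " -> ".join(path + (word,))
        raise ValueError(f"circular definition: {cycle}")
    if word not in definitions:
        raise KeyError(f"no definition for {word!r}")
    result = set()
    for component in definitions[word]:
        result |= expand_to_defining_vocabulary(
            component, definitions, vocabulary, path + (word,)
        )
    return result


print(sorted(expand_to_defining_vocabulary("kitten", DEFINITIONS, DEFINING_VOCABULARY)))
```

In this sketch, words inside the defining vocabulary are never expanded, so any circularity among them is tolerated, while a cycle among other words is reported as an error.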

Basing definitions on quotations of use

Theoretically, one determines the definitions of words with the use of citations/quotations of use. How this works is not entirely clear. But one can get an idea. Let us consider a putative word X_NOUN, knowing it is a noun. And we find the following sentence in use: "X_NOUNs turned red, orange and yellow and the floor under the trees was full of them". We find an additional sentence: "conifers do not have X_NOUNs". We can already guess that X_NOUNs are leaves. This presupposes that we note certain classes in the world as cognitively significant, worth tracking as classes. We track the concept of a leaf (of a tree) but not that of a leaf that is on the floor, is yellow and has been split in half by a human. Something like a pre-linguistic cognitive ontology provides candidate classes, attributes, relationships, operations, etc., to the child. The child must be able to build something like an ontology (classes, attributes, relationships, etc.) even before it learns words, but maybe there is some iterative interplay, in which an initial very naive ontology gets refined by interaction with language. Be that as it may, this consideration suggests that the problem of extracting semantic meaning from word usage is not entirely intractable (if it were, definition lexicography would be impossible?).

Section Definitions of lay names in biological taxonomy quotes a remarkably sophisticated definition of fish from M-W. It is hard to imagine that this is a result of examination of uses of the word fish, let alone of the mere 3 uses required by the English Wiktionary.

Word with many meanings

Some words have a great many meanings, especially some English verbs. How anyone is able to figure out and sort out all these meanings is something of a mystery.

Duplication reduction via definition by synonym

Another concern is with avoidance of repetition of definitions of synonyms. Here, polysemy (words generally having multiple meanings) creates a hurdle. The modern version of Merriam-Webster online often defines senses by stating a single synonym and providing the sense number of the defining synonym. This may well be supported by reasonably sophisticated software. It ensures excellent consistency and avoids maintenance overhead but can be rather challenging when it comes to implementing changes, since once one changes the sense structure in a word used to define other words, in general, all the words depending on the defining word need to be updated. This may require something like a fairly complex change request, stating items affected, describing changes in them, etc.; or maybe the supporting software handles this in some way.
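
How software might support such sense-indexed synonym definitions can be sketched as follows. This is a guess at one possible design, not a description of the software Merriam-Webster uses, which is unknown to me; the entries are invented and the names ENTRIES, dependents_of and dangling_references are hypothetical. A sense is either a gloss or a reference to a numbered sense of a synonym; a reverse index lists what would be affected by a change, and a second check finds references that a change has left pointing at nothing.

```python
from collections import defaultdict

# Toy entries, invented for illustration. Each sense is either a gloss (str)
# or a reference (word, sense_number) to a numbered sense of a synonym.
ENTRIES = {
    "begin": ["to do the first part of an action"],
    "commence": [("begin", 1)],
    "start": [("begin", 1), "to move suddenly"],
}


def dependents_of(entries):
    """Map each referenced (word, sense) to the (word, sense) pairs using it."""
    index = defaultdict(list)
    for word, senses in entries.items():
        for number, sense in enumerate(senses, start=1):
            if isinstance(sense, tuple):
                index[sense].append((word, number))
    return dict(index)


def dangling_references(entries):
    """List references that point at a word or sense number that is missing."""
    missing = []
    for word, senses in entries.items():
        for number, sense in enumerate(senses, start=1):
            if isinstance(sense, tuple):
                target_word, target_number = sense
                target_senses = entries.get(target_word, [])
                if not 1 <= target_number <= len(target_senses):
                    missing.append(((word, number), sense))
    return missing


print(dependents_of(ENTRIES))
print(dangling_references(ENTRIES))
```

The reverse index corresponds to the "items affected" part of a change request: before renumbering the senses of begin, one would look up which senses of other words refer to them.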

Definition by synonym, an ambiguity

A definition by synonym is in general problematic because of polysemy. This can be handled as mentioned above, by indexing of senses. Some Czech dictionaries sometimes define words by listing a set of quasi-synonyms; these leave much to be desired.

Approximate definitions vs. precise definitions

Some definitions are probably merely approximate and not perfectly accurate. But not all: as mentioned, since a chemical element is uniquely identified by its atomic number, a definition stating the atomic number does a perfect job of picking its referent, at least in this world (as opposed to possible worlds). Moreover, e.g. defining a -ly adverb via its adjective often seems perfectly accurate; similarly for other trivial derivatives.

Definitions of natural kinds

Definitions of natural kinds (e.g. biological species) probably present a special problem. One has to pick the uniquely selecting characteristics, but what these are may not be entirely clear and one may veer into encyclopedicity (the result is not so much a definition as a characterization). And it may be unclear whether they work well across possible worlds. This would also be a problem for chemical elements (these are natural kinds) if one were not able to use the atomic number. But one could also ask: if one picks gold by its physical and chemical characteristics, could its atomic number be different in a possible world? On the other hand, the question is perhaps a bit silly since the physical and chemical characteristics of gold are determined by the particle composition of the nucleus. One may find relevant considerations in the Stanford Encyclopedia of Philosophy.

Encyclopedicity of definitions

One complaint sometimes raised against definitions in the English Wiktionary is that they are encyclopedic. As pointed out above in relation to chemical elements, some encyclopedicity is better not avoided (defining via atomic number alone would work, but would be hardly reader-friendly). The definition of "gold" in M-W verges on the encyclopedic and is marvellous: "a yellow metallic element with atomic number 79 that occurs naturally in pure form and is used especially in coins, jewelry, and electronics". One learns: 1) the atomic number, which would be enough for unique selection/picking out; 2) some of the physical and chemical properties; 3) the social significance. Sure enough, one does not want to have a paragraph-long encyclopedic summary instead of a definition, but the problem in such a case would be excessive length, not the encyclopedic character. A more limited definition/characterization of gold would do the job well enough, but this wealth of characterization arguably makes the definition better, not worse.

Moreover, the very concept of encyclopedicity is dubious and vague. Since encyclopedia articles often start with a definition or a characterization, there is necessarily an overlap between information found in an encyclopedia and that found in a dictionary. The question should not be whether a dictionary definition is encyclopedic but rather whether it contains too much extraneous information and is overlong. The word encyclopedic seems pushed to serve this different concern (too long or detailed a definition/characterization), as some kind of shortcut label, but it creates an analytical confusion.

Etymology is not semantics

Etymology is not semantics. While it is true that something like etymological quasi-semantics is often readily available to the mind of the interpreter when interpreting an utterance (if he knows the etymology, which is the case with transparent morphology), one generally does not find this kind of quasi-semantics in uses, unless the etymology becomes lexicalized or happens to match the semantics. And there is no point in repeating etymology as a definition, or as multiple definitions.

Trivial all-sense-encompassing quasi-semantics

Trivial all-sense-encompassing quasi-semantics is pointless to enter into a dictionary. I mean something like this: A definition of cat: "Any thing labeled by the English word 'cat' in any of its senses". Yes, such a quasi-sense may sometimes be used (even if rarely) but it is entirely trivial and uninteresting. (However, such a construction can be cognitively interesting. For instance, one may learn from a German speaker that he saw a Grunzbetricht, and have no idea what that is. A report can be as follows: "Person so-and-so saw what they referred to using the German word Grunzbetricht", or in an abbreviated and non-explicit form, "Person so-and-so saw Grunzbetricht". Needless to say, I made the word Grunzbetricht up.)

Sense nesting

Nesting of senses is used in some English dictionaries online, but not in Czech dictionaries. In the English Wiktionary, this produces some really bad outcomes, by my assessment, in part since it requires talents and attention to detail that few possess. Merriam-Webster online often merely groups senses rather than nesting them; that is, if there are senses 1a, 1b and 2, there is no definition for sense 1. But sometimes even Merriam-Webster creates an outright subsense structure.

Definitions with optional elements

The role of "especially", "typically", "usually" etc. in definitions is unclear. From a logical point of view, it seems like nonsense: if a characteristic is not essential in the definition, it is not part of the selector. But it does make some sense. If we consider, say, bird, this is an entity that typically can fly, even stereotypically, but not all birds can fly. Merriam-Webster's bird entry says nothing about flying in the relevant sense: "any of a class (Aves) of warm-blooded vertebrates distinguished by having the body more or less completely covered with feathers and the forelimbs modified as wings". For birds, we may argue that the definition is not meant to select a bird individual (distinguish it from other biological individuals) but rather the class; and thus, if we consider the class of all birds, it is perhaps true that most individuals of that class can fly, or considered differently, most taxa within the class are flying taxa (switching from individuals to taxa can change what the word "most" amounts to).

Definitions of proper names

Definitions of proper names (proper nouns and phrases acting as them) present perhaps a special case. Some considerations are in my Proper name article. Questions that arise include: which characteristics should one pick as somehow defining an individual entity such as a city? Theoretically, stating the latitude and longitude of some time-invariant part of the city (some kind of city center) should do and the rest is encyclopedic. Do definitions of proper names tend to be encyclopedic characterizations or summaries that fail to work across possible worlds, especially in view of Kripke's Naming and Necessity and its idea of proper names being rigid designators, standing in contrast to Russell's theory of descriptions? Should the definition be phrased as a definite description, starting with "the" rather than "a/an"? Can one use the word "certain" in the definition to avoid bearing the responsibility of providing a uniquely selecting criterion? Or should one take "certain" to be implied? Let us consider M-W's Darwin entry.[6] The leading sense says: "Charles Robert 1809–1882 English naturalist". The year range is uniquely identifying, but one also wants to know that he was an English naturalist. Interestingly, no mention is made of evolution by natural selection, for which he is so famous, so here, encyclopedicity is avoided. Let us look at the second definition, "Erasmus 1731–1802 grandfather of Charles Darwin English physiologist and poet". Here, in contrast to the first definition, significance is indicated by the part "grandfather of Charles Darwin"; why is significance not indicated in the first definition? There is also a geographical name: "city and port in northern Australia on Port Darwin (an inlet of the Timor Sea); capital of the Northern Territory population 103,016", which does not start with an article and thereby avoids the "the" vs. "a/an" conundrum. The statement of the precise population seems rather encyclopedic and creates maintenance overhead, which could be reduced by stating "population over 100,000".

Definitions of lay names in biological taxonomy

There is something called vernacular names. These seem to contrast with scientific names. Lay names may or may not be the same thing as vernacular names. In any case, lay names often seem to be confusingly mapped to taxa, in part probably since lay people either do not care all that much about precise biological taxonomy or lack the requisite know-how.

Fly agaric is an example of a name that is unproblematic, mapping directly to Amanita muscaria. M-W defines fly agaric as "a medium to large poisonous amanita mushroom (Amanita muscaria) with a usually bright red cap".[7]

A good example of a problematic or confusing name is missing, but based on experience from the English Wiktionary, they do exist. They would probably also be reflected in some way in Wikidata.

One slightly confusing name is "fly", which, per Britannica, 1) maps to "order Diptera", 2) is used "for almost any small flying insect".[8] Wikidata:fly has the description "common name of small flying insects, especially Diptera", which is peculiar since Wikidata items normally stand for named entities rather than names. Other such cases can be found by searching for items linking to Wikidata:organisms known by a particular common name.

Fish is an example of a name that does not seem to refer to a taxon, if we believe Britannica, which says: "It describes a life-form rather than a taxonomic group."[9] The main definition of fish given by M-W seems to be a marvel of definition engineering, suggestive of remarkable expertise: "any of numerous cold-blooded strictly aquatic craniate vertebrates that include the bony fishes and usually the cartilaginous and jawless fishes and that have typically an elongated somewhat spindle-shaped body terminating in a broad caudal (see caudal sense 2) fin, limbs in the form of fins when present at all, and a 2-chambered heart by which blood is sent through thoracic gills to be oxygenated".[10] By contrast, the English Wiktionary makes do with "A cold-blooded vertebrate animal that lives in water, moving with the help of fins and breathing with gills." It is unclear whether fish is really a lay name/vernacular name; it is in any case a name not mapping to a taxon.

Duplication between perfective and imperfective

For Slavic languages such as Czech, there is an opportunity for definition duplication between perfective and imperfective variants of verbs. One solution is to avoid the duplication and mark one of the variants as a mere imperfective (or perfective, respectively) form of the other. Thus, e.g. dávat (give) can be defined as an imperfective form of dát. Some Czech dictionaries opt for duplication. However, for dávat specifically, there may not be an exact correspondence between the senses of the two forms, which may be a rationale for duplication.

Incomplete definitions for work in progress

For a dictionary in the making (e.g. Wiktionary), it may sometimes be acceptable to provide incomplete definitions lacking some differentia (a missing genus would be suspect). This may be okay as long as it is clear which sense is meant and the only question that remains is how best to design the definition. For chemical elements, one could state only the atomic number and leave further characterization for later. For a proper name, one could state only e.g. "the capital of the U.K." and leave expansion for later, e.g. to something like M-W's "city and port on both sides of the Thames River in southeastern England; capital of the United Kingdom formerly constituting an administrative county; comprises the City of London (approximately coextensive with ancient Londinium; population 7400) and 32 other boroughs, which together are referred to as the metropolitan county of Greater London (area 632 square miles, or 1637 square kilometers, population 8,174,000)". Although here, one may wonder whether M-W does not err too much on the encyclopedic side anyway. In the two cases mentioned, the definitions are not even incomplete from the logical point of view. By contrast, one could define a geographic name referring to many smaller places as e.g. "One of multiple municipalities in the U.S." and expand that later if desired.

Quasi-definitions of given names

Definitions of given names of the form e.g. "An English given name" are arguably not definitions proper: they refer to the word itself and its function as a given name rather than providing a key for how to get to referents (a selector of referents). But it makes no sense to list all bearers of the given name, e.g. all Peters, as individual senses. A similar consideration applies to surnames, but here, one may want to list very notable bearers as senses.

Definitions of eponymic adjectives

For eponymic -esque and -ian/-ean/-an adjectives, one has to figure out whether to only identify the person who gave name to the adjective or whether to further characterize the matter. Thus, e.g. for Kafkaesque, one can state "of or relating to novels by Kafka"; that is correct, but everything else has to be found out in an encyclopedia. Here, M-W does both[11]: "of, relating to, or suggestive of Franz Kafka or his writings especially : having a nightmarishly complex, bizarre, or illogical quality".

To take another example, for Euripidean, M-W just places the word next to the base word, Euripides, defined as "circa 484–406 b.c. Greek dramatist". About Euripides, we learn almost nothing.

As for Orwellian, we learn a little bit more from M-W[12]: "of, relating to, or suggestive of George Orwell or his writings especially : relating to or suggestive of the dystopian reality depicted in the novel 1984". In what way the reality was dystopian, we do not learn. Compare the English Wiktionary: "Resembling the totalitarian political methods decried in the works of writer George Orwell, particularly in the dystopian novel Nineteen Eighty-Four; characterized by use of misleading terminology, propaganda, censorship, surveillance and repression." Here, we learn a little more, without the definition thereby becoming excessively long. Needless to say, creation of such a definition/characterization is much more challenging.

A tentative reasonable approach could be this: in general, define eponymic -esque and -ian/-an adjectives merely by referring to/identifying the name-giving person, without an attempted additional characterization. In justified exceptional cases, provide a more extensive characterization. This may be the approach taken by the English Wiktionary (requires verification). This approach also seems taken by M-W as per above, although the sample taken is very small.

Definitions of agent nouns

An agent noun such as swimmer could be defined as follows:

  • a human that swims
  • a person who swims
  • one who swims

The first definition restricts the subject to humans, but non-human persons (Martians, other extraterrestrial aliens, hobbits) can swim as well, so it is imperfect.

M-W defines swimmer as "a person or animal that swims"[13], covering also animals. M-W does not define perpetrator but merely lists it next to perpetrate[14].

Above, a rationale for the use of "person" was given, namely that there are non-human persons. However, M-W does not have a correspondingly broad sense for person.[15] For one, M-W includes the sense "2 a : an individual of a vaguely indicated group : anyone at all", with the example "one never knows", but it is not clear whether individual includes e.g. Martian individuals.

Be that as it may, pronouns such as who do not only refer to human persons, but also to Martians, hobbits, etc.

For presenter, the lead sense in M-W is "one who presents something : a person who formally gives or bestows something (such as an award) or who brings something before the public". M-W could have used "human" but uses "person". The M-W rationale is unclear; it is unclear whether M-W intends to include Martians under this definition.

See also An analysis of the concept of person.

The definition of human

Let us have a look at the definition of the noun human (and German Mensch and Czech člověk) in various dictionaries, with the hope that this one case is going to reveal something about definition design.

Definitions:

  • M-W: "a bipedal primate mammal (Homo sapiens) : a person : man sense 1c—usually plural" (there is another sense, here omitted)
  • AHD: "A member of the primate genus Homo, especially a member of the species Homo sapiens, distinguished from other apes by a large brain and the capacity for speech."
  • Collins: "You can refer to people as humans, especially when you are comparing them with animals or machines."
  • German Duden: "mit der Fähigkeit zu logischem Denken und zur Sprache, zur sittlichen Entscheidung und Erkenntnis von Gut und Böse ausgestattetes höchstentwickeltes Lebewesen"
  • Czech ASSČ: "nejvyspělejší živá bytost, lidský jedinec (muž, žena nebo dítě)"; "rod a příslušník rodu z čeledi hominidů charakterizovaný zejména rozvojem mozku a ruky"

Observations about individual dictionaries:

  • M-W and AHD take a biological look and try to identify the sense with taxa.
  • M-W characterizes the taxon Homo sapiens only by "bipedal primate mammal"; AHD states more distinguishing characteristics.
  • Collins defines human via person.
  • Duden makes no taxonomic statement or identification. Interestingly enough, Duden singles out moral capacity and the capacity to distinguish good from evil, a characteristic that a biologist probably would not pick?
  • ASSČ first singles humans out as "most advanced", a characterization that an English-speaking biologist (especially Gould) probably would not pick; it seems very non-objective/anthropocentric. ASSČ then also provides a more biological definition. It is not clear why ASSČ has two distinct definitions.

What we see above is part of the definition design space available to the definition designers. The design is not cross-culturally objectively given; rather, it reflects a certain design strategy or attitude. Moreover, for identification purposes, stating the Latin taxon name (or taxa) would suffice, but additional characterization is provided by M-W and AHD (which is reminiscent of the consideration about chemical elements being characterized in addition to stating the atomic number).

Furthermore, we see definitions that, barring the identification with a taxon or taxa, are not going to work across possible worlds. Let us take "bipedal primate mammal". It should not be too hard for a competent designer of a modern computer game (using today's high-performance personal computers) to create a human-like creature using 3D graphics that looks distinctly non-human, perhaps some kind of orc, yet apparently is a bipedal primate mammal. AHD's definition does not save the matter; an orc is going to have a large brain and the capacity for speech. What does save the matter are the taxa as individual entities (whose Latin names are considered to be proper names), whose cross-world identity is unclear and subject to philosophical discussion. By referring to taxa, the definition designer identifies humans with certain natural kinds, thereby bringing in the philosophy of identity of natural kinds across possible worlds.

Choice of dictionary

About the choice of the dictionary consulted: I chose M-W as a respected popular American dictionary freely available online (still copyrighted!). I could have chosen the British OED as another very respected dictionary, but the current online version requires registration. I think OED2 can be found freely available online, but it is sometimes tricky to find the entries. In case of doubt, one could consult other dictionaries available from OneLook.

References

  1. https://www.merriam-webster.com/dictionary/metric%20space
  2. https://www.merriam-webster.com/dictionary/metric
  3. https://www.britannica.com/science/metric-space
  4. https://www.merriam-webster.com/dictionary/bluen
  5. https://www.merriam-webster.com/dictionary/gold
  6. https://www.merriam-webster.com/dictionary/Darwin
  7. https://www.merriam-webster.com/dictionary/fly%20agaric
  8. https://www.britannica.com/animal/fly-insect
  9. https://www.britannica.com/animal/fish
  10. https://www.merriam-webster.com/dictionary/fish
  11. https://www.merriam-webster.com/dictionary/Kafkaesque
  12. https://www.merriam-webster.com/dictionary/Orwellian
  13. https://www.merriam-webster.com/dictionary/swimmer
  14. https://www.merriam-webster.com/dictionary/perpetrator
  15. https://www.merriam-webster.com/dictionary/person
