Psycholinguistics/Components of Speech

Introduction

In order to adequately understand the processes of speech, as well as the many theories regarding speech perception and word recognition, it is necessary to first understand the components at the root of those theories. These components, which are discussed in detail below, include:

Prosody
Phonological Rules
Cross-Linguistic Phonetics

Human language is segmental in nature, and can be broken down into several different units that can be identified by physical means (using acoustic or articulatory change points) or auditory means (using changes in quality or quantity typically related to phonemic changes)^[1]. When combined, the segments of language can produce more complex and abstract meanings. These segments, in order from least to most complex, are:

- Phone – any sound in human language; phones are therefore considered to be the minimal unit of language^[1].
- Phoneme – phonological units which are used to make meaningful contrasts between segments. The set of phonemes used by speakers varies from language to language, with each language having its own specific set of phonemes^[1].
- Cluster – groupings of vowels and consonants. Clusters follow the phonological rules of a language^[1].
- Syllable – a specific language unit that consists of a syllabic element called a nucleus, and all other segments associated with it. The nucleus typically refers to a vowel, while those consonants that precede it are referred to as the onset and those following the vowel are referred to as the coda. Multiple syllables can be present in a word and many people are familiar with the common ‘tests’ for assessing how many syllables a word is composed of^[1].
- Morpheme – the smallest unit of meaning in a language. Multiple morphemes may exist in a single word. For example, affixes are considered to be morphemes. Some morphemes can therefore stand alone as words, while others, such as –s and other suffixes, cannot^[1].
- Word – the smallest free form unit in language, or in other words the smallest unit in language that, on its own, has pragmatic and/or semantic value^[1].
- Phrase – a single word or collection of words that create one unit in a clause. There are multiple types of phrases that are categorized on the basis of how they are structured^[1].
- Clause – the structure of a clause can vary from language to language. In English, a clause consists of a subject and a predicate and therefore makes up one complete thought in a sentence. Two main types of clauses exist: independent and dependent clauses. Independent clauses can stand alone in a sentence whereas, without the presence of an independent clause, a dependent clause sounds incomplete^[1].
- Sentence – consists of one or more clauses. Simple sentences consist of one independent clause, while more complex sentences can be composed of several clauses (both independent and dependent)^[1].
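The syllable structure described in the list above (onset, nucleus, coda) can be sketched in a few lines of Python. This is a minimal illustration, not a full syllabifier: the vowel inventory `VOWELS` and the function name `split_syllable` are hypothetical, and the sketch assumes that the input is a single syllable whose vowels form one contiguous nucleus.

```python
# Hypothetical, simplified vowel inventory used only for this illustration.
VOWELS = {"a", "e", "i", "o", "u", "æ", "ʌ", "ɪ"}


def split_syllable(segments):
    """Split one syllable (a list of segment symbols) into onset, nucleus, coda.

    Assumes the syllable contains a single contiguous run of vowels,
    which is treated as the nucleus.
    """
    vowel_positions = [i for i, seg in enumerate(segments) if seg in VOWELS]
    if not vowel_positions:
        raise ValueError("no vowel nucleus found")
    first, last = vowel_positions[0], vowel_positions[-1]
    onset = segments[:first]           # consonants before the nucleus
    nucleus = segments[first:last + 1] # the vowel(s)
    coda = segments[last + 1:]         # consonants after the nucleus
    return onset, nucleus, coda


# The word 'spin', transcribed with simplified symbols.
print(split_syllable(["s", "p", "ɪ", "n"]))
```

For 'spin' the sketch treats 's' and 'p' as the onset, the vowel as the nucleus, and 'n' as the coda.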

The components of speech work amongst these units of speech to provide each language with a distinct sound and identifiable qualities. These interactions are discussed more fully below.

Prosody

When listening to computer-generated speech systems, people often note that the speech sounds awkward and unnatural. This synthetic speech lacks a human quality that can be partly, if not mostly, attributed to its lack of prosodic cues. Prosodic cues are those qualities of human speech, above the level of the segment, that can enhance the meaning of a speaker’s utterance simply through the way the speaker delivers it. The information communicated by prosodic cues therefore goes beyond the words chosen to communicate a specific thought. For instance, if a speaker slows the rate of their speech and adds emphasis to each word, it may be inferred that the speaker is being sarcastic. Sarcasm, irony and emphasis are commonly communicated through prosodic cues.

Suprasegmentals

Prosodic cues are suprasegmental properties: features of language that may extend across the specific segments listed above and that in turn affect the meaning of a word, phrase, sentence, or whole utterance. The suprasegmental properties of speech include:

pause
stress (accent)
pitch
intonation
rate^[2]

Pause

The easiest of these properties to understand are believed to be pause and word stress. These features act as an indication of breaks or segments in continuous speech, which makes speech easier to comprehend. In relation to the segments of speech, pauses are commonly located between clauses or sentences; the idea of pausing at commas or periods while reading is a good example of this. Observing pauses at these points allows for greater comprehension of what you are reading and divides long strings of words into manageable and understandable components. By the same line of thought, pausing within smaller segments of language would not aid comprehension. For example, when reading a passage from a book, one’s comprehension would be greatly impeded if pauses were made between each syllable of a word, or worse yet after each phoneme!

Stress/Accent

A much less obvious prosodic feature is stress or accent. Accent, in this case, does not relate to language dialect and instead refers to the stress of a syllable. In general, stress (accent) refers to a combination of the loudness, length, or pitch of a syllable in a word, or of a word in a phrase or sentence. When stress is placed on a syllable, the length, loudness, and/or pitch of that syllable is increased compared to the same syllable unstressed. The stress on any particular syllable can cause drastic changes in the meaning of a word. For example, the word ‘record’ in English can be said two ways, and the placement of syllable stress determines the meaning of the word. If the stress is placed on the last syllable of the word (i.e. re-CORD), the word acts as a verb and means to document or keep track of something. Comparatively, if stress is placed on the first syllable of the word (i.e. REC-ord), the word acts as a noun and refers to a tally of some sort or a music disc.

Pitch/Intonation

Variations in speech can also be created by pitch and, in turn, intonation. While pitch refers to the vibration frequency of a person’s vocal cords when speaking, intonation refers to the changes in pitch over time. For example, faster vibration of a speaker’s vocal cords results in a higher pitch, and it is a shift to higher pitch at the end of a speaker’s sentence (intonation) that often indicates that a question is being asked.

Pitch can vary across most aspects of speech from phonemes to words to sentences, causing subtle to notable changes in the interpretation of a speaker’s speech. On the other hand, changes in intonation are commonly found across a whole phrase or sentence, as illustrated in the example above. However, changes in intonation can also be found across single words and syllables. This fact can be demonstrated by the ability of a speaker to ask a one word question.
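The idea that a rise in pitch at the end of an utterance often signals a question can be sketched as a comparison between the end of a pitch contour and the rest of it. This is only an illustration under stated assumptions: the function name `ends_with_rise`, the `tail` parameter, and the pitch values are hypothetical, and real intonation analysis is considerably more involved.

```python
from statistics import mean


def ends_with_rise(pitch_hz, tail=3):
    """Return True if the last `tail` pitch samples average higher than the rest."""
    if len(pitch_hz) <= tail:
        raise ValueError("contour is too short to compare its ending with the rest")
    body, ending = pitch_hz[:-tail], pitch_hz[-tail:]
    return mean(ending) > mean(body)


# Hypothetical pitch contours in hertz, invented for illustration.
question_like = [200, 195, 190, 210, 230, 250]
statement_like = [220, 210, 200, 190, 180, 170]

print(ends_with_rise(question_like))   # True
print(ends_with_rise(statement_like))  # False
```

The same comparison works on a contour for a single word, which matches the observation that a speaker can ask a one-word question.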

Rate

Finally, the rate of speech refers to the number of phonemes or words produced by a speaker or perceived by a listener per second. The speaker’s rate of speech can directly influence how well listeners understand. Rate of speech and enunciation have an inverse relationship, meaning that as the rate of speech increases, the clarity of enunciation decreases, and vice versa. When a speaker’s ability to enunciate clearly decreases, the listener’s ability to pick up on other cues also decreases, which in turn can negatively affect a listener’s comprehension.

It is important to realize that these properties often co-occur, with pitch, stress, and timing variation in particular often existing simultaneously^[2]. For example, in some languages, stressed syllables have higher or lower pitches than do unstressed syllables.
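The definition of rate given above, units produced per second, amounts to a simple division. The sketch below uses a hypothetical function name `speech_rate` and invented counts purely to make the arithmetic concrete.

```python
def speech_rate(unit_count, duration_seconds):
    """Return the number of units (phonemes or words) produced per second."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return unit_count / duration_seconds


# Hypothetical figures: 30 words spoken in 10 seconds.
print(speech_rate(30, 10))  # 3.0
```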

Gathering More Information: Emotional Meaning

Prosodic information gives listeners insight into the direct meaning of a statement, whether it is a word, phrase, or even an utterance, as well as into the speaker’s emotional meaning. Emotional meaning from prosodic information may give further insight into a speaker’s statement, reflecting the emotional state of the speaker or the emotional connotation of the statement. For example, the rate of a speaker’s speech can act as a cue to infer emotional meaning. When a speaker has an increased rate of speech in comparison to that which is typical of them, a listener may infer that the speaker is anxious. Conversely, if the speaker’s rate of speech is much slower than usual, a listener may infer that the speaker is sad or depressed.

Phonemes

While similar prosodic cues are present across languages, each language has its own distinct set of phonemes that comprise it. A phoneme is not simply a physical sound and should not be confused with a phone. A phoneme is a distinctive sound segment in a language that is used to demonstrate contrast in a language. The convention in linguistics is to denote a phoneme between two slashes. For example, /j/ is a phoneme. On the other hand, a phone is any sound in a language and is considered the smallest unit that can be identified in speech. Phones are denoted between a set of square brackets. For example, [j] is a phone.

The contrastive nature of phonemes allows the slightest change in a phoneme to signal a change in meaning. This distinction is made by the brain of a listener who has had experience with the language in question; with that experience, it takes little effort for the brain to detect slight changes.

Minimal Pairs

When two words are identical except for one phone that is located in the same place in both words, the two words are called a minimal pair^[3]. A minimal pair serves as a test showing that two phonemes are contrastive and not simply variants of the same phoneme. For example, the English words ‘hut’ and ‘hat’ form a minimal pair. These two words are identical in every way except for the vowel: /ʌ/ in ‘hut’ and /æ/ in ‘hat.’ This single change in the middle segment changes the meaning of the whole word.
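The minimal pair test can be expressed as a short check: two words of equal length that differ in exactly one segment, in the same position. The sketch below is a minimal illustration; the function name `is_minimal_pair` is hypothetical, and the transcriptions of ‘hut’ and ‘hat’ are simplified assumptions made for the example.

```python
def is_minimal_pair(word_a, word_b):
    """Return (True, position) if the words differ in exactly one segment.

    Each word is a list of segment symbols. Words of different lengths,
    identical words, and words differing in several positions all
    return (False, None).
    """
    if len(word_a) != len(word_b):
        return False, None
    differences = [i for i, (a, b) in enumerate(zip(word_a, word_b)) if a != b]
    if len(differences) == 1:
        return True, differences[0]
    return False, None


# Simplified transcriptions, assumed for illustration.
hut = ["h", "ʌ", "t"]
hat = ["h", "æ", "t"]

print(is_minimal_pair(hut, hat))  # (True, 1)
```

The returned position identifies the segment that carries the contrast, here the vowel in the middle of the word.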

Allophones

A group of phones that sound different yet do not change the meaning of a word are called allophones^[3]. This definition means that different sounds, or phones, can make up one phoneme, which can therefore be pronounced in more than one way. For example, the ‘p’ in the word ‘pin’ is aspirated (formed with a puff of air), and this phone is denoted by [pʰ]. The ‘p’ in the word ‘spin’ is unaspirated, and this phone is denoted simply as [p]. If the phone in ‘spin’ is changed from [p] to [pʰ], the meaning of the word remains the same, indicating that [p] and [pʰ] are allophones and are classified as a single phoneme, /p/.
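The relationship between allophones and a phoneme can be pictured as a many-to-one mapping from phones to phonemes. The sketch below assumes a tiny, hypothetical table `ALLOPHONES` covering only the aspirated and unaspirated ‘p’ discussed above; a real language would need a far larger table.

```python
# Hypothetical phone-to-phoneme table covering only the example in the text.
ALLOPHONES = {
    "pʰ": "p",  # aspirated, as at the start of 'pin'
    "p": "p",   # unaspirated, as after 's' in 'spin'
}


def to_phonemic(phones):
    """Map each phone to its phoneme; phones not in the table map to themselves."""
    return [ALLOPHONES.get(phone, phone) for phone in phones]


aspirated = ["s", "pʰ", "ɪ", "n"]
unaspirated = ["s", "p", "ɪ", "n"]

# Both pronunciations collapse to the same phonemic form, so no meaning changes.
print(to_phonemic(aspirated) == to_phonemic(unaspirated))  # True
```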

Phonological Rules

Phonology is the sound system of a language, composed of phonetic and phonemic units. The distinction between phonetic and phonemic units follows the distinction made above between phones and phonemes. Phonology also includes the rules that guide the combination of these units.

In each language, phonemes can be predicted and understood using a set of phonological rules. These rules vary between languages and reflect the regularities of that particular language. Through experience a speaker internalizes the implicit rules specific to a given language. Though an infinite number of combinations of phonemes can be created, the number of combinations that are considered acceptable and are actually used in a language is significantly smaller. For example, the word ‘shime’ does not actually exist in English even though it does not violate any phonological rules.

The specific rules that outline the permissible combinations of phonemes are called phonotactic rules. These rules specifically apply to syllables, as well as more generally to words. As a result, word phonotactics are based on syllable phonotactics. Similarly, multisyllabic words follow both syllable phonotactic rules and word phonotactic rules.

An example of an impermissible phoneme combination is /stl/ at the beginning of a syllable. No English word begins with such a combination, and if asked to pronounce it, a native English speaker would most likely hesitate first. When spoken, the sounds would likely feel and sound unnatural to the speaker because, even though the technique of ‘sounding out’ the letters individually can be applied, the combination does not begin any English syllable and would therefore be unfamiliar to the speaker.
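A phonotactic rule of this kind can be sketched as a membership check against the set of combinations a language permits. The set `PERMITTED_ONSETS` below is a small, hypothetical sample chosen for illustration and is far from a complete list of English onsets; the function name `is_permitted_onset` is likewise an assumption.

```python
# A small, hypothetical sample of permissible English syllable onsets.
PERMITTED_ONSETS = {
    ("s",), ("t",), ("l",),
    ("s", "t"), ("s", "l"), ("p", "l"),
    ("s", "t", "r"), ("s", "p", "l"),
}


def is_permitted_onset(onset):
    """Return True if the sequence of phonemes is in the sample onset inventory."""
    return tuple(onset) in PERMITTED_ONSETS


print(is_permitted_onset(["s", "t", "r"]))  # True, as in 'string'
print(is_permitted_onset(["s", "t", "l"]))  # False
```

Because the inventory is stored per language, swapping in a different set models the way these rules vary from language to language.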

The fact that these specific rules vary from language to language indicates the importance of learning phonological rules when acquiring a new language. The uniqueness of each language’s phonological rules and corresponding combinations demonstrates the importance of experience with a language in reducing the number of phonological errors made.

Cross-Linguistic Phonetics

One does not need to be a psycholinguist to recognize the different sounds that occur across languages. Those who have ever attempted to learn a new language may be especially aware of these distinctions. When a native English speaker attempts to learn French or Spanish, they may feel that the ‘rolling’ of sounds off the tongue feels or sounds unnatural. Similarly, with other languages, a native English speaker may find the creation of sounds at the back of the throat unfamiliar. These differences are an indication of the importance placed on certain sounds, or phonemes, in one language and, in certain cases, not in another language.

In each language there is a set of phonemes which are deemed to be meaningful and purposeful, while others are considered to be unimportant. In English, this can be demonstrated by what people consider to be purposeful speech and what is considered to be babble or nonsense. As mentioned above, phonological rules determine the permissible combinations of phonemes; however, not all permissible combinations are actually used in a language. For instance, the combination “baba” on its own has no meaning in English and as a result would be ignored or disregarded by an English listener. On the other hand, this combination of sounds is purposeful in languages such as Mandarin and therefore would not be ignored by a Mandarin-speaking listener.

The prominence of particular sounds, or lack thereof, makes each language unique and interesting. However, if misunderstood, these differences can be, and have been, made the basis of hurtful stereotypes. For example, it has become ‘known’ in Western popular culture that native speakers of Asian languages cannot produce the ‘l’ sound in English. This ‘inability’ can lead to embarrassing situations where the intended meaning of the speaker is lost to mispronunciation. This common belief is very much a misunderstood phenomenon of cross-linguistic phonetics and an unfortunate over-generalization.

It is true that native Korean and Japanese speakers have some difficulty producing the ‘liquid sounds’ of English, such as ‘l.’ However, this observation does not extend to all Asian languages or all speakers. The struggle to produce these sounds is also not a reflection of the speakers themselves; rather, it is a reflection of the phonemes that are valued in the speaker’s native language. In Korean and Japanese, the ‘liquid sound’ of ‘l’ is not produced, and instead ‘l’ is produced in a way similar to how ‘r’ is produced in English. Therefore, as speakers transition from language to language, it is not an inability to produce a given sound that causes the difference in phoneme production. Rather, these differences result from years of experience with the phonological rules of one’s native or more primary language, which then influence the way other languages are used.

Though there are distinct differences that occur in phonetics across languages, there are also sounds that occur in most, and sometimes all, of the world’s languages. These similarities across diverse languages are speculated to result from the limitations of the human articulatory and auditory systems. As a result of these limitations some phones and combinations of phones become favoured across speakers of different languages and are therefore used more readily.

American Sign Language

When considering the components of speech, it is common for people to fail to consider the components of sign language. This may seem to be a logical conclusion: speech is a stream of spoken words and its components are features of this spoken stream. However, American Sign Language (ASL), and other forms of sign language, can be broken down into components very similar to those of spoken language. Furthermore, it has been argued that sign languages, ASL in particular, can be broken into segments or units in a way similar to the division of spoken language. Researchers have gone so far as to state that sign language is a system that possesses levels that correspond to the phonological, morphological, semantic, syntactic, and pragmatic levels of spoken language^[4]. For example, the finger-spelled English words that exist in sign language are viewed as a grouping of one or more morphemes that come together to produce a single sign.

Properties that function similarly to the suprasegmental properties of spoken language are said to operate across the segments of sign language. The suprasegmental properties of sign language include body language and facial expression, which can occur across one or more segments of a sign^[4]. These features of sign language have been said to operate in ways similar to how pitch and stress operate with the segmental phonemes of spoken language.

ASL Terminology

Though the similarities between sign language and spoken language seem apparent, the obvious differences between the two called for a new set of terminology to refer to the sign language counterparts of terms applied to spoken language.

In sign language, the equivalent of phonology is cherology^[4]. In turn, the phonemic units have been termed cheremes and the sign units that represent phonetic elements that are present in spoken language are termed primes^[4].

The concept of cherology was developed in the mid-1960s by Dr. William Stokoe, a professor at Gallaudet University, an educational facility for the deaf and hard of hearing. In a time when little was known about ASL and therefore how to perceive it, Dr. Stokoe authored and co-authored several influential books on the topic, including Sign Language Structure and A Dictionary of American Sign Language on Linguistic Principles.

To explain the contrastive units (cheremes) that exist in sign language, distinctive parameters were set to clarify contrasts. These contrasts are determined by differences in:

hand configuration
place or location of sign in relation to the signer
direction of movement of the sign in relation to the signer or in the hand configuration
orientation of the sign^[4]

These features of a sign serve to distinguish between the meanings of signs and can even be looked at in a way similar to minimal pairs. However, so that the everyday English meanings of these parameter names would not be taken as their exclusive definitions in this context, the distinctive parameters were given distinct names^[4].

- ‘location’ is termed Tabula (Tab)
- ‘hand configuration’ is termed Designator (Dez)
- ‘movement’ is termed Signation (Sig)^[4]

When two signs in ASL differ in only one contrastive aspect, they can be considered to be a minimal pair. For example, in ASL the signs for mother and father are very much the same and differ only in the location of the sign in relation to the signer.
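The same minimal pair test can be sketched for signs by describing each sign with the contrastive parameters listed above and counting how many of them differ. The class `Sign`, the helper names, and the parameter values given for ‘mother’ and ‘father’ are simplified assumptions made for illustration; only the fact that the two signs differ in location comes from the text.

```python
from dataclasses import dataclass, asdict

PARAMETERS = ("tab", "dez", "sig", "orientation")


@dataclass(frozen=True)
class Sign:
    gloss: str        # English label for the sign
    tab: str          # location (Tabula)
    dez: str          # hand configuration (Designator)
    sig: str          # movement (Signation)
    orientation: str  # orientation of the sign


def contrasting_parameters(sign_a, sign_b):
    """Return the names of the parameters on which the two signs differ."""
    a, b = asdict(sign_a), asdict(sign_b)
    return [name for name in PARAMETERS if a[name] != b[name]]


def is_minimal_pair(sign_a, sign_b):
    """Two signs form a minimal pair if exactly one parameter differs."""
    return len(contrasting_parameters(sign_a, sign_b)) == 1


# Illustrative values only; assumed for the example, not taken from the chapter.
mother = Sign("mother", tab="chin", dez="open hand", sig="tap", orientation="sideways")
father = Sign("father", tab="forehead", dez="open hand", sig="tap", orientation="sideways")

print(contrasting_parameters(mother, father))  # ['tab']
print(is_minimal_pair(mother, father))         # True
```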

Similar to spoken language, there is also a set of phonological rules that a signer must learn to adhere to through experience with the language^[4]. Furthermore, these phonological rules tend to vary depending on the specific sign language being considered. In American Sign Language, these phonological rules include the fact that one hand is considered the dominant hand and can take on any shape, while the hand considered to be non-dominant can only take on one of seven basic shapes. Another phonological rule of ASL states that the non-dominant hand can either take on one of these seven basic shapes (creating a dominance condition) or match the shape of the dominant hand (creating a symmetry condition). Finally, as is the case in spoken language, some combinations of primes are not permissible in sign language and are unused.
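The dominance and symmetry conditions described above can be sketched as a small classification function. The chapter does not name the seven basic shapes, so the labels in `BASIC_SHAPES` are an assumption based on labels commonly used in the sign language literature, and the function name `two_handed_condition` is hypothetical. The sketch follows the simplified statement of the rules given in the text.

```python
# Assumed labels for the seven basic handshapes; the chapter does not list them.
BASIC_SHAPES = frozenset({"A", "S", "B", "5", "1", "C", "O"})


def two_handed_condition(dominant_shape, nondominant_shape, basic_shapes=BASIC_SHAPES):
    """Classify a two-handed shape combination, or return None if not permitted.

    - "symmetry": the non-dominant hand matches the dominant hand.
    - "dominance": the non-dominant hand takes one of the basic shapes.
    """
    if nondominant_shape == dominant_shape:
        return "symmetry"
    if nondominant_shape in basic_shapes:
        return "dominance"
    return None


print(two_handed_condition("F", "F"))  # symmetry
print(two_handed_condition("F", "B"))  # dominance
print(two_handed_condition("F", "R"))  # None
```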

Debate

The many similarities between sign language and spoken language can serve as an indication that sign language is no less complex or rich than spoken language. As blatant as the similarities between these two forms of language may seem, some debate exists surrounding focus on them. Some academics have argued that by simply drawing an analogy between ASL and spoken language, the important differences that exist between the languages are overlooked^[4].

While it is important to appreciate the uniqueness of all languages and to investigate the components of each language independently of one another, the analogy between the two languages plays an important role in the understanding of language. Drawing a comparison to a language which is familiar to a speaker or signer helps to promote greater appreciation for the languages of others. Also, the comparison between languages provides valuable insight into the evolution of a language and the components which humans find necessary or helpful in speech and comprehension.

Learning Exercise

This learning exercise is designed to provide added clarity to the information presented in the chapter. It can be used to check the knowledge gained after reading the chapter or as a hands-on activity to aid in understanding. An answer key is available for the exercise.

Part 1

Before you begin the rest of the exercise, it is important that you have the definitions of a minimal pair and an allophone clear.

1. a) Using your own words, provide the definition of a minimal pair. Make sure to provide all of the main characteristics.

b) Following the same procedure outlined in a), provide the definition of an allophone.

In the following pairs, identify whether the words or signs are allophones, minimal pairs, or neither. If the pair is identified as a minimal pair, indicate the single difference (distinctive feature) that occurs in the minimal pair. As indicated above, an example in English may be the phoneme /ʌ/ in ‘hut’ and /æ/ in ‘hat,’ while in ASL an example may be the place or location of the signs for mother and father in relation to the signer.

2. Catch - Cash

3. Persist - Missed

4. Liege – Lease

5. Messed – Confessed

6. ‘p’ – punctuate/compromised

7. Soot - Suit

8. /l/ – let/tell

9. Match - Attach

10. Beige - Bays

Part 2 - American Sign Language (ASL)

As mentioned in the chapter, American Sign Language (ASL) has distinctive features that may vary between signs to create a minimal pair.

1. Fill out the table below with the distinctive features that exist between signs. Also list the ASL terms for these distinctive features.

Distinctive Features	ASL Term

2. Use your knowledge of minimal pairs in ASL and the following three ASL signs to fill out the corresponding cells in the table below. The first example is filled in for you.

The American Sign Language sign for 'boy.'

The American Sign Language sign for 'woman.'

The American Sign Language sign for 'man.'

	ASL sign for 'boy'	ASL sign for 'woman'	ASL sign for 'man'
Location	forehead
Movement	touch
Palm Orientation	down

a) Which pair of signs, if any, are minimal pairs? b) What distinctive feature makes these signs a minimal pair?

Using the same general procedure as outlined in Question 2, identify whether or not the following signs form a minimal pair. A table does not need to be constructed; however, it may aid you in identifying the existence of minimal pairs.

3. ASL signs Home – Yesterday

American Sign Language sign for 'home.'

American Sign Language sign for 'yesterday.'

4. ASL signs Soon – Train

ASL sign for 'soon.' ASL sign for 'train.'

5. ASL signs Know – Think

American Sign Language sign for 'know.'

American Sign Language sign for 'think.'

For more American Sign Language (ASL) resources check out Lifeprint.com. All images belong to Lifeprint.com.

Footnotes

[1] Crystal, D. (2003). A Dictionary of Linguistics & Phonetics, Blackwell Publishing Ltd., Oxford, UK.
[2] Hoyte, K. J., Brownell, H., & Wingfield, A. (2009). Components of speech prosody and their use in detection of syntactic structure by older adults. Experimental Aging Research, 35, 129–151.
[3] Hyman, L. M. (1975). Phonology. New York: Holt, Rinehart, and Winston.
[4] Stokoe, W. C., Jr. (2005). Sign language structure: an outline of the visual communication systems of the American deaf, Journal of Deaf Studies and Deaf Education, 10(1), 3–37.

References

Burquest, D. A., & Payne, D. L. (1993). Phonological analysis: A functional approach. Dallas, TX: Summer Institute of Linguistics.

Crystal, D. (2003). A Dictionary of Linguistics & Phonetics, Blackwell Publishing Ltd., Oxford, UK.

Hessler, D., Jonkers, R., & Bastiaanse, R. (2010). The influence of phonetic dimensions on aphasic speech perception. Clinical Linguistics & Phonetics, 24(12), 980–996.

Hoyte, K. J., Brownell, H., & Wingfield, A. (2009). Components of speech prosody and their use in detection of syntactic structure by older adults. Experimental Aging Research, 35, 129–151.

Hyman, L. M. (1975). Phonology. New York: Holt, Rinehart, and Winston.

Seitz, A. R., Protopapas, A., Tsushima, Y., Vlahou, E. L., Gori, S., Grossberg, S., & Watanabe, T. (2010). Unattended exposure to components of speech sounds yields same benefits as explicit auditory training, Cognition, 115, 435–443.

Stokoe, W. C., Jr. (2005). Sign language structure: an outline of the visual communication systems of the American deaf, Journal of Deaf Studies and Deaf Education, 10(1), 3-37.
