This paper investigates the influence of visual cues in the perception of the /r/-/w/ contrast in Anglo-English. Audio-visual perception of Anglo-English /r/ warrants attention because productions are increasingly non-lingual and labiodental (e.g., [ʋ]), possibly involving visual prominence of the lips for the post-alveolar approximant. Forty native speakers identified [ɹ] and [w] stimuli in four presentation modalities: auditory-only, visual-only, congruous audio-visual, and incongruous audio-visual. Auditory stimuli were presented in noise. The results indicate that native Anglo-English speakers can identify [ɹ] and [w] from visual information alone with almost perfect accuracy. Furthermore, visual cues dominate the perception of the /r/-/w/ contrast when auditory and visual cues are mismatched. However, auditory perception is ambiguous because participants tend to perceive both [ɹ] and [w] as /r/. Auditory ambiguity is related to Anglo-English listeners' exposure to acoustic variation for /r/, especially to [ʋ], which is often confused with [w]. It is suggested that a specific labial configuration for Anglo-English /r/ encodes the contrast with /w/ visually, compensating for the ambiguous auditory contrast. An audio-visual enhancement hypothesis is proposed, and the findings are discussed with regard to sound change.

The secondary labial articulation which accompanies the post-alveolar approximant /r/ in English has attracted far less attention from linguists than the primary lingual one. However, the lips may be particularly important in the variety of English spoken in England, Anglo-English, because non-lingual labiodental articulations ([ʋ]) are on the rise. Labiodentalisation may be due to speakers retaining the labial gesture at the expense of the lingual one, implying that /r/ is always labiodental even in lingual productions. We verify this assumption by comparing the labial postures of /r/ and /w/ in Anglo-English speakers who still present a lingual component. If post-alveolar /r/ is labiodental, the labial gesture for /w/, which is unequivocally considered rounded, should differ considerably. Techniques from deep learning were used to automatically classify and measure the lip postures for /r/ and /w/ from static images of the lips in 23 speakers. Our results suggest that there is a recognisable difference between the lip postures for /r/ and /w/, which a convolutional neural network is able to detect with a very high degree of accuracy. Measurements of the lip area acquired using an artificial neural network suggest that /r/ indeed has a labiodental-like lip posture, thus providing a phonetic account for labiodentalisation. We finish with a discussion of the methodological implications of using deep learning for future analyses of phonetic data.

This paper presents acoustic and articulatory data from prevocalic /r/ in the non-rhotic variety of English spoken in England, Anglo-English. Although traditional descriptions suggest that Anglo-English /r/ is produced using a tip-up tongue configuration, ultrasound data from 24 speakers show similar patterns of lingual variation to those reported in rhotic varieties, with a continuum of possible tongue shapes from bunched to retroflex. However, the number of Anglo-English speakers using exclusively tip-up variants is higher than that reported in American English across all phonetic contexts. It is generally agreed that English /r/ may be labialised, but the exact contribution of the lips has yet to be explored. Lip camera data reveal significantly more lip protrusion in bunched tongue configurations than in retroflex ones. These results indicate that the differing degrees of lip protrusion may contribute to maintaining a stable acoustic output across the different tongue shapes. An articulatory-acoustic trading relation between the sublingual space and the degree of lip protrusion is proposed. Finally, we suggest that Anglo-English /r/ has a specific lip posture which differs from that of /w/. We relate the development of such a posture to Anglo-English speakers' exposure to labiodental variants and to the pressure to maintain a perceptual contrast between /r/ and /w/.

What are the factors that contribute to (or inhibit) diachronic sound change? While acoustically motivated sound changes are well documented, research on the articulatory and audiovisual-perceptual aspects of sound change is limited. This paper investigates the interaction of articulatory variation and audiovisual speech perception in the Northern Cities Vowel Shift (NCVS), a pattern of sound change observed in the Great Lakes region of the United States.