According to a report on EurekAlert!, speech perception is not based on hearing alone, but also on sight and even touch. These senses blend together to enable us to perceive what others say.
We use cues from the movement of the lips, teeth, tongue and other facial features to help us decipher speech. If you watch a video of someone articulating one sound, e.g. “ba”, dubbed with a recording of them saying a different sound, e.g. “ka”, you will probably perceive the sound they are articulating rather than the one on the recording, or something in between the two, such as “da”. This is known as the McGurk effect. It happens even if you know the sounds are different and concentrate on the audio.
According to the research, the McGurk effect is evidence that the senses are inextricably integrated, and that the brain processes the acoustic and visual signals of speech as part of a single system. Other studies have found that this link is established even before young children are able to perceive individual phonemes.