Understanding the Effect of Noise

Speech Recognition With Python Darren Jones 04:37


00:00 The effect of noise on speech recognition.

00:05 Noise is a fact of life. All audio recordings have some degree of noise in them, and unhandled noise can wreck the accuracy of speech recognition apps.

00:15 To get a feel for how noise can affect speech recognition, download the jackhammer.wav file. As always, make sure you save this to your interpreter session’s working directory. This file has the phrase “the stale smell of old beer lingers” spoken with a loud jackhammer in the background.

00:33 “The stale smell of old beer lingers.” What happens when you try to transcribe this file?

00:58 As you can see, it’s way off. So, how can you deal with this? One thing you can try is using the .adjust_for_ambient_noise() method of the Recognizer class.

01:25 That got you a little closer to the actual phrase, but it still isn’t perfect. Also, the word “the” is missing from the beginning of the phrase. Why is that?

01:35 The .adjust_for_ambient_noise() method reads the first second of the file stream and calibrates the recognizer to the noise level of the audio.

01:42 Hence, that portion of the stream is consumed before you call .record() to capture the data. You can adjust the time frame that .adjust_for_ambient_noise() uses for analysis with the duration keyword argument.

01:56 This argument takes a numerical value in seconds and is set to 1 by default. Try lowering this to 0.5.

02:19 It looks like that has allowed SpeechRecognition to pass the entire spoken phrase for recognition, but now you’re back to where you were before with a transcription that’s the same as if noise adjustment hadn’t been used.

02:32 Sometimes it just isn’t possible to remove the effect of the noise. The signal is just too noisy to be dealt with effectively, and that’s the case with this file.

02:42 If you find yourself running up against these issues frequently, you may have to resort to some pre-processing of the audio. This can be done with audio editing software or a Python package such as SciPy that can apply filters to the files.
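As one possible illustration of pre-processing with SciPy, the sketch below applies a Butterworth band-pass filter that keeps roughly the band where most speech energy sits. The 300 to 3400 Hz cutoffs, the filter order, and the function names are all assumptions picked for the example, and it expects a 16-bit PCM WAV file. There is no claim here that this particular filter rescues jackhammer.wav, since a jackhammer produces noise across much of the same band as speech.

```python
import numpy as np
from scipy import signal
from scipy.io import wavfile


def bandpass_speech(samples, sample_rate, low_hz=300.0, high_hz=3400.0, order=4):
    """Attenuate content outside an assumed speech band (illustrative cutoffs)."""
    sos = signal.butter(
        order, [low_hz, high_hz], btype="bandpass", fs=sample_rate, output="sos"
    )
    # Forward-backward filtering avoids shifting the signal in time.
    return signal.sosfiltfilt(sos, samples, axis=0)


def filter_wav(in_path, out_path):
    """Band-pass a 16-bit PCM WAV file and write the result as 16-bit PCM."""
    sample_rate, samples = wavfile.read(in_path)
    filtered = bandpass_speech(samples.astype(np.float64), sample_rate)
    clipped = np.clip(filtered, -32768, 32767).astype(np.int16)
    wavfile.write(out_path, sample_rate, clipped)
```

You could then run filter_wav("jackhammer.wav", "jackhammer_filtered.wav") and transcribe the filtered file to compare results.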

02:56 A detailed discussion of this is beyond the scope of this course, but check out Allen Downey’s Think DSP book if you’re interested in it. For now, just be aware that ambient noise in an audio file can cause problems and must be addressed in order to maximize the accuracy of speech recognition.

03:14 When working with noisy files, it can be helpful to see the actual API response. Most APIs return a JSON string containing many possible transcriptions.

03:24 By default, the .recognize_google() method returns only the most likely transcription. To get the full response instead, set the show_all keyword argument of the .recognize_google() method to True.

03:46 As you can see, .recognize_google() returns a dictionary with the key 'alternative' that points to a list of possible transcripts.
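A small helper can make that list easier to read while debugging. The sketch below assumes the response shape described above: a dictionary whose 'alternative' key holds a list of dictionaries with a 'transcript' entry and, for some candidates, a 'confidence' entry. The sample response is invented placeholder data, not real API output, and the function name list_transcripts is hypothetical.

```python
def list_transcripts(response):
    """Return (transcript, confidence) pairs from a show_all=True response.

    Confidence is None when a candidate does not include one. Anything
    that is not a dictionary (for example, an empty result) yields [].
    """
    if not isinstance(response, dict):
        return []
    pairs = []
    for candidate in response.get("alternative", []):
        pairs.append((candidate.get("transcript", ""), candidate.get("confidence")))
    return pairs


# Placeholder data for illustration only. A real response would come from:
#     recognizer.recognize_google(audio, show_all=True)
sample_response = {
    "alternative": [
        {"transcript": "first candidate", "confidence": 0.9},
        {"transcript": "second candidate"},
    ],
    "final": True,
}

for transcript, confidence in list_transcripts(sample_response):
    print(f"{transcript!r} (confidence: {confidence})")
```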

03:53 The structure of this response may vary from API to API and is mainly useful for debugging. By now, you have a pretty good idea of the basics of the SpeechRecognition package.

04:05 You’ve seen how to create an AudioFile instance from an audio file and how to use the .record() method to capture data from the file. You learned how to record segments of a file using the offset and duration keyword arguments of .record(), and you experienced the detrimental effect noise can have on transcription accuracy.

04:25 Now for the fun part! In the next section, you’ll see how to transition from transcribing static audio files to making your project interactive by accepting input from a microphone.
