Working With Microphones
00:00 Working with microphones. To access your microphone with SpeechRecognition, you’ll need to install the PyAudio package. Close the REPL you’ve been using, and let’s take a look at the steps needed to install PyAudio.
The process for installing PyAudio will vary depending on your operating system. On a Debian-based Linux like Ubuntu, you can install PyAudio with apt, as seen onscreen.
Once installed, you may still need to run pip install pyaudio, especially if you’re working in a virtual environment.
For macOS, first you’ll need to install PortAudio using Homebrew, and then install PyAudio with pip.
Installing on Windows can be a little involved, as the standard pip install method doesn’t work on Python 3.7 and above, and attempting an install will lead to a number of errors about missing components on your system. However, the Unofficial Windows Binaries page offers wheels that can be downloaded and installed using the command seen onscreen.
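The transcript doesn’t include the onscreen commands, but based on the standard installation routes they likely resemble the following (the Windows wheel filename is a placeholder for whatever file you download):

```shell
# Debian-based Linux (Ubuntu): install PyAudio through apt
sudo apt-get install python3-pyaudio

# Inside a virtual environment, you may still need pip
pip install pyaudio

# macOS: install the PortAudio library with Homebrew, then PyAudio with pip
brew install portaudio
pip install pyaudio

# Windows: install a wheel downloaded from the Unofficial Windows Binaries page
pip install <path-to-downloaded-wheel>.whl
```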
01:30 As with the installation of any software, you should make sure that the packages you’re installing are legitimate, but these are packages I’ve personally used on a number of Windows projects where no other solution was available, and they’ve worked well.
01:45 Once you’ve got PyAudio installed, you can test the installation from the console. Make sure your default microphone is on and unmuted. If the installation worked, you should see something like this.
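The console test itself isn’t in the transcript; assuming the Google Web Speech recognizer used elsewhere in this course, it likely looks something like this:

```python
>>> import speech_recognition as sr
>>> r = sr.Recognizer()
>>> with sr.Microphone() as source:
...     audio = r.listen(source)  # speak into your microphone now
...
>>> r.recognize_google(audio)
```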
02:01 “Hello from Real Python.” The microphone is working correctly. Go ahead and play around with it a little bit by speaking into your microphone and seeing how well SpeechRecognition transcribes your speech.
If you’re on Ubuntu and get some funky output like ALSA lib […] Unknown PCM, refer to the troubleshooting page of SpeechRecognition’s documentation for tips on suppressing these messages.
02:31 This output comes from the ALSA package installed with Ubuntu—not SpeechRecognition or PyAudio. In all reality, these messages may indicate a problem with your ALSA configuration, but in my experience, they don’t impact the functionality of your code.
They’re mostly a nuisance.
The Microphone class. Open up a new Python REPL and create an instance of the Recognizer class. Now, instead of using an audio file as the source as seen previously, you’ll use the default system microphone. You can access this by creating an instance of the Microphone class.
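Based on the SpeechRecognition API, the onscreen setup is presumably along these lines:

```python
>>> import speech_recognition as sr
>>> r = sr.Recognizer()
>>> mic = sr.Microphone()  # the default system microphone
```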
03:18 If your system has no default microphone, such as on a Raspberry Pi, or you want to use a microphone other than the default, you’ll need to specify which one to use by supplying a device index.
You can get a list of microphone names by calling the .list_microphone_names() static method of the Microphone class. Note that your output may differ from what’s seen onscreen. The device index of the microphone is the index of its name in the list that’s returned by .list_microphone_names().
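A sketch of that call; the device names below are hypothetical apart from 'BlackHole 2ch', which the lesson’s output shows at index 2:

```python
>>> sr.Microphone.list_microphone_names()
['MacBook Pro Microphone', 'MacBook Pro Speakers', 'BlackHole 2ch']
```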
Given the output you’ve seen onscreen, if you wanted to use the 'BlackHole 2ch' microphone, which has index 2 in the list, you’d create a Microphone instance using the code seen onscreen now. For most projects, though, you’ll probably want to use the default system microphone.
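That instance creation is presumably:

```python
>>> mic = sr.Microphone(device_index=2)  # 'BlackHole 2ch' in this example
```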
.listen() to capture microphone input. Now that you’ve got a microphone instance ready to go, it’s time to capture some input. Just like the AudioFile class, Microphone is a context manager. You can capture input from the microphone using the .listen() method of the Recognizer class inside of the with block. This method takes an audio source as its first argument and records input from the source until silence is detected. Once you execute the with block, try saying “hello” into your microphone.
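Assuming the Recognizer instance r and Microphone instance mic from earlier, the with block likely looks like this:

```python
>>> with mic as source:
...     audio = r.listen(source)  # records until silence is detected
...
```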
04:44 Wait a moment for the interpreter prompt to display again. Once the prompt returns, you’re ready to recognize the speech.
04:57 If the prompt never returns, your microphone is most likely picking up too much ambient noise. You can interrupt the process with Control + C to get your prompt back.
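Once the prompt returns, you can transcribe the captured audio. Assuming the Google Web Speech recognizer used earlier in the course:

```python
>>> r.recognize_google(audio)  # e.g. 'hello', depending on what you said
```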
To handle ambient noise, you’ll need to use the .adjust_for_ambient_noise() method of the Recognizer class, just like you did when trying to make sense of the noisy audio file.
05:18 Since input from a microphone is far less predictable than an input from an audio file, it’s a good idea to do this any time you listen for microphone input.
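A sketch of the calibration step, again assuming r and mic from earlier:

```python
>>> with mic as source:
...     r.adjust_for_ambient_noise(source)  # calibrates for ambient noise
...     audio = r.listen(source)
...
```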
After running this code, wait a second for .adjust_for_ambient_noise() to do its thing, and then try speaking “hello” into the microphone. Again, you’ll have to wait a moment for the interpreter prompt to return before trying to recognize the speech.
.adjust_for_ambient_noise() analyzes the audio source for one second. If this seems too long to you, feel free to adjust this with the duration keyword argument.
The SpeechRecognition documentation recommends using a duration of no less than 0.5 seconds. In some cases, you may find that durations longer than the default of 1 second generate better results. The minimum value you need depends on the microphone’s ambient environment. Unfortunately, this information is typically unknown during development. In my experience, the default duration of 1 second is adequate for most applications.
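For example, to calibrate for half a second instead of the default one second:

```python
>>> with mic as source:
...     r.adjust_for_ambient_noise(source, duration=0.5)
...     audio = r.listen(source)
...
```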
Handling unrecognizable speech. Try typing the previous code example into the interpreter and making some unintelligible noises into the microphone. You should get a response similar to the one seen onscreen. Audio that cannot be matched to text by the API raises an UnknownValueError exception. You should always wrap calls to the API with try and except blocks to handle this exception. You may have to try harder than you’d expect to get the exception thrown. The API works very hard to transcribe any vocal sounds, and even short grunts were transcribed as words like "how" for me. Coughing, hand claps, and tongue clicks would consistently raise the exception.
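A minimal sketch of that error handling, assuming the recognize_google() call used earlier in the course:

```python
>>> try:
...     r.recognize_google(audio)
... except sr.UnknownValueError:
...     print("Unable to recognize speech")
...
```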
07:17 So far, you’ve seen how to recognize speech in English, but what if the speech you need to recognize isn’t in English? You’ll see how to deal with that situation in the next video.