Putting It All Together

Speech Recognition With Python Darren Jones 06:51

00:37 Open your chosen code editor and create a file called guessing_game.py. First, import random, time, and speech_recognition.

00:50 The recognize_speech_from_mic() function takes a Recognizer and a Microphone instance as arguments and returns a dictionary with three keys.

00:58 The first key, "success", is a Boolean that indicates whether or not the API request was successful. The second key, "error", is either None or an error message indicating that the API is unavailable or the speech was unintelligible.

01:15 Finally, the "transcription" key contains the transcription of the audio recorded by the microphone. The function first checks that the recognizer and microphone arguments are of the correct type and raises a TypeError if either is invalid.

01:35 Next, the function uses the microphone as a source and first calls .adjust_for_ambient_noise() to calibrate the recognizer to the current noise conditions before each recording. The .listen() method is then used to record microphone input.

01:54 The default values for the response dictionary are then set. These will be altered depending on the success or otherwise of the speech recognition.
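As a rough sketch of what those defaults might look like: the three key names come from the description above, but the starting values shown here (an optimistic True for "success" and None for the other two) are an assumption, and make_default_response() is a hypothetical helper used only for illustration.

```python
def make_default_response():
    """Return a fresh response dictionary (default values are assumed)."""
    return {
        "success": True,        # assume the API request will succeed
        "error": None,          # no error message so far
        "transcription": None,  # nothing has been transcribed yet
    }


response = make_default_response()
print(response)
```

Starting from optimistic defaults means only the failure paths need to change anything later on.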

02:04 Next, .recognize_google() is called to transcribe any speech in the recording. A try/except block is used to catch the RequestError and UnknownValueError exceptions and handle them accordingly.

02:18 The success of the API request, any error messages, and the transcribed speech are stored in the "success", "error", and "transcription" keys of the response dictionary, which is returned by the recognize_speech_from_mic() function.
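The way the two exceptions map onto the response dictionary can be sketched without a microphone or network connection. In the sketch below, the two exception classes are stand-ins for the ones named above (in the real script they would come from speech_recognition), transcribe() and the fake_* functions are hypothetical names, and the error message strings are assumptions.

```python
class RequestError(Exception):
    """Stand-in for speech_recognition.RequestError (API unreachable)."""


class UnknownValueError(Exception):
    """Stand-in for speech_recognition.UnknownValueError (unintelligible speech)."""


def transcribe(recognize, audio):
    """Call recognize(audio) and fold the outcome into a response dictionary."""
    response = {"success": True, "error": None, "transcription": None}
    try:
        response["transcription"] = recognize(audio)
    except RequestError:
        # The API could not be reached, so the request itself failed.
        response["success"] = False
        response["error"] = "API unavailable"
    except UnknownValueError:
        # The request went through, but the speech was unintelligible.
        response["error"] = "Unable to recognize speech"
    return response


def fake_ok(audio):
    return "Hello"


def fake_api_down(audio):
    raise RequestError("no connection")


def fake_mumble(audio):
    raise UnknownValueError()


ok = transcribe(fake_ok, audio=None)
api_down = transcribe(fake_api_down, audio=None)
mumble = transcribe(fake_mumble, audio=None)
print(ok, api_down, mumble, sep="\n")
```

Note that unintelligible speech leaves "success" as True in this sketch, since the API request itself completed. That distinction is what lets the game decide later whether asking the user to repeat is worthwhile.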

02:34 You can test the recognize_speech_from_mic() function by saving guessing_game.py and then, in an interpreter session that’s started in the same location as that file, importing the function and calling it with a Recognizer and a Microphone instance.

03:02 “Hello!”

03:09 Now, return to your editor to complete the game code. The game itself is pretty simple. First, a list of words, a maximum number of allowed guesses, and a prompt limit are declared.

03:22 Next, a Recognizer and a Microphone instance are created and a random word is chosen from WORDS.
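A minimal sketch of that setup might look like the following. WORDS and PROMPT_LIMIT are named in the lesson, while NUM_GUESSES is a hypothetical name. Three guesses matches the losing run shown later, but the prompt limit of 5 and any words beyond apple, banana, and lemon are assumptions. The Recognizer and Microphone lines are left as comments so the sketch runs without the speech_recognition package or audio hardware.

```python
import random

# Only "apple", "banana", and "lemon" appear in the lesson; the rest are assumed.
WORDS = ["apple", "banana", "grape", "orange", "mango", "lemon"]
NUM_GUESSES = 3   # hypothetical name; three attempts, as in the losing run
PROMPT_LIMIT = 5  # assumed value for the number of re-prompts per attempt

# In the real script (requires speech_recognition and a working microphone):
# recognizer = sr.Recognizer()
# microphone = sr.Microphone()

# Pick the word the user has to guess.
word = random.choice(WORDS)
print(f"Secret word chosen from {len(WORDS)} candidates.")
```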

03:33 After printing some instructions and waiting for 3 seconds,

03:42 a for loop is used to manage each user attempt at guessing the chosen word. The first thing inside the for loop is another for loop that prompts the user, at most PROMPT_LIMIT times, for a guess. Each time, it attempts to recognize the input with the recognize_speech_from_mic() function and stores the returned dictionary in the local variable guess.

04:07 If the "transcription" key of guess is not None, then the user’s speech was transcribed and the inner loop is terminated with break.

04:16 If the speech was not transcribed and the "success" key is set to False, then an API error occurred and the loop is again terminated with break.
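The inner prompting loop can be sketched on its own. Here, get_guess() is a hypothetical helper, and its listen argument is a stand-in for calling recognize_speech_from_mic() with a real recognizer and microphone, so the control flow can be run with scripted responses. The re-prompt message is also an assumption.

```python
def get_guess(listen, prompt_limit):
    """Prompt up to prompt_limit times and return the last response dictionary."""
    guess = None
    for _ in range(prompt_limit):
        guess = listen()
        if guess["transcription"] is not None:
            break  # speech was transcribed, so stop prompting
        if not guess["success"]:
            break  # API error, so prompting again will not help
        print("I didn't catch that. What did you say?")
    return guess


# Scripted responses stand in for real microphone input.
scripted = iter([
    {"success": True, "error": "Unable to recognize speech", "transcription": None},
    {"success": True, "error": None, "transcription": "Apple"},
])
result = get_guess(lambda: next(scripted), prompt_limit=5)
print(result["transcription"])
```

The only case that reaches the re-prompt message is a successful request with no transcription, which is the unintelligible-speech case.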

04:59 The .lower() method for string objects is used to ensure better matching of the guess to the chosen word. The API may return speech matched to the word "apple" with a lower- or uppercase first letter, and either response should count as a correct answer.

05:25 If the guess was correct, the user wins and the game is terminated. If the guess was incorrect and the user has any remaining attempts, the outer for loop repeats and a new guess is retrieved.
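The comparison and the outcome of each attempt can be sketched as follows, including the losing case once the attempts run out. play() is a hypothetical function that takes already-transcribed guesses as plain strings instead of listening to a microphone, and the "win" and "lose" return values and the printed message are assumptions made for illustration.

```python
def play(word, guesses, num_guesses=3):
    """Return "win" if any of the first num_guesses guesses matches word."""
    for i, said in zip(range(num_guesses), guesses):
        # Compare case-insensitively so "Apple" and "apple" both count.
        if said.lower() == word.lower():
            return "win"
        if i < num_guesses - 1:
            # Attempts remain, so the outer loop goes around again.
            print(f"Incorrect: {said!r}. Try again.")
    return "lose"


first_run = play("banana", ["Banana"])
second_run = play("grape", ["Banana", "lemon", "apple"])
print(first_run, second_run)
```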

05:38 Otherwise, the user loses the game. Next, you’ll see a few runs through the game. Firstly, one where the user guesses correctly.

05:57 “Banana.” Next, a run where the user guesses incorrectly three times and loses the game.

06:10 “Banana, lemon, apple.” Finally, a run showing how the exception handling for unrecognized speech allows multiple attempts to record and recognize speech.

06:31 “Ahem. Apple, banana. Ahem. Lemon.” Now you’ve completed everything we’re covering in the course. Let’s take time to review what you’ve learned.
