Using the Recognizer Class
00:00 The Recognizer class. All of the magic in SpeechRecognition happens within the Recognizer class. The primary purpose of each Recognizer instance is, of course, to recognize speech.
00:14 Each instance comes with a variety of settings and functionality for recognizing speech from an audio source. Creating an instance is easy. In your Python REPL, just type the following.
00:32 Each Recognizer instance has seven methods for recognizing speech from an audio source using various APIs. These are: .recognize_bing(), Microsoft Bing Speech; .recognize_google(), Google Web Speech API; .recognize_google_cloud(), Google Cloud Speech, which requires installation of the google-cloud-speech package; .recognize_houndify(), Houndify by SoundHound; .recognize_ibm(), IBM Speech to Text; .recognize_sphinx(), CMU Sphinx, which requires installation of PocketSphinx; and finally, .recognize_wit(), which uses Wit.ai.
01:14 Of the seven, only .recognize_sphinx() works offline with the CMU Sphinx engine. The other six all require an internet connection, so keep this in mind as you work.
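As a quick reference, the seven methods and their offline status can be written down as plain data. This is only an illustrative summary of the list above: the name RECOGNIZER_METHODS is made up for this sketch and is not part of SpeechRecognition, and the code does not import or call the library.

```python
# Hypothetical lookup table built from the lesson's list:
# method name -> (service, works offline?)
RECOGNIZER_METHODS = {
    "recognize_bing": ("Microsoft Bing Speech", False),
    "recognize_google": ("Google Web Speech API", False),
    "recognize_google_cloud": ("Google Cloud Speech", False),
    "recognize_houndify": ("Houndify by SoundHound", False),
    "recognize_ibm": ("IBM Speech to Text", False),
    "recognize_sphinx": ("CMU Sphinx", True),
    "recognize_wit": ("Wit.ai", False),
}

# Collect the methods that do not need an internet connection.
offline = [
    name
    for name, (_service, works_offline) in RECOGNIZER_METHODS.items()
    if works_offline
]
print(offline)
```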
01:26 Due to the complexity of speech recognition, a full discussion of the features and benefits of each API is beyond the scope of this course. Since SpeechRecognition ships with a default API key for the Google Web Speech API, you can get started with it straight away.
01:43 For this reason, you’ll be using the Web Speech API in this course. The other six APIs all require authentication with either an API key or a username/password combination.
01:55 For more information, consult the SpeechRecognition documentation. An important note is that the default key provided by SpeechRecognition is for testing purposes only, and Google may revoke it at any time.
02:10 It is not a good idea to use the Google Web Speech API in production. Even with a valid API key, you’ll be limited to only 50 requests per day, and there is no way to raise this quota. Fortunately, SpeechRecognition’s interface is nearly identical for each API, so what you learn in this course will be easy to translate to a real-world project.
02:34 Each .recognize_*() method will throw a speech_recognition.RequestError exception if the API is unreachable. For .recognize_sphinx(), this could happen as a result of a missing, corrupt, or incompatible Sphinx installation. For the other six methods, RequestError may be thrown if quota limits are met, the server is unavailable, or there’s no internet connection.
02:58 With those prerequisites out of the way, let’s get our hands dirty. Go ahead and try to call .recognize_google() in your interpreter session.
03:09 You probably got an error similar to the one onscreen, and you may well have guessed this would happen. After all, how could something be recognized from nothing?
03:19 All seven .recognize_*() methods of the Recognizer class require an audio_data argument. In each case, audio_data must be an instance of SpeechRecognition’s AudioData class.
03:32 There are two ways to create an AudioData instance: either from an audio file or from audio recorded by a microphone. Audio files are a little easier to get started with, so let’s take a look at that in the next section.