Welcome to the tutorial on implementing speech recognition using Python! This guide will walk you through the basics of converting spoken words into text using popular libraries and tools.
Getting Started 🚀
Install Required Libraries
Start by installing the SpeechRecognition library and pyaudio for audio processing:

    pip install SpeechRecognition pyaudio

💡 Note: You may need to install additional dependencies like PortAudio for pyaudio to work on some systems.
Record Audio Input
Use pyaudio to capture audio from your microphone:

    import pyaudio
    import wave

    # Audio recording setup
    FORMAT = pyaudio.paInt16   # 16-bit samples
    CHANNELS = 1               # mono
    RATE = 44100               # samples per second
    CHUNK = 1024               # frames read per buffer
    RECORD_SECONDS = 5
    FILE_NAME = "output.wav"

    audio = pyaudio.PyAudio()
    stream = audio.open(format=FORMAT, channels=CHANNELS, rate=RATE,
                        input=True, frames_per_buffer=CHUNK)

    print("Recording...")
    frames = []
    # Each read returns CHUNK frames, so loop once per buffer, not once per sample
    for _ in range(int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    print("Finished recording.")

    stream.stop_stream()
    stream.close()
    sample_width = audio.get_sample_size(FORMAT)
    audio.terminate()

    # Save the recorded data
    with wave.open(FILE_NAME, 'wb') as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(sample_width)
        wf.setframerate(RATE)
        wf.writeframes(b''.join(frames))
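Before transcribing, it can help to confirm that the saved file has the format and length you expect. The following is a minimal sketch using only the standard-library wave module; the helper name wav_info is hypothetical and not part of any library.

```python
import wave


def wav_info(path):
    """Return (channels, sample_width_bytes, frame_rate, duration_seconds).

    wav_info is a hypothetical helper name chosen for this sketch.
    """
    with wave.open(path, "rb") as wf:
        channels = wf.getnchannels()
        sample_width = wf.getsampwidth()   # bytes per sample (2 for paInt16)
        frame_rate = wf.getframerate()     # frames per second
        n_frames = wf.getnframes()         # total frames stored in the file
    # Duration is the number of frames divided by the frames per second.
    return channels, sample_width, frame_rate, n_frames / frame_rate
```

Calling wav_info("output.wav") after the recording script should report one channel, a 2-byte sample width, and a 44100 Hz rate. The duration should be close to RECORD_SECONDS; a much larger value suggests the read loop is counting individual samples rather than 1024-frame buffers.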
Transcribe Audio with SpeechRecognition
Load the audio file and use Google Web Speech API for transcription:

    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.AudioFile("output.wav") as source:
        audio_data = r.record(source, duration=5)

    try:
        text = r.recognize_google(audio_data)
        print("Transcribed Text:", text)
    except sr.UnknownValueError:
        print("Could not understand audio")
    except sr.RequestError:
        print("Could not request results")
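If you transcribe more than one file, the same steps can be wrapped in a small reusable function. This is a sketch under a few assumptions: the name transcribe_wav is hypothetical, the whole file is read instead of the first 5 seconds, and the language argument is passed through to recognize_google with "en-US" as an assumed default.

```python
def transcribe_wav(path, language="en-US"):
    """Return the transcript of a WAV file, or None if transcription fails.

    transcribe_wav is a hypothetical helper name chosen for this sketch.
    """
    # Imported inside the function so that defining the helper does not
    # require the SpeechRecognition package to be installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        # With no duration argument, record() reads the file to the end.
        audio_data = recognizer.record(source)

    try:
        return recognizer.recognize_google(audio_data, language=language)
    except sr.UnknownValueError:
        # The service responded but could not make out any speech.
        print("Could not understand audio")
    except sr.RequestError as error:
        # The service could not be reached or rejected the request.
        print(f"Could not request results: {error}")
    return None
```

For example, transcribe_wav("output.wav") would return the recognized text for the file recorded earlier, or None when recognition fails, which lets the caller decide how to handle a failed attempt.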
Advanced Tips 🔍
- Microphone Input: For live input, replace the AudioFile("output.wav") source with sr.Microphone() and capture speech with r.listen(source) instead of r.record(source).
- Alternative APIs: Explore other engines like recognize_sphinx for offline processing or recognize_bing for different services.
- Customization: Adjust parameters like RATE and RECORD_SECONDS, or use pydub to convert audio formats.
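The first two tips can be combined in one short sketch. It assumes the default microphone is usable through PyAudio, and that pocketsphinx is installed if the offline engine is selected. The function name transcribe_from_microphone and its offline flag are hypothetical names chosen for illustration.

```python
def transcribe_from_microphone(phrase_time_limit=5, offline=False):
    """Listen on the default microphone and return the transcript, or None.

    transcribe_from_microphone and its offline flag are hypothetical names.
    """
    # Imported inside the function so that defining the helper does not
    # require the SpeechRecognition package to be installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Calibrate the energy threshold against about one second of noise.
        recognizer.adjust_for_ambient_noise(source, duration=1)
        print("Listening...")
        # Stop capturing once the phrase exceeds phrase_time_limit seconds.
        audio_data = recognizer.listen(source, phrase_time_limit=phrase_time_limit)

    # recognize_sphinx runs locally; recognize_google calls a web service.
    if offline:
        recognize = recognizer.recognize_sphinx
    else:
        recognize = recognizer.recognize_google

    try:
        return recognize(audio_data)
    except sr.UnknownValueError:
        print("Could not understand audio")
    except sr.RequestError as error:
        print(f"Could not request results: {error}")
    return None
```

Calling transcribe_from_microphone() waits for speech and returns the recognized text. Passing offline=True switches to the Sphinx engine, which avoids the network request but generally needs extra setup.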
Further Learning 📚
- Python Documentation for SpeechRecognition
- Audio Processing with PyAudio
- Speech Recognition Use Cases