Welcome to this tutorial on speech recognition in Python! This guide walks you through the basics of converting spoken words into text with the SpeechRecognition and PyAudio libraries.

Getting Started 🚀

  1. Install Required Libraries
    Start by installing the SpeechRecognition library, plus PyAudio for capturing audio from the microphone:

    pip install SpeechRecognition pyaudio
    

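    💡 Note: PyAudio depends on the PortAudio library. On Linux and macOS you may need to install PortAudio separately before pip can build PyAudio (for example, the portaudio19-dev package on Debian/Ubuntu, or brew install portaudio on macOS).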
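
Before moving on, it can help to confirm that both packages are importable. The following is a minimal sketch using only the standard library; missing_modules is a hypothetical helper name, not part of either package.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names differ from pip names: SpeechRecognition -> speech_recognition
missing = missing_modules(["speech_recognition", "pyaudio"])
print("Missing modules:", ", ".join(missing) if missing else "none")
```

If either name is reported as missing, rerun the pip command above and check the PortAudio note.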

  2. Record Audio Input
    Use pyaudio to capture audio from your microphone:

    import pyaudio
    import wave
    
    # Audio recording setup
    FORMAT = pyaudio.paInt16   # 16-bit samples
    CHANNELS = 1               # mono
    RATE = 44100               # samples per second
    CHUNK = 1024               # frames read per buffer
    RECORD_SECONDS = 5
    FILE_NAME = "output.wav"
    
    audio = pyaudio.PyAudio()
    stream = audio.open(format=FORMAT, channels=CHANNELS,
                        rate=RATE, input=True,
                        frames_per_buffer=CHUNK)
    print("Recording...")
    frames = []
    # Each read returns CHUNK frames, so RATE / CHUNK reads cover one second
    for _ in range(int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    print("Finished recording.")
    stream.stop_stream()
    stream.close()
    audio.terminate()
    
    # Save the recorded data as a WAV file
    with wave.open(FILE_NAME, 'wb') as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(audio.get_sample_size(FORMAT))
        wf.setframerate(RATE)
        wf.writeframes(b''.join(frames))
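
To check that the recording came out at roughly the intended length, the WAV header can be read back with the standard-library wave module. This is a small sketch; wav_duration_seconds is a hypothetical helper name introduced here for illustration.

```python
import wave


def wav_duration_seconds(path):
    """Return the length of a WAV file in seconds (frame count / frame rate)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


# Usage, after running the recording step:
#   print(wav_duration_seconds("output.wav"))
```

A result far from RECORD_SECONDS usually points to a mistake in the read loop or in the sample rate written to the file.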
    
  3. Transcribe Audio with SpeechRecognition
    Load the audio file and send it to the Google Web Speech API for transcription (this step needs an internet connection):

    import speech_recognition as sr
    
    r = sr.Recognizer()
    with sr.AudioFile("output.wav") as source:
        audio_data = r.record(source, duration=5)
        try:
            text = r.recognize_google(audio_data)
            print("Transcribed Text:", text)
        except sr.UnknownValueError:
            print("Could not understand audio")
        except sr.RequestError as e:
            # Raised when the API is unreachable or rejects the request
            print(f"Could not request results: {e}")
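
For recordings longer than a few seconds, one option is to transcribe the file in fixed-size pieces by calling record() repeatedly with a duration. The sketch below assumes that approach; chunk_count and transcribe_in_chunks are hypothetical helper names, and the 5-second default simply mirrors the example above.

```python
import math


def chunk_count(total_seconds, chunk_seconds):
    """Number of chunk_seconds-long reads needed to cover total_seconds."""
    return math.ceil(total_seconds / chunk_seconds)


def transcribe_in_chunks(path, chunk_seconds=5):
    """Transcribe an audio file piece by piece; returns a list of text segments."""
    # Imported here so chunk_count stays usable without the library installed
    import speech_recognition as sr

    r = sr.Recognizer()
    segments = []
    with sr.AudioFile(path) as source:
        # Consecutive record() calls continue where the previous one stopped
        for _ in range(chunk_count(source.DURATION, chunk_seconds)):
            audio_data = r.record(source, duration=chunk_seconds)
            try:
                segments.append(r.recognize_google(audio_data))
            except sr.UnknownValueError:
                segments.append("")  # nothing intelligible in this chunk
    return segments
```

Network failures (sr.RequestError) are deliberately left to propagate to the caller here, since retrying every chunk after a lost connection is rarely useful.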
    

Advanced Tips 🔍

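  • Microphone Input: To transcribe live speech instead of "output.wav", use sr.Microphone() as the source in place of sr.AudioFile and capture audio with r.listen(source).
  • Alternative APIs: Explore other engines, such as recognize_sphinx for offline processing (requires the PocketSphinx package) or cloud-backed recognizers such as recognize_bing. Which methods are available depends on your SpeechRecognition version.
  • Customization: Adjust parameters such as RATE and RECORD_SECONDS, or use pydub to convert other audio formats to WAV before transcription.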
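
The first two tips can be combined: capture from the microphone, then try several engines in order until one succeeds. This is a rough sketch under stated assumptions; first_successful and transcribe_from_microphone are hypothetical helper names, and the choice of Google followed by Sphinx is only an example.

```python
def first_successful(audio_data, engines):
    """Try each (name, recognize_fn) pair in order.

    Returns (name, text) for the first engine that succeeds,
    or (None, None) if every engine raises.
    """
    for name, recognize in engines:
        try:
            return name, recognize(audio_data)
        except Exception as exc:  # broad on purpose: any failure means "try the next one"
            print(f"{name} failed: {exc}")
    return None, None


def transcribe_from_microphone():
    """Listen for one phrase on the default microphone and transcribe it."""
    # Imported here so first_successful stays usable without the library installed
    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.Microphone() as source:
        # Sample background noise briefly so quiet speech is not cut off
        r.adjust_for_ambient_noise(source, duration=1)
        print("Listening...")
        audio_data = r.listen(source)

    engines = [
        ("google", r.recognize_google),  # online
        ("sphinx", r.recognize_sphinx),  # offline, needs PocketSphinx
    ]
    return first_successful(audio_data, engines)
```

Calling transcribe_from_microphone() requires a working microphone and PyAudio. first_successful takes plain callables, so it can be tried out with stand-in functions first.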

Further Learning 📚

Speech_Recognition
Python_Code