
WEEK 5: BEGIN CODE

  • Lauren Reyes
  • Feb 11
  • 4 min read

This post covers the start of the coding process, which I did mostly on my local laptop.


LEARNING POINTS

  • Virtual environments

  • Dependencies & libraries

  • Understanding audio playback in Python


SETTING UP DEVELOPMENT ENVIRONMENT

  1. I installed Python through the VS Code terminal (though I thought it was already installed from previous projects).

  2. I had a lot of trouble running code because of these two commands:

python --version
python3 --version
  3. I realize now that I had to use python3 rather than python for my code to work.
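
A quick way to see which interpreter a command actually launches is to ask Python itself. The sketch below is an illustration, not part of the original project: the helper name describe_interpreter is hypothetical. Saving it to a file and running it once with python and once with python3 shows whether the two commands point to different installs. It avoids f-strings on purpose so that it still runs if python turns out to be Python 2.

```python
import sys


def describe_interpreter():
    """Return the version and path of the interpreter running this script."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return {
        "version": version,
        "executable": sys.executable,
        "is_python3": sys.version_info[0] >= 3,
    }


if __name__ == "__main__":
    info = describe_interpreter()
    # %-formatting works on both Python 2 and Python 3
    print("Python %s at %s" % (info["version"], info["executable"]))
    if not info["is_python3"]:
        print("This is Python 2; try running the script with python3 instead.")
```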


INSTALLING REQUIRED LIBRARIES

  1. When I tried to install the libraries and dependencies for this project (recommended by ChatGPT), the terminal prompted me to use a virtual environment. I had no idea what this meant at the time, but it became crucial to running the code properly.

  2. Using pip:

    1. gTTS: converts text to speech

    2. speech_recognition (installed with pip as SpeechRecognition): converts speech to text

    3. pyaudio: allows the program to listen to a microphone

      1. This did not install at first, so I installed "pipwin", since PyAudio is system-specific (it wraps the native PortAudio library).
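
To check this setup from inside Python, a small script can report whether a virtual environment is active and whether the three libraries can be found. This is a minimal sketch, not code from the project: the helper names in_virtualenv and missing_modules are hypothetical, and it assumes the import names gtts, speech_recognition, and pyaudio.

```python
import importlib.util
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the system-wide install.
    return sys.prefix != sys.base_prefix


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
    missing = missing_modules(["gtts", "speech_recognition", "pyaudio"])
    if missing:
        print("Not installed in this environment:", ", ".join(missing))
    else:
        print("All three libraries are importable.")
```

If a library shows up as missing even though pip reported success, pip most likely installed it into a different interpreter than the one running the script.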


RUNNING CODE

  1. Ran this code:

python rotary_phone_test.py
  2. It was not taking in any audio at first, so I added a print statement to check. It turned out the microphone was working, but it was only hearing noise instead of words. The debugging steps I went through:

  1. Increased ambient noise adjustment

  2. Increased listening time to account for timeout issue

  3. Debugged Speech Matching Logic

    1. Added a print statement to see what the recognizer thought I was saying

    2. The program kept saying my text was wrong even though I said the word correctly.

  4. Fixed Audio Playback Issue

    1. I did not originally have mpg321 installed. gTTS only generates the MP3 file, so a command-line player like this is needed to play the generated speech.

    2. I had to install this with Homebrew ("brew install") rather than with sudo

      1. This took a while to figure out. I had used sudo before, but only while ssh'ed into another computer that was Linux-based.
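
The first three steps above can be sketched as two small helpers. This is a minimal illustration, not the exact code from this project: listen_once and matches_expected are hypothetical names, and the durations are placeholder values. It assumes a speech_recognition Recognizer and an open Microphone source are passed in; adjust_for_ambient_noise and the timeout and phrase_time_limit arguments of listen belong to that library. The matching helper assumes the mismatch came from capitalization or punctuation differences, which is a common cause but not confirmed here.

```python
import string


def listen_once(recognizer, source, calibrate_seconds=2.0,
                wait_seconds=10, phrase_seconds=15):
    """Calibrate for background noise, then record one phrase.

    `recognizer` is expected to be a speech_recognition.Recognizer and
    `source` an open speech_recognition.Microphone (assumed, not imported here).
    """
    # A longer calibration window gives the recognizer more time to measure
    # room noise before it decides what counts as speech.
    recognizer.adjust_for_ambient_noise(source, duration=calibrate_seconds)
    # timeout: how long to wait for speech to start.
    # phrase_time_limit: the longest phrase that will be recorded.
    return recognizer.listen(source, timeout=wait_seconds,
                             phrase_time_limit=phrase_seconds)


def normalize(text):
    """Lowercase the text, drop punctuation, and collapse extra spaces."""
    stripped = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.split())


def matches_expected(heard, expected):
    """Compare recognized speech with the expected phrase, ignoring case and punctuation."""
    return normalize(heard) == normalize(expected)


if __name__ == "__main__":
    # The recognizer may return different capitalization than the script expects.
    print(matches_expected("Hello, World!", "hello world"))
```

With the real library, listen raises speech_recognition.WaitTimeoutError when no speech starts within the timeout, so the caller should be ready to catch it.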


FINAL WORKING CODE

import speech_recognition as sr
from gtts import gTTS
import random
import os
import time

# Function to simulate dialing input (local test)
def prompt_for_dial_number():
    # Keep asking until the user enters a valid choice
    while True:
        print("Please dial a number by entering it on the keyboard (1-3):")
        dial_number = input("Dial Number: ")

        if dial_number in ['1', '2', '3']:
            print(f"Dialed Number: {dial_number}")
            return dial_number

        print("Invalid number. Try again.")

# Function to generate a response based on personality
def generate_response(personality, user_input):
    responses = {
        'friendly': [
            "That's awesome! What else can I help with?",
            "I'm so happy you called. What's on your mind?",
            "You're in a good mood today, huh? What do you want to talk about?"
        ],
        'rude': [
            "Ugh, really? What do you want now?",
            "I don't care. What else do you want?",
            "You're wasting my time, but fine, what's up?"
        ],
        'indifferent': [
            "Yeah, sure, whatever. What now?",
            "I'm here, just tell me what you need.",
            "Do you even have a point? Spit it out."
        ]
    }
    
    # Fallback reply when there is no input to respond to
    if not user_input:
        return "I don't know what you're talking about."
    
    # Pick a random line for the chosen personality
    # (the reply does not yet depend on what the user actually said)
    if personality == 'friendly':
        return f"Got it! {random.choice(responses['friendly'])}"
    elif personality == 'rude':
        return f"Yeah, yeah, I heard you. {random.choice(responses['rude'])}"
    elif personality == 'indifferent':
        return f"Sure, whatever. {random.choice(responses['indifferent'])}"

    # Unknown personality: use the fallback instead of returning None
    return "I don't know what you're talking about."

# Function to speak the response
def speak(text):
    tts = gTTS(text=text, lang='en')
    tts.save("response.mp3")
    
    # Play the audio based on OS type
    if os.name == 'posix' and 'darwin' in os.uname().sysname.lower():
        os.system("afplay response.mp3")  # macOS
    elif os.name == 'posix' and 'linux' in os.uname().sysname.lower():
        os.system("mpg123 response.mp3")  # Linux
    elif os.name == 'nt':
        os.system("start response.mp3")  # Windows

# Function to listen for user response
def listen_for_response():
    recognizer = sr.Recognizer()
    
    with sr.Microphone() as source:
        print("Listening for your response...")
        audio = recognizer.listen(source)
        
        try:
            response = recognizer.recognize_google(audio)
            print(f"You said: {response}")
            return response
        except sr.UnknownValueError:
            speak("Sorry, I didn't catch that. Can you repeat?")
            return None
        except sr.RequestError:
            speak("Sorry, there was an error with the speech service.")
            return None

# Function to monitor if the conversation pauses for too long
def is_timeout(start_time, timeout_duration=5):
    current_time = time.time()
    return current_time - start_time > timeout_duration

# Main conversation flow function
def conversation_flow():
    speak("Please dial a number.")
    
    # Capture rotary input (number dialed) through keyboard for local testing
    number = prompt_for_dial_number()

    # Map the dialed number to a specific personality
    if number == "1":
        personality = "friendly"
    elif number == "2":
        personality = "rude"
    elif number == "3":
        personality = "indifferent"
    
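    # Generate initial response based on the selected personality.
    # An empty user_input would return the generic fallback line instead,
    # so a placeholder greeting is passed here.
    initial_response = generate_response(personality, "hello")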
    speak(initial_response)
    
    # AI asks what the user wants
    speak("What do you want?")
    user_input = listen_for_response()
    
    while user_input is None:
        user_input = listen_for_response()  # Keep listening until we get a response
    
    # AI responds to the user's input based on their tone/personality
    ai_response = generate_response(personality, user_input)
    speak(ai_response)
    
    # Start conversation loop with time-based pause detection
    conversation_in_progress = True
    start_time = time.time()
    
    while conversation_in_progress:
        user_input = listen_for_response()
        
        if user_input is None:
            continue
        
        ai_response = generate_response(personality, user_input)
        speak(ai_response)
        
        # Check whether more than timeout_duration seconds have passed since the timer was last reset
        if is_timeout(start_time):
            speak("Do you want to continue the conversation? Say 'yes' to keep talking or 'no' to end.")
            continue_conversation = listen_for_response()
            
            # Match 'no' as a whole word so replies like "I don't know" do not end the call
            if continue_conversation and 'no' in continue_conversation.lower().split():
                speak("Goodbye!")
                conversation_in_progress = False  # End the conversation
            else:
                start_time = time.time()  # Any other reply (or none) keeps the conversation going
        else:
            start_time = time.time()  # Reset the timer if the conversation continues

# Start the conversation flow (local testing)
if __name__ == "__main__":
    conversation_flow()
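
The speak() function above picks a player from the operating system name alone, so playback fails when that player is not installed (the situation described under "Fixed Audio Playback Issue"). One possible alternative is to check which player is actually on the PATH first. The sketch below uses shutil.which from the standard library; find_audio_player and the candidate list are hypothetical choices, not part of the original script.

```python
import shutil

# Candidate command-line MP3 players, in order of preference (an assumed list).
CANDIDATE_PLAYERS = ["afplay", "mpg123", "mpg321"]


def find_audio_player(candidates=CANDIDATE_PLAYERS, which=shutil.which):
    """Return the first candidate found on the PATH, or None if none is installed."""
    for name in candidates:
        if which(name) is not None:
            return name
    return None


if __name__ == "__main__":
    player = find_audio_player()
    if player is None:
        print("No MP3 player found; install one (for example with Homebrew on macOS).")
    else:
        print("Would play audio with:", player)
```

The returned name could then be used to build the playback command, which also gives a clear error message when no player is available.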

SUMMARY

This week I focused on getting basic voice recognition working, so that the code is solid before I transfer it onto the Raspberry Pi.

 
 
 
