WEEK 5: BEGIN CODE
- Lauren Reyes
- Feb 11
- 4 min read
This post covers the start of the coding process, which I did mostly on my own laptop.
LEARNING POINTS
Virtual environments
Dependencies & libraries
Understanding audio playback in Python
SETTING UP DEVELOPMENT ENVIRONMENT
I installed Python through the VS Code terminal (though I thought it was already installed from previous projects).
I had a lot of trouble running code because I kept mixing up these two commands:
python --version
python3 --version
I realize now that I had to use python3 for my code to run.
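A quick way to see which interpreter a command actually launches is to ask Python itself. The snippet below is a minimal sketch; `describe_interpreter` and `is_python3` are hypothetical helper names, not part of the project code. Running it once with python and once with python3 shows whether the two commands point at different installations.

```python
import sys


def describe_interpreter():
    """Return the running interpreter's version and path as one line."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return f"Python {version} at {sys.executable}"


def is_python3():
    """True when the script is being run by a Python 3 interpreter."""
    return sys.version_info.major >= 3


print(describe_interpreter())
```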
INSTALLING REQUIRED LIBRARIES
When I tried to install the libraries and dependencies that ChatGPT recommended for this project, the terminal prompted me to use a virtual environment. I had no idea what this meant at the time, but it turned out to be crucial for running the code properly.
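A virtual environment is usually created with python3 -m venv followed by a folder name, then activated before running pip. To check from inside a script whether an environment is active, something like the following should work. It is a sketch that assumes an environment made with the standard venv module; `in_virtual_environment` is a hypothetical helper name.

```python
import sys


def in_virtual_environment():
    """True when the interpreter is running inside a venv-style environment.

    Inside a venv, sys.prefix points at the environment folder while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if in_virtual_environment():
    print(f"Virtual environment active: {sys.prefix}")
else:
    print("No virtual environment active; pip would install system-wide.")
```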
Using pip:
gTTS: converts text to speech
SpeechRecognition (imported as speech_recognition): converts speech to text
pyaudio: lets the program access the microphone
pyaudio did not install at first. PyAudio wraps the system-level PortAudio library, so installing it is platform-specific; I had to install "pipwin" to get it working.
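Before running the main script, it can help to confirm that the three libraries are visible to the interpreter in use. The sketch below checks the import names (gtts, speech_recognition, pyaudio) without importing them; `missing_modules` is a hypothetical helper, not something these libraries provide.

```python
import importlib.util

# Import names of the libraries listed above.
REQUIRED_MODULES = ["gtts", "speech_recognition", "pyaudio"]


def missing_modules(names):
    """Return the names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("All required libraries are available.")
```

If a library shows up as missing even though pip reported success, the install most likely went to a different interpreter or a different virtual environment.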
RUNNING CODE
I ran the script with:
python rotary_phone_test.py
At first it did not seem to take in any audio, so I added a print statement to see what it was hearing. It turned out the microphone was working; it was just picking up noise instead of words.
Increased ambient noise adjustment
Increased listening time to account for timeout issue
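These two adjustments might look roughly like the sketch below. It relies on the `adjust_for_ambient_noise` and `listen` methods of the SpeechRecognition `Recognizer`; the helper names and the specific durations are assumptions for illustration, not the values I settled on.

```python
def capture_audio(recognizer, source, calibration_seconds=1.0,
                  wait_seconds=5, max_phrase_seconds=10):
    """Calibrate against background noise, then record one phrase.

    The durations are illustrative defaults (assumptions).
    """
    # Sample the room for a moment so the energy threshold sits above the noise.
    recognizer.adjust_for_ambient_noise(source, duration=calibration_seconds)
    # timeout: how long to wait for speech to start.
    # phrase_time_limit: the longest phrase that will be recorded.
    return recognizer.listen(source, timeout=wait_seconds,
                             phrase_time_limit=max_phrase_seconds)


def listen_once():
    """Record one phrase from the default microphone, or None on timeout."""
    import speech_recognition as sr  # imported here so the helper above stays testable

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        try:
            return capture_audio(recognizer, source)
        except sr.WaitTimeoutError:
            # Raised by listen() when no speech starts within the timeout.
            return None
```

Calling `listen_once()` requires a working microphone plus the pyaudio install described earlier.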
Debugged Speech Matching Logic
Added a print statement to see what the recognizer thought I was saying
The program kept reporting my text as wrong even though I was saying the word correctly.
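A common cause of this kind of mismatch is a difference in capitalization, punctuation, or spacing between the expected text and the recognizer's output; whether that was the cause here is an assumption. A minimal sketch of a more forgiving comparison, using hypothetical helpers `normalize` and `matches`:

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize(text):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = text.lower().translate(_STRIP_PUNCTUATION)
    return " ".join(cleaned.split())


def matches(expected, heard):
    """Compare expected text with recognized text after normalizing both."""
    return heard is not None and normalize(expected) == normalize(heard)
```

With this, `matches("Hello", "hello.")` is true, while a failed recognition (`None`) never counts as a match.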
Fixed Audio Playback Issue
I did not originally have mpg321 installed. gTTS only saves the generated speech as an MP3 file, so a separate command-line player is needed to play it.
I had to install it with Homebrew ("brew install") rather than with sudo
This took me a while to figure out: I had installed packages with sudo before, but only while SSH'ed into another computer that was Linux-based.
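To avoid a silent failure when no player is installed, the script could check for one up front. The sketch below uses `shutil.which` from the standard library; the candidate list and the `find_player` helper are assumptions for illustration.

```python
import shutil

# Command-line MP3 players to look for, in order of preference (an assumption).
CANDIDATE_PLAYERS = ("afplay", "mpg123", "mpg321")


def find_player(candidates=CANDIDATE_PLAYERS, which=shutil.which):
    """Return the first player found on the PATH, or None if none is installed."""
    for name in candidates:
        if which(name):
            return name
    return None


player = find_player()
if player is None:
    print("No MP3 player found; install one before testing playback.")
else:
    print(f"Audio will be played with: {player}")
```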
FINAL WORKING CODE
import speech_recognition as sr
from gtts import gTTS
import random
import os
import time

# Function to simulate dialing input (local test)
def prompt_for_dial_number():
    print("Please dial a number by entering it on the keyboard (1-3):")
    dial_number = input("Dial Number: ")
    if dial_number not in ['1', '2', '3']:
        print("Invalid number. Try again.")
        return prompt_for_dial_number()  # Restart if invalid input
    print(f"Dialed Number: {dial_number}")
    return dial_number

# Function to generate a response based on personality
def generate_response(personality, user_input):
    responses = {
        'friendly': [
            "That's awesome! What else can I help with?",
            "I'm so happy you called. What's on your mind?",
            "You're in a good mood today, huh? What do you want to talk about?"
        ],
        'rude': [
            "Ugh, really? What do you want now?",
            "I don't care. What else do you want?",
            "You're wasting my time, but fine, what's up?"
        ],
        'indifferent': [
            "Yeah, sure, whatever. What now?",
            "I'm here, just tell me what you need.",
            "Do you even have a point? Spit it out."
        ]
    }
    # Default response for unexpected inputs
    if not user_input:
        return "I don't know what you're talking about."
    # More tailored response based on user input
    if personality == 'friendly':
        return f"Got it! {random.choice(responses['friendly'])}"
    elif personality == 'rude':
        return f"Yeah, yeah, I heard you. {random.choice(responses['rude'])}"
    elif personality == 'indifferent':
        return f"Sure, whatever. {random.choice(responses['indifferent'])}"

# Function to speak the response
def speak(text):
    tts = gTTS(text=text, lang='en')
    tts.save("response.mp3")
    # Play the audio based on OS type
    if os.name == 'posix' and 'darwin' in os.uname().sysname.lower():
        os.system("afplay response.mp3")  # macOS
    elif os.name == 'posix' and 'linux' in os.uname().sysname.lower():
        os.system("mpg123 response.mp3")  # Linux
    elif os.name == 'nt':
        os.system("start response.mp3")  # Windows

# Function to listen for user response
def listen_for_response():
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening for your response...")
        audio = recognizer.listen(source)
    try:
        response = recognizer.recognize_google(audio)
        print(f"You said: {response}")
        return response
    except sr.UnknownValueError:
        speak("Sorry, I didn't catch that. Can you repeat?")
        return None
    except sr.RequestError:
        speak("Sorry, there was an error with the speech service.")
        return None

# Function to monitor if the conversation pauses for too long
def is_timeout(start_time, timeout_duration=5):
    current_time = time.time()
    return current_time - start_time > timeout_duration

# Main conversation flow function
def conversation_flow():
    speak("Please dial a number.")
    # Capture rotary input (number dialed) through keyboard for local testing
    number = prompt_for_dial_number()
    # Map the dialed number to a specific personality
    if number == "1":
        personality = "friendly"
    elif number == "2":
        personality = "rude"
    elif number == "3":
        personality = "indifferent"
    # Generate initial response based on the selected personality.
    # A non-empty placeholder is passed because an empty string would
    # return the generic fallback line instead of a personality line.
    initial_response = generate_response(personality, "hello")
    speak(initial_response)
    # AI asks what the user wants
    speak("What do you want?")
    user_input = listen_for_response()
    while user_input is None:
        user_input = listen_for_response()  # Keep listening until we get a response
    # AI responds to the user's input based on their tone/personality
    ai_response = generate_response(personality, user_input)
    speak(ai_response)
    # Start conversation loop with time-based pause detection
    conversation_in_progress = True
    start_time = time.time()
    while conversation_in_progress:
        user_input = listen_for_response()
        if user_input is None:
            continue
        ai_response = generate_response(personality, user_input)
        speak(ai_response)
        # Check if the user has stopped talking for a while (timeout threshold)
        if is_timeout(start_time):
            speak("Do you want to continue the conversation? Say 'yes' to keep talking or 'no' to end.")
            continue_conversation = listen_for_response()
            if continue_conversation and 'no' in continue_conversation.lower():
                speak("Goodbye!")
                conversation_in_progress = False  # End the conversation
            else:
                start_time = time.time()  # Reset timeout timer if the user says 'yes'
        else:
            start_time = time.time()  # Reset the timer if the conversation continues

# Start the conversation flow (local testing)
conversation_flow()
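One possible simplification of the dialing step is to keep the number-to-personality mapping in a dictionary and retry with a loop instead of recursion. This is only a sketch of an alternative, not the code I ran; `PERSONALITIES`, `prompt_for_personality`, and its `read_line` and `write_line` parameters are hypothetical names.

```python
# Dialed digit -> personality, matching the mapping in conversation_flow().
PERSONALITIES = {"1": "friendly", "2": "rude", "3": "indifferent"}


def prompt_for_personality(read_line=input, write_line=print):
    """Ask for a digit until a valid one is entered, then return its personality.

    read_line and write_line default to the keyboard and the terminal, but
    can be swapped out (for tests now, or for the rotary dial later).
    """
    while True:
        # Unlike the original, surrounding whitespace is stripped before checking.
        choice = read_line("Dial Number: ").strip()
        if choice in PERSONALITIES:
            return PERSONALITIES[choice]
        write_line("Invalid number. Try again.")
```

Because the input source is a parameter, the same function could later read digits from the rotary dial by passing in a different `read_line`.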
SUMMARY
This week I focused on getting basic voice recognition working, so that the code is solid before I transfer it to the Raspberry Pi.