Gender Model Recipes
Overview
The gender model classifies the speech in an audio snippet as female or male speech. If the confidence in the classification is too low, the result is unknown. The examples below cover these use cases:
- streaming from a microphone and reporting gender in real-time - toy app 1
- streaming from a microphone and reporting long monologues in real-time - toy app 2
If you are interested in file-processing code examples, rather than real-time, check out the speech or arousal recipes.
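As a rough illustration of how the labels and confidence might be consumed, the sketch below tallies results per label and downgrades low-confidence speech predictions to unknown. It assumes each result is a dictionary with "result" and "confidence" keys, as in the toy apps further down, and that the label strings are "female", "male", "unknown" and "no_speech". The helper name summarise_labels and the 0.6 cut-off are hypothetical and not part of DeepTone.

```python
from collections import Counter


def summarise_labels(results, min_confidence=0.6):
    """Count labels, treating low-confidence speech predictions as "unknown".

    `results` is assumed to be an iterable of dicts shaped like
    {"result": <label>, "confidence": <float>} (hypothetical sample data below).
    """
    counts = Counter()
    for res in results:
        label = res["result"]
        # Only gendered labels are downgraded; "no_speech" is kept as reported
        if label in ("female", "male") and res["confidence"] < min_confidence:
            label = "unknown"
        counts[label] += 1
    return counts


# Made-up results, for illustration only
sample = [
    {"result": "female", "confidence": 0.92},
    {"result": "male", "confidence": 0.41},
    {"result": "no_speech", "confidence": 0.99},
    {"result": "female", "confidence": 0.75},
]
print(summarise_labels(sample))
```

A stricter cut-off trades fewer wrong labels for more unknown results, so the value is worth tuning against your own recordings.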
Recipes
Real-time analysis of male/female speech
Pre-requisites
- DeepTone
- pyaudio
- a microphone
Installing pyaudio in a Python environment may require some extra steps unless you use Anaconda to manage your environment.
We still find it the easiest way to read microphone input in Python.
Installing pyaudio
If you already have pyaudio installed in your environment or an alternative package to stream audio from a microphone, go straight to the code.
On macOS, you may have to install or reinstall portaudio before installing pyaudio:
brew install portaudio
Then, inside your virtualenv:
pip install pyaudio
Reference: https://medium.com/@koji_kanao/when-cant-install-pyaudio-with-pip-190973840dbf
On Windows, you can install pyaudio from a wheel.
Download the wheel from https://www.lfd.uci.edu/~gohlke/pythonlibs/#pyaudio. Choose
PyAudio-0.2.11-cp37-cp37m-win32.whl
if you use 32-bit Python, or
PyAudio-0.2.11-cp37-cp37m-win_amd64.whl
for 64-bit. Then install from the wheel:
pip install <path_to_wheel>
Reference: https://stackoverflow.com/a/54999645
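To check that the installation worked and that a microphone is visible, something like the following sketch can help. It uses PyAudio's get_device_count and get_device_info_by_index to list devices with at least one input channel. The helper name list_input_devices is hypothetical, and the function returns None when pyaudio cannot be imported.

```python
def list_input_devices():
    """Return [(index, name), ...] for input-capable devices, or None without pyaudio."""
    try:
        import pyaudio
    except ImportError:
        # pyaudio is not installed in this environment
        return None

    pa = pyaudio.PyAudio()
    try:
        devices = []
        for index in range(pa.get_device_count()):
            info = pa.get_device_info_by_index(index)
            # Devices with zero input channels are output-only (e.g. speakers)
            if info.get("maxInputChannels", 0) > 0:
                devices.append((index, info["name"]))
        return devices
    finally:
        # Always release PortAudio resources
        pa.terminate()


if __name__ == "__main__":
    found = list_input_devices()
    if found is None:
        print("pyaudio is not installed")
    elif not found:
        print("pyaudio works, but no input device was found")
    else:
        for index, name in found:
            print(f"{index}: {name}")
```

An empty list usually means the operating system does not expose a microphone to Python, which is worth ruling out before debugging the examples below.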
Toy app 1 - report gender in real-time
Toy example to determine the gender characteristics of the speaker every ~1s in real-time. Other non-speech sounds (silence, background noise, music) are also detected and classified as no_speech. Here we are using the gender model as an example, but any other model can be used in the same way.
Remember to add a valid license key before running the example.
from collections import deque
from deeptone import Deeptone
import pyaudio

# Set the required constants
VALID_LICENSE_KEY = None
OUTPUT_PERIOD_MS = 1024
CHUNK_SIZE = 1024
data_buffer = deque()
assert VALID_LICENSE_KEY is not None, "Set the required constants"

# Initialise an audio stream
def writer_callback(in_data, frame_count, time_info, status):
    # Called by pyaudio for every captured chunk; queue the raw bytes
    data_buffer.extend(in_data)
    return in_data, pyaudio.paContinue

pa = pyaudio.PyAudio()
stream = pa.open(
    format=pyaudio.paInt16,
    channels=1,
    rate=16000,
    input=True,
    frames_per_buffer=CHUNK_SIZE,
    stream_callback=writer_callback,
)
stream.start_stream()

def input_generator(buffer):
    while stream.is_active():
        # Each 16-bit sample takes 2 bytes, so one chunk is CHUNK_SIZE * 2 bytes
        while len(buffer) >= CHUNK_SIZE * 2:
            samples_read = [buffer.popleft() for _ in range(CHUNK_SIZE * 2)]
            yield bytes(samples_read)

# Initialise Deeptone
engine = Deeptone(license_key=VALID_LICENSE_KEY)
audio_generator = input_generator(data_buffer)
print("Listening to you ...")
output = engine.process_stream(
    input_generator=audio_generator,
    models=[engine.models.Gender],
    output_period=OUTPUT_PERIOD_MS,
    volume_threshold=0.005
)

ts = None  # timestamp of the last result, if any
try:
    # Inspect the result
    for ts_result in output:
        ts = ts_result["timestamp"]
        res = ts_result["results"]["gender"]
        print(
            f'Timestamp: {ts}ms\tresult: {res["result"]}'
            f' with confidence {res["confidence"]}'
        )
except KeyboardInterrupt:
    # Stop with Ctrl+C; guard against an interrupt before the first result
    processed_s = 0 if ts is None else round((ts + OUTPUT_PERIOD_MS) / 1000)
    print(f"Congrats! You processed {processed_s}s of audio with Deeptone.")
    print("Goodbye!")
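The constants in the example above are related by some simple arithmetic, worked through in the sketch below. It assumes mono 16-bit audio at 16000 Hz, as configured in the pa.open call. The variable names other than CHUNK_SIZE and OUTPUT_PERIOD_MS are introduced here for illustration only.

```python
SAMPLE_RATE_HZ = 16000   # rate passed to pa.open
CHUNK_SIZE = 1024        # frames per buffer
BYTES_PER_SAMPLE = 2     # paInt16 stores each sample in 16 bits
OUTPUT_PERIOD_MS = 1024  # how often a result is produced

# One mono chunk in bytes: this is why the generator waits for CHUNK_SIZE * 2 bytes
chunk_bytes = CHUNK_SIZE * BYTES_PER_SAMPLE

# Duration of audio covered by one chunk, in milliseconds
chunk_ms = 1000 * CHUNK_SIZE / SAMPLE_RATE_HZ

# Number of samples and chunks consumed per reported result
samples_per_output = SAMPLE_RATE_HZ * OUTPUT_PERIOD_MS // 1000
chunks_per_output = samples_per_output / CHUNK_SIZE

print(f"{chunk_bytes} bytes per chunk, {chunk_ms} ms per chunk")
print(f"{samples_per_output} samples ({chunks_per_output} chunks) per result")
```

With these values each chunk covers 64 ms of audio, and one result is reported for every 16 chunks, which is where the "every ~1s" figure comes from.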
Toy app 2 - analyse monologues in real-time
Toy example to analyse a stream in real-time and warn about long monologues (speech of the same gender continuing without interruption) or long silences. Here we are using the gender model as an example, but any other model can be used in the same way.
Remember to add a valid license key before running the example.
from deeptone import Deeptone
import time
import pyaudio

# Initialise an audio stream
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True)

def input_generator():
    # Blocking reads: yield one chunk of raw audio at a time
    data = stream.read(1024, exception_on_overflow=False)
    while stream.is_active():
        yield data
        data = stream.read(1024, exception_on_overflow=False)

# Replace None with your license key
VALID_LICENSE_KEY = None

# Initialise Deeptone
engine = Deeptone(license_key=VALID_LICENSE_KEY)
audio_generator = input_generator()
print("Listening to you ...")
output = engine.process_stream(
    input_generator=audio_generator,
    models=[engine.models.Gender],
    output_period=1024,
    volume_threshold=0.005
)

last_ts = None  # timestamp of the last result, if any
try:
    # Inspect the result
    last_gender_to_speak = None
    start_time_of_monologue = None
    max_monologue_len = 5  # seconds
    for ts_result in output:
        last_ts = ts_result["timestamp"]
        gender = ts_result["results"]["gender"]["result"]
        if gender != last_gender_to_speak:
            # Someone new started speaking
            start_time_of_monologue = time.time()
            last_gender_to_speak = gender
        elif gender != "no_speech":
            # Long monologue, do something about it!
            time_talked = time.time() - start_time_of_monologue
            if time_talked > max_monologue_len:
                print(
                    f"A {gender} person has been talking for {time_talked:.1f}s,",
                    "consider giving someone else a word!",
                )
        else:
            # Long silence!
            time_talked = time.time() - start_time_of_monologue
            if time_talked > max_monologue_len:
                print(f"No one was speaking for {time_talked:.1f}s, someone say something!")
except KeyboardInterrupt:
    # Stop with Ctrl+C; guard against an interrupt before the first result
    processed_s = 0 if last_ts is None else round((last_ts + 1024) / 1000)
    print(f"Congrats! You processed {processed_s}s of audio with deeptone.")
    print("Goodbye!")
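The monologue logic above relies on wall-clock time, which makes it hard to test without a microphone. One possible alternative, sketched below, is to track durations from the result timestamps instead. The MonologueTracker class and its method names are hypothetical and not part of DeepTone; the sketch only assumes that timestamps are in milliseconds and that silence is labelled "no_speech", as in the example above.

```python
class MonologueTracker:
    """Track how long the same label has been reported without a change."""

    def __init__(self, max_len_ms=5000):
        self.max_len_ms = max_len_ms
        self.current_label = None
        self.start_ms = None

    def update(self, label, timestamp_ms):
        """Return a warning string if the current run is too long, else None."""
        if label != self.current_label:
            # A different label: restart the run from this timestamp
            self.current_label = label
            self.start_ms = timestamp_ms
            return None
        duration_ms = timestamp_ms - self.start_ms
        if duration_ms <= self.max_len_ms:
            return None
        if label == "no_speech":
            return f"No one has spoken for {duration_ms / 1000:.1f}s"
        return f"A {label} speaker has been talking for {duration_ms / 1000:.1f}s"


# Simulated results: one label every 1024 ms (made-up data, no audio needed)
tracker = MonologueTracker(max_len_ms=3000)
labels = ["male"] * 5 + ["female"]
warnings = [tracker.update(label, i * 1024) for i, label in enumerate(labels)]
for warning in warnings:
    if warning:
        print(warning)
```

In the real-time loop, the same object could be fed with ts_result["timestamp"] and the gender label on every iteration, keeping the printing separate from the tracking.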