Gender Model Recipes

Overview

The gender model classifies the speech in an audio snippet as female or male. If the confidence in the classification is too low, the result is unknown. The examples below cover the following use cases:

  • streaming from a microphone and reporting gender in real-time - toy app 1
  • streaming from a microphone and reporting long monologues in real-time - toy app 2

If you are interested in file-processing code examples, rather than real-time, check out the speech or arousal recipes.
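To illustrate the unknown fallback described above: a classification can be thought of as the top label only when its confidence clears some cutoff. The helper and threshold value below are purely illustrative, not the SDK's internal logic:

```python
def resolve_gender(label: str, confidence: float, threshold: float = 0.7) -> str:
    """Return the label only if the model is confident enough.

    `threshold` here is a hypothetical value; the actual cutoff used by
    the gender model is internal to DeepTone.
    """
    return label if confidence >= threshold else "unknown"

print(resolve_gender("female", 0.92))  # confident -> "female"
print(resolve_gender("male", 0.41))    # low confidence -> "unknown"
```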

Recipes

Real-time analysis of male/female speech

Pre-requisites

  • DeepTone
  • pyaudio
  • a microphone
pyaudio

Installing pyaudio in a Python environment may require some extra steps unless you are using Anaconda to manage your environment. We still feel it is the easiest way to get microphone input in Python, though.

Installing pyaudio

If you already have pyaudio installed in your environment or an alternative package to stream audio from a microphone, go straight to the code.

On macOS, you may have to install or reinstall portaudio before installing pyaudio:

brew install portaudio

then, inside your virtualenv:

pip install pyaudio

Reference: https://medium.com/@koji_kanao/when-cant-install-pyaudio-with-pip-190973840dbf

Toy app 1 - report gender in real-time

Toy example to determine the gender characteristics of the speaker roughly every second, in real-time. Non-speech sounds (silence, background noise, music) are also detected and classified as no_speech. Here we use the gender model as an example, but any other model can be used in the same way.

Remember to add a valid license key before running the example.

from collections import deque
from deeptone import Deeptone
import pyaudio

# Set the required constants
VALID_LICENSE_KEY = None

OUTPUT_PERIOD_MS = 1024
CHUNK_SIZE = 1024
data_buffer = deque()

assert VALID_LICENSE_KEY is not None, "Set the required constants"

# Initialise an audio stream
def writer_callback(in_data, frame_count, time_info, status):
    data_buffer.extend(in_data)
    return in_data, pyaudio.paContinue

pa = pyaudio.PyAudio()
stream = pa.open(
    format=pyaudio.paInt16,
    channels=1,
    rate=16000,
    input=True,
    frames_per_buffer=CHUNK_SIZE,
    stream_callback=writer_callback,
)
stream.start_stream()


def input_generator(buffer):
    while stream.is_active():
        while len(buffer) >= CHUNK_SIZE * 2:
            samples_read = [buffer.popleft() for x in range(CHUNK_SIZE * 2)]
            yield bytes(samples_read)


# Initialise Deeptone
engine = Deeptone(license_key=VALID_LICENSE_KEY)

audio_generator = input_generator(data_buffer)

print("Listening to you ...")
output = engine.process_stream(
    input_generator=audio_generator,
    models=[engine.models.Gender],
    output_period=OUTPUT_PERIOD_MS,
    volume_threshold=0.005,
)

try:
    # Inspect the result
    for ts_result in output:
        ts = ts_result["timestamp"]
        res = ts_result["results"]["gender"]
        print(
            f'Timestamp: {ts}ms\tresult: {res["result"]}'
            f' with confidence {res["confidence"]}'
        )

except KeyboardInterrupt:
    print(f"Congrats! You processed {round((ts + OUTPUT_PERIOD_MS) / 1000)}s of audio with DeepTone.")
    print("Goodbye!")
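A note on the `CHUNK_SIZE * 2` in the generator above: the stream is configured for 16-bit (2-byte) mono samples at 16 kHz, so a chunk of 1024 frames occupies 2048 bytes, and one second of audio is 32,000 bytes. A minimal sketch of that arithmetic:

```python
SAMPLE_RATE = 16000      # frames per second, as configured in pa.open
BYTES_PER_SAMPLE = 2     # pyaudio.paInt16 -> 16-bit samples
CHUNK_SIZE = 1024        # frames per chunk

bytes_per_chunk = CHUNK_SIZE * BYTES_PER_SAMPLE
bytes_per_second = SAMPLE_RATE * BYTES_PER_SAMPLE
chunk_duration_ms = CHUNK_SIZE / SAMPLE_RATE * 1000

print(bytes_per_chunk)           # 2048 bytes popped from the buffer per yield
print(bytes_per_second)          # 32000 bytes of audio per second
print(round(chunk_duration_ms))  # each chunk covers 64 ms of audio
```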

Toy app 2 - analyse monologues in real-time

Toy example to analyse a stream in real-time and warn when there are long monologues (someone of the same gender speaking continuously) or long silences. Here we use the gender model as an example, but any other model can be used in the same way.

Remember to add a valid license key before running the example.

from deeptone import Deeptone
import time
import pyaudio

# Initialise an audio stream
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True)


def input_generator():
    data = stream.read(1024, exception_on_overflow=False)
    while stream.is_active():
        yield data
        data = stream.read(1024, exception_on_overflow=False)


VALID_LICENSE_KEY = None

assert VALID_LICENSE_KEY is not None, "Set a valid license key"

# Initialise Deeptone
engine = Deeptone(license_key=VALID_LICENSE_KEY)

audio_generator = input_generator()

print("Listening to you ...")
output = engine.process_stream(
    input_generator=audio_generator,
    models=[engine.models.Gender],
    output_period=1024,
    volume_threshold=0.005,
)

try:
    # Inspect the result
    last_gender_to_speak = None
    start_time_of_monologue = None
    max_monologue_len = 5

    for ts_result in output:
        last_ts = ts_result["timestamp"]
        gender = ts_result["results"]["gender"]["result"]
        if gender != last_gender_to_speak:
            # Someone new started speaking
            start_time_of_monologue = time.time()
            last_gender_to_speak = gender

        elif gender != "no_speech":
            # Long monologue, do something about it!
            time_talked = time.time() - start_time_of_monologue
            if time_talked > max_monologue_len:
                print(
                    f"A {gender} person has been talking for {time_talked:.1f}s, "
                    f"consider giving someone else a word!"
                )
        else:
            # Long silence!
            time_talked = time.time() - start_time_of_monologue
            if time_talked > max_monologue_len:
                print(f"No one was speaking for {time_talked:.1f}s, someone say something!")

except KeyboardInterrupt:
    print(
        f"Congrats! You processed {round((last_ts + 1024) / 1000)}s of audio with DeepTone."
    )
    print("Goodbye!")
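The monologue-tracking logic above can be exercised without a microphone by replaying synthetic results. The helper and result stream below are hypothetical stand-ins for `process_stream` output, using timestamps in place of wall-clock time, purely to show the state transitions:

```python
def track_monologues(results, max_monologue_len=5):
    """Replay (timestamp_s, gender) pairs and collect monologue warnings.

    Timestamps stand in for wall-clock time, so the logic mirrors the
    real-time loop without needing a microphone.
    """
    warnings = []
    last_gender = None
    start_ts = None
    for ts, gender in results:
        if gender != last_gender:
            start_ts = ts        # someone new started speaking (or stopped)
            last_gender = gender
        elif (ts - start_ts) > max_monologue_len:
            if gender != "no_speech":
                warnings.append(f"{gender} monologue of {ts - start_ts}s")
            else:
                warnings.append(f"silence of {ts - start_ts}s")
    return warnings

# One "male" speaker holding the floor for 7 seconds, then a handover
stream = [(0, "male"), (1, "male"), (2, "male"), (3, "male"),
          (4, "male"), (5, "male"), (6, "male"), (7, "female")]
print(track_monologues(stream))  # ['male monologue of 6s']
```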