AudioEvent Model Recipes
Overview
The AudioEvent model can be used to classify different types of human-produced sounds.
In the example below, we demonstrate how to use the AudioEvent model to detect when someone is laughing. This model can be especially useful for content moderation, detection of bullying, or other similar use cases.
Prerequisites
- DeepTone with a valid license key
- Audio file(s) you want to process
Sample data
You can download this audio sample for the following examples.
Detect segments where someone is laughing - Example 1
Remember to add a valid license key before running the example.
In this example, you can use the transitions-level output, which is optionally calculated when processing a file, to detect the parts of the audio file where the speakers produced positive human sounds, in this case laughter.
from deeptone import Deeptone
from deeptone.deeptone import AUDIO_EVENT_POS_HUMAN_SOUNDS
# Set the required constants
VALID_LICENSE_KEY = None
FILE_TO_PROCESS = None
OUTPUT_PERIOD_MS = 1024
# Initialize DeepTone
engine = Deeptone(license_key=VALID_LICENSE_KEY)
# Process the file and extract the AudioEvent transitions for the first channel
output = engine.process_file(
    filename=FILE_TO_PROCESS,
    models=[engine.models.AudioEvent],
    output_period=OUTPUT_PERIOD_MS,
    include_transitions=True
)["channels"]["0"]["transitions"]["audio-event"]

# Report every segment classified as positive human sounds with high confidence
for transition in output:
    if transition["result"] == AUDIO_EVENT_POS_HUMAN_SOUNDS and transition["confidence"] > 0.75:
        print(f'Potential laughter from {transition["timestamp_start"] / 1000}s to {transition["timestamp_end"] / 1000}s')
After executing the script using our example file, you should see the following output:
Potential laughter from 12.672s to 15.104s
Potential laughter from 29.568s to 30.464s
From these results, we can now listen to the identified segments of the audio and verify whether they match moments where someone was laughing.
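To make that check easier, you can also cut the flagged segments out of the file and listen to them directly. The snippet below is a minimal sketch, not part of DeepTone itself: it reuses the output transitions and constants from the example above and assumes the sample file is a WAV that the third-party soundfile package can read.
import soundfile as sf

# Load the original audio so we can slice out the flagged segments
audio, sample_rate = sf.read(FILE_TO_PROCESS)

for i, transition in enumerate(output):
    if transition["result"] == AUDIO_EVENT_POS_HUMAN_SOUNDS and transition["confidence"] > 0.75:
        # Convert the millisecond timestamps into sample indices
        start = int(transition["timestamp_start"] / 1000 * sample_rate)
        end = int(transition["timestamp_end"] / 1000 * sample_rate)
        # Write each segment to its own file, e.g. laughter_segment_0.wav
        sf.write(f"laughter_segment_{i}.wav", audio[start:end], sample_rate)
Each exported file then corresponds to one of the printed time ranges, so you can audition them individually.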