The Arousal model can be used to classify the speech in an audio snippet as high, neutral or low arousal.
- High arousal can be linked to intense emotions such as happiness, anger, irritation or excitement.
- Low arousal is often associated with tired, depressed or unengaged speech.
The examples here combine the Arousal model with the Speech model to find high-energy speech segments in a vlog:
- Example 1: basic analysis of audio files with the built-in summarisation options
- Example 2: custom summarisation options for audio file analysis
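The idea of combining the two models can be sketched in plain Python: a frame counts as high-energy speech only when the Speech model detects speech and the Arousal model labels it high. This is a minimal sketch, not the DeepTone SDK API; it assumes both outputs have already been reduced to aligned per-frame label lists, and the label names "speech" and "no_speech" are hypothetical.

```python
# Assumed frame length; the text below states 64 ms for most models.
FRAME_MS = 64


def high_energy_speech_frames(speech_labels, arousal_labels):
    """Return indices of frames where speech is detected AND arousal is high.

    Both arguments are hypothetical per-frame label lists, aligned so that
    index i refers to the same 64 ms frame in both.
    """
    if len(speech_labels) != len(arousal_labels):
        raise ValueError("label sequences must be aligned frame by frame")
    return [
        i
        for i, (speech, arousal) in enumerate(zip(speech_labels, arousal_labels))
        if speech == "speech" and arousal == "high"
    ]


speech = ["speech", "speech", "no_speech", "speech", "speech"]
arousal = ["high", "neutral", "high", "high", "high"]
frames = high_energy_speech_frames(speech, arousal)
print(frames)                          # [0, 3, 4]
print([i * FRAME_MS for i in frames])  # [0, 192, 256]
```

Frame 2 is dropped even though it is labelled high arousal, because no speech was detected there.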
To run the examples you need:
- DeepTone with a license key and models
- the audio file(s) you want to process
For the examples below, you can download this sample audio file of our CTO talking about OTO.
Remember to add a valid license key before running the example.
In these examples we make use of the summary and transitions level outputs, which are calculated optionally when processing a file.
The summary output gives the fraction of the audio that falls into each class. In example 1 we are interested in the high-arousal part of the speech and ignore the audio where no speech was detected. The idea is to measure how engaged the speaker was while they were speaking.
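The calculation described above, the share of speaking time that is high arousal, can be sketched as follows. This assumes a hypothetical summary shape that maps class names to fractions of the whole file, including a "no_speech" entry; the actual DeepTone summary keys may be named differently.

```python
# Hypothetical summary shape: class name -> fraction of the whole file.
# The real DeepTone summary keys may be named differently.
SPEECH_CLASSES = ("high", "neutral", "low")


def high_arousal_share_of_speech(summary):
    """Return the fraction of speech time classified as high arousal.

    Audio with no speech detected is left out of the denominator.
    """
    speech_total = sum(summary.get(c, 0.0) for c in SPEECH_CLASSES)
    if speech_total == 0:
        return 0.0  # no speech at all, so there is nothing to measure
    return summary.get("high", 0.0) / speech_total


summary = {"high": 0.5, "neutral": 0.25, "low": 0.05, "no_speech": 0.2}
share = high_arousal_share_of_speech(summary)
print(f"high arousal while speaking: {share:.1%}")  # high arousal while speaking: 62.5%
```

Half of the file is high arousal, but relative to the 80% of the file that contains speech, the speaker is highly aroused for 62.5% of their speaking time. Summing the speech classes, rather than computing 1 minus the no-speech fraction, keeps the result correct even if the fractions do not add up exactly to 1.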
The transitions output presents a useful way of collecting high-level information from an audio file. It operates at the most granular level of the output (64 ms in most models), so even very short pauses in speech are reflected in it. Furthermore, each snippet is assigned the class with the highest likelihood, but that likelihood may still be low for snippets that are hard to classify. Depending on your use case, you may want a more custom summarisation.
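Both caveats can be illustrated with a small sketch: each frame gets the class with the highest likelihood, and consecutive frames with the same class are merged into segments. This is plain Python under assumed inputs (a hypothetical list of per-frame likelihood dicts), not the DeepTone SDK, and the real transitions output format may differ.

```python
FRAME_MS = 64  # granularity stated in the text for most models


def frame_label(probs):
    """Pick the class with the highest likelihood, and return it with that likelihood."""
    label = max(probs, key=probs.get)
    return label, probs[label]


def to_transitions(labels, frame_ms=FRAME_MS):
    """Collapse per-frame labels into (label, start_ms, end_ms) segments."""
    segments = []
    for i, label in enumerate(labels):
        start = i * frame_ms
        if segments and segments[-1][0] == label:
            # Same class as the previous frame: extend the current segment.
            segments[-1] = (label, segments[-1][1], start + frame_ms)
        else:
            segments.append((label, start, start + frame_ms))
    return segments


frames = [
    {"high": 0.7, "neutral": 0.2, "low": 0.1},
    {"high": 0.8, "neutral": 0.1, "low": 0.1},
    {"no_speech": 0.9, "high": 0.1},               # a single 64 ms pause
    {"high": 0.4, "neutral": 0.35, "low": 0.25},   # wins, but with low likelihood
]
labels = [frame_label(p)[0] for p in frames]
for seg in to_transitions(labels):
    print(seg)
# ('high', 0, 128)
# ('no_speech', 128, 192)
# ('high', 192, 256)
print(frame_label(frames[3]))  # ('high', 0.4)
```

A single 64 ms pause splits one stretch of high-arousal speech into two segments, and the last frame is labelled high even though its likelihood is only 0.4.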
In example 2, instead of analysing all timesteps with high or low voice arousal, we concentrate on those with a confidence higher than 0.9. This is useful when you process longer files and are interested only in the most extreme voice expressions. You can rework the code sample to get the start and end timestamps of the high-confidence snippets and, in that way, build a simple key-moments detector.
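The key-moments idea can be sketched as follows: keep only high or low arousal frames with a confidence above 0.9, then merge consecutive kept frames with the same label into start and end timestamps. The per-frame dict shape with "label" and "confidence" keys is a hypothetical simplification, not the DeepTone output format.

```python
FRAME_MS = 64    # assumed frame length (64 ms in most models)
THRESHOLD = 0.9  # confidence cut-off used in example 2


def confident_frames(frames, classes=("high", "low"), threshold=THRESHOLD):
    """Return (index, label) for frames in `classes` with confidence above `threshold`."""
    return [
        (i, f["label"])
        for i, f in enumerate(frames)
        if f["label"] in classes and f["confidence"] > threshold
    ]


def key_moments(frames, frame_ms=FRAME_MS, **kwargs):
    """Merge consecutive confident frames with the same label into (label, start_ms, end_ms)."""
    moments = []
    prev_index = None
    for i, label in confident_frames(frames, **kwargs):
        if moments and prev_index == i - 1 and moments[-1][0] == label:
            # Directly follows the previous kept frame: extend the moment.
            moments[-1] = (label, moments[-1][1], (i + 1) * frame_ms)
        else:
            moments.append((label, i * frame_ms, (i + 1) * frame_ms))
        prev_index = i
    return moments


frames = [
    {"label": "high", "confidence": 0.95},
    {"label": "high", "confidence": 0.97},
    {"label": "high", "confidence": 0.60},     # below the threshold
    {"label": "neutral", "confidence": 0.99},  # not an extreme class
    {"label": "low", "confidence": 0.93},
]
print(key_moments(frames))  # [('high', 0, 128), ('low', 256, 320)]
```

Lowering `threshold` or changing `classes` trades precision for recall; for example, `key_moments(frames, threshold=0.5)` would extend the high-arousal moment to 192 ms.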