The Emotions model can classify a speaker's emotion as "happy", "irritated", "neutral", or "tired".
Because it only makes sense to apply this model to speech audio, it is combined with the SpeechRT and Volume models to increase the reliability of the results.
The receptive field of this model is 2107 milliseconds.
| Receptive Field | Result Type |
|---|---|
| 2107 ms | result ∈ ["happy", "irritated", "neutral", "tired", "no_speech", "silence"] |
The time-series result is an iterable whose elements contain the following information:
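A minimal sketch of what one time-series element might look like. The field names (`timestamp`, `result`, `confidence`) are assumptions for illustration, not a confirmed schema from the SDK:

```python
# Hypothetical shape of a single time-series element (field names assumed).
element = {
    "timestamp": 2107,    # end of the analysis window in milliseconds
                          # (the first full receptive field ends at 2107 ms)
    "result": "neutral",  # one of the six labels listed in the table above
    "confidence": 0.82,   # model confidence for the predicted label
}
print(element["result"])
```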
Time-series with raw values
If raw values are requested, they are added to the time-series result:
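One plausible shape for an element with raw values attached; the `raw` key holding per-class scores is an assumption for illustration, not the SDK's documented format:

```python
# Hypothetical time-series element when raw values are requested (names assumed).
# "raw" holds the model's per-class scores before the winning label is chosen.
element = {
    "timestamp": 2107,
    "result": "happy",
    "raw": {"happy": 0.61, "irritated": 0.05, "neutral": 0.27, "tired": 0.07},
}
# The predicted label corresponds to the class with the highest raw score.
top_class = max(element["raw"], key=element["raw"].get)
print(top_class)
```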
If a summary is requested, the following is returned:
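A sketch of what such a summary could look like; the key names follow the `x_fraction` pattern described below, but the exact values and keys here are illustrative assumptions:

```python
# Hypothetical summary result (values are made up for illustration).
# Each key follows the "<class>_fraction" pattern explained below.
summary = {
    "happy_fraction": 0.25,
    "irritated_fraction": 0.0,
    "neutral_fraction": 0.55,
    "tired_fraction": 0.1,
    "no_speech_fraction": 0.1,
    "silence_fraction": 0.0,
}
# The fractions partition the whole input, so they should sum to 1.
total = sum(summary.values())
print(total)
```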
where `x_fraction` is the fraction of the input's duration during which class `x` was detected.
If transitions are requested, a time series containing transition elements, as shown below, is returned:
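A sketch of a transitions result matching the interpretation given below; the field names are assumptions, and each element marks the time at which the detected class changes:

```python
# Hypothetical transitions result (field names assumed).
# Each element marks the start of a new detected class.
transitions = [
    {"timestamp": 0, "result": "neutral"},   # neutral from 0 ms ...
    {"timestamp": 1500, "result": "happy"},  # ... happy from 1500 ms onward
]
for t in transitions:
    print(t["timestamp"], t["result"])
```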
The example above means that DeepTone™ detected "neutral" during the first 1500 ms of the audio snippet, and "happy" between 1500 ms and 4000 ms.