Usage
Before you can use the Deeptone SDK, you must initialize it, which requires a License Key. If you don’t have one, you can request it at support@oto.ai.
The Deeptone class
The Deeptone class is the entry point for all interactions with the Deeptone SDK. Once you instantiate it, you should reuse the same instance, as its initialization process can be computationally heavy.
from deeptone import Deeptone
deeptone = Deeptone(license_key="YOUR_LICENSE_KEY")
print(deeptone.get_available_models())
Thread Safety Notice: Instances of Deeptone can be shared across threads, but only one thread at a time should invoke the processing methods. If you intend to use this class in a multi-threaded application, either give each thread its own Deeptone instance (a pool is recommended) or guard all method invocations with a lock.
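The pool approach mentioned in the notice could be sketched as follows. This is a minimal illustration, not part of the SDK: the `InstancePool` class and its `borrow` method are hypothetical names, and the pool only relies on a zero-argument factory that builds one instance.

```python
import queue
from contextlib import contextmanager


class InstancePool:
    """Hypothetical helper: hands out pre-built instances, one thread per instance at a time."""

    def __init__(self, factory, size):
        # `factory` is any zero-argument callable, for example:
        #   lambda: Deeptone(license_key="YOUR_LICENSE_KEY")
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def borrow(self):
        instance = self._idle.get()  # blocks until an instance is free
        try:
            yield instance
        finally:
            self._idle.put(instance)  # always hand the instance back
```

A worker thread would then wrap each call in `with pool.borrow() as deeptone:` so that no two threads use the same instance at once. Since initialization can be computationally heavy, a small pool size is probably the sensible starting point.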
File Processing
The File Processing functionality allows you to extract insights from your audio files.
You can use it by invoking the process_file method:
deeptone.process_file(filename='my-file.wav',
                      models=['speech', 'arousal'],
                      output_period=1024,
                      include_summary=True,
                      include_transitions=True)
The following is a sample of the output returned by this method (the sample shows results from a gender model):
{
    "channels": {
        "0": {
            "time_series": [
                { "timestamp": 0, "gender": { "result": "female", "confidence": 0.6418 } },
                { "timestamp": 1024, "gender": { "result": "female", "confidence": 0.9002 } },
                { "timestamp": 2048, "gender": { "result": "female", "confidence": 0.4725 } },
                { "timestamp": 3072, "gender": { "result": "female", "confidence": 0.4679 } }
            ],
            "summary": {
                "gender": { "male_fraction": 0, "female_fraction": 0.8548, "unknown_fraction": 0.1452 }
            },
            "transitions": {
                "gender": [
                    { "timestamp_start": 0, "timestamp_end": 320, "result": "unknown", "confidence": 0.0151 },
                    { "timestamp_start": 320, "timestamp_end": 2880, "result": "female", "confidence": 0.8075 },
                    { "timestamp_start": 2880, "timestamp_end": 3136, "result": "unknown", "confidence": 0.0771 },
                    { "timestamp_start": 3136, "timestamp_end": 3968, "result": "female", "confidence": 0.4931 }
                ]
            }
        }
    }
}
By default, only the time_series key is returned. If you’re interested in the summary and transitions, you need to set the include_summary and include_transitions flags to True, respectively.
You can find information on the output of each model in the Models page.
Refer to the File Processing section of the Deeptone documentation for more details on this functionality.
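As an illustration of reading the file-processing output, the sketch below pulls one model's per-timestamp results out of a dictionary shaped like the sample in this section. The helper name `results_for_model` is hypothetical, and the code assumes the returned structure matches that sample, including string channel keys such as "0".

```python
def results_for_model(output, model, channel="0"):
    """Return (timestamp, result, confidence) tuples for one model on one channel."""
    series = output["channels"][channel]["time_series"]
    return [
        (entry["timestamp"], entry[model]["result"], entry[model]["confidence"])
        for entry in series
        if model in entry  # skip entries that carry no result for this model
    ]


# Trimmed copy of the sample output from this section.
sample = {
    "channels": {
        "0": {
            "time_series": [
                {"timestamp": 0, "gender": {"result": "female", "confidence": 0.6418}},
                {"timestamp": 1024, "gender": {"result": "female", "confidence": 0.9002}},
            ]
        }
    }
}

rows = results_for_model(sample, "gender")
# rows == [(0, "female", 0.6418), (1024, "female", 0.9002)]
```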
Real-Time Processing
The Real-Time Processing functionality allows you to extract real-time insights from an audio stream.
You can use it by invoking the process_stream method:
deeptone.process_stream(input_generator=my_generator,
                        models=['speech', 'arousal'],
                        output_period=1024)
The method receives an input_generator, which should be a Python Generator that periodically yields byte arrays containing raw audio data. It returns a Generator that yields results for every output_period milliseconds of audio data, with the following format:
{
    "timestamp": 0,
    "results": {
        "gender": {
            "result": "female",
            "confidence": 0.6255
        },
        "arousal": {
            "result": "high",
            "confidence": 0.9431
        }
    }
}
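Results in this format can be consumed lazily as they arrive. The following is a minimal sketch that keeps only the predictions of one model above a confidence threshold. The helper name `confident_results` and the default threshold are hypothetical, and the code assumes each yielded item has the shape shown above.

```python
def confident_results(results, model, min_confidence=0.5):
    """Yield (timestamp, result) pairs for `model` whose confidence meets the threshold."""
    for item in results:
        prediction = item["results"].get(model)
        if prediction is not None and prediction["confidence"] >= min_confidence:
            yield item["timestamp"], prediction["result"]
```

Because the helper is itself a generator, it can wrap the generator returned by process_stream without buffering the whole stream.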
You can find information on the output of each model in the Models page.
Refer to the Real Time Processing section of the Deeptone documentation for more details on this functionality.
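The input_generator described in this section could be as simple as a generator that reads fixed-size chunks from a binary source. The sketch below is an assumption-laden illustration: `chunked_audio` and `chunk_size` are hypothetical names, and the audio encoding and sample rate the SDK expects are not stated here, so check the Real Time Processing documentation before feeding it real data.

```python
import io


def chunked_audio(stream, chunk_size=4096):
    """Yield successive byte chunks from a binary file-like object until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read means the source has ended
            break
        yield chunk


# Stand-in for a real audio source: ten bytes read four at a time.
demo_source = io.BytesIO(bytes(range(10)))
demo_chunks = list(chunked_audio(demo_source, chunk_size=4))
# Three chunks of 4, 4 and 2 bytes.
```

A generator built this way could then be passed as the input_generator argument of process_stream.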