Usage

Before you can use the Deeptone SDK, you must initialize it. To do so, you need a License Key; if you don’t have one, you can request it at support@oto.ai.

The Deeptone class

The Deeptone class is the entry point for all interactions with the Deeptone SDK. Once you have instantiated it, reuse that same instance rather than creating new ones, because initialization can be computationally expensive.

from deeptone import Deeptone

deeptone = Deeptone(license_key="YOUR_LICENSE_KEY")

print(deeptone.get_available_models())
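Because the instance should be reused, one option is to create it lazily on first use and return that same object on every later call. The sketch below does this with a hypothetical Shared helper, which is not part of the Deeptone SDK. It accepts any zero-argument factory, so it runs without the SDK installed.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class Shared(Generic[T]):
    """Create an object lazily on first use and return the same object afterwards."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._instance: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        # The lock ensures the expensive factory runs only once,
        # even if several threads call get() at the same moment.
        with self._lock:
            if self._instance is None:
                self._instance = self._factory()
            return self._instance


# Hypothetical wiring (requires the Deeptone SDK and a valid license key):
# from deeptone import Deeptone
# shared_deeptone = Shared(lambda: Deeptone(license_key="YOUR_LICENSE_KEY"))
# deeptone = shared_deeptone.get()  # same instance on every call
```

This only guarantees a single instance. It does not make concurrent calls to that instance's processing methods safe.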

Thread Safety Notice: Although a Deeptone instance can be shared across threads, only one thread at a time should invoke its processing methods. If you intend to use this class in a multi-threaded application, either give each thread its own Deeptone instance (using a pool is recommended) or guard all method invocations with a lock.
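A pool of instances, as recommended above, could look like the following minimal sketch. InstancePool and borrow are hypothetical names, not SDK APIs. The pool is built on the standard library's queue.Queue, so a thread that finds no idle instance blocks until another thread returns one, which keeps each instance in use by a single thread at a time.

```python
import queue
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, TypeVar

T = TypeVar("T")


class InstancePool(Generic[T]):
    """A fixed-size pool: each borrowed instance is used by one thread at a time."""

    def __init__(self, factory: Callable[[], T], size: int) -> None:
        if size < 1:
            raise ValueError("size must be at least 1")
        self._idle: "queue.Queue[T]" = queue.Queue()
        # Pay the (potentially heavy) construction cost once, up front.
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def borrow(self) -> Iterator[T]:
        # Blocks until an instance is free, so no instance is used concurrently.
        instance = self._idle.get()
        try:
            yield instance
        finally:
            # Return the instance even if processing raised an exception.
            self._idle.put(instance)


# Hypothetical wiring (requires the Deeptone SDK and a valid license key):
# from deeptone import Deeptone
# pool = InstancePool(lambda: Deeptone(license_key="YOUR_LICENSE_KEY"), size=4)
# with pool.borrow() as deeptone:
#     result = deeptone.process_file(...)
```

Choosing the pool size trades memory and start-up time (each instance is initialized once) against how many threads can process audio in parallel.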

File Processing

The File Processing functionality allows you to extract insights from your audio files.

You can use it by invoking the process_file method:

deeptone.process_file(filename='my-file.wav',
                      models=['gender'],
                      output_period=1024,
                      include_summary=True,
                      include_transitions=True)

The following is a sample output returned by this method:

{
  "channels": {
    "0": {
      "time_series": [
        { "timestamp" : 0, "gender": { "result": "female", "confidence": 0.6418, } },
        { "timestamp" : 1024, "gender": { "result": "female", "confidence": 0.9002, } },
        { "timestamp" : 2048, "gender": { "result": "female", "confidence": 0.4725, } },
        { "timestamp" : 3072, "gender": { "result": "female", "confidence": 0.4679, } },
      ],
      "summary": {
        "gender": { "male_fraction": 0, "female_fraction": 0.8548, "unknown_fraction": 0.1452 },
      },
      "transitions": {
        "gender": [
          { "timestamp_start" : 0, "timestamp_end": 320, "result": "unknown", "confidence": 0.0151, },
          { "timestamp_start" : 320, "timestamp_end": 2880, "result": "female", "confidence": 0.8075, },
          { "timestamp_start" : 2880, "timestamp_end": 3136, "result": "unknown", "confidence": 0.0771, },
          { "timestamp_start" : 3136, "timestamp_end": 3968, "result": "female", "confidence": 0.4931, },
        ]
      }
    }
  }
}

By default, only the time_series key is returned. If you’re also interested in the summary and transitions, set the include_summary and include_transitions flags to True, respectively.
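Because summary and transitions are only present when their flags are set, code that reads the result should not assume those keys exist. The sketch below uses a hypothetical describe_channel helper to walk one channel of a result shaped like the sample above, using dict.get for the optional transitions key. It assumes, as the sample suggests, that timestamps are in milliseconds and that each time_series entry holds a timestamp plus one object per requested model.

```python
from typing import Any, Dict, List


def describe_channel(result: Dict[str, Any], channel: str = "0") -> List[str]:
    """Render one channel of a process_file-style result as readable lines."""
    data = result["channels"][channel]
    lines: List[str] = []

    # time_series is always returned: one entry per output period.
    for entry in data["time_series"]:
        for model, prediction in entry.items():
            if model == "timestamp":
                continue
            lines.append(
                f"{entry['timestamp']} ms: {model}={prediction['result']} "
                f"({prediction['confidence']:.2f})"
            )

    # transitions is only present with include_transitions=True,
    # so fall back to an empty dict instead of raising KeyError.
    for model, segments in data.get("transitions", {}).items():
        for segment in segments:
            lines.append(
                f"{segment['timestamp_start']}-{segment['timestamp_end']} ms: "
                f"{model}={segment['result']}"
            )
    return lines
```

Applied to the first time_series entry and first transition of the sample output, it would produce "0 ms: gender=female (0.64)" and "0-320 ms: gender=unknown".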

You can find information on the output of each model on the Models page.

Refer to the File Processing section of the Deeptone documentation for more details on this functionality.

Real-Time Processing

The Real-Time Processing functionality allows you to extract real-time insights from an audio stream.

You can use it by invoking the process_stream method:

deeptone.process_stream(input_generator=my_generator,
                        models=['gender', 'arousal'],
                        output_period=1024)

The method receives an input_generator, which should be a Python Generator that periodically yields byte arrays containing raw audio data.
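As an illustration, the following generator reads a WAV file with Python's standard wave module and yields its raw PCM frames in fixed-size chunks, optionally sleeping between chunks to mimic a live source such as a microphone. wav_chunks is a hypothetical helper, and the audio format the SDK expects (sample rate, sample width, number of channels) is not stated on this page, so treat those details as assumptions to verify against the Real-Time Processing documentation.

```python
import time
import wave
from typing import Iterator


def wav_chunks(path: str, chunk_ms: int = 100, realtime: bool = False) -> Iterator[bytes]:
    """Yield the raw PCM frames of a WAV file in chunks of about chunk_ms milliseconds."""
    with wave.open(path, "rb") as wav:
        # Number of frames that cover chunk_ms of audio (at least one).
        frames_per_chunk = max(1, wav.getframerate() * chunk_ms // 1000)
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:  # readframes returns b"" at end of file
                break
            yield data
            if realtime:
                # Roughly pace the output the way a live source would.
                time.sleep(chunk_ms / 1000)
```

It could then be passed as input_generator=wav_chunks('my-file.wav', realtime=True); without realtime=True the whole file is yielded as fast as it can be read.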

It returns a Generator that yields results for every output_period milliseconds of audio data, with the following format:

{
    "timestamp": 0,
    "results": {
        "gender": {
            "result": "female",
            "confidence": 0.6255,
        },
        "arousal": {
            "result": "high",
            "confidence": 0.9431,
        },
    },
}
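Since process_stream returns a generator, results can be consumed as they arrive with an ordinary for loop. This minimal sketch, built around a hypothetical confident_results helper, keeps only the predictions whose confidence reaches a threshold; the default of 0.8 is an arbitrary illustrative value, not an SDK recommendation.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple


def confident_results(
    stream: Iterable[Dict[str, Any]], min_confidence: float = 0.8
) -> Iterator[Tuple[int, str, str]]:
    """Yield (timestamp, model, result) for each prediction at or above min_confidence."""
    for item in stream:
        # Each item covers one output_period of audio and holds one entry per model.
        for model, prediction in item["results"].items():
            if prediction["confidence"] >= min_confidence:
                yield item["timestamp"], model, prediction["result"]
```

For example, for ts, model, label in confident_results(deeptone.process_stream(...)): print(ts, model, label). Applied to the sample item above with the default threshold, only the arousal prediction (confidence 0.9431) would be yielded, as (0, "arousal", "high").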

You can find information on the output of each model on the Models page.

Refer to the Real-Time Processing section of the Deeptone documentation for more details on this functionality.