Usage

Before you can use the Deeptone SDK, you must initialize it. To do so, you need your License Key and Model file. If you don’t have them, you can request them at support@oto.ai.

The Deeptone class

The Deeptone class is the entry point for all interactions with the Deeptone SDK. Once you have instantiated it, reuse the same instance, because initialization can be computationally heavy.

from deeptone import Deeptone

deeptone = Deeptone(license_key="YOUR_LICENSE_KEY", model_path="PATH_TO_YOUR_MODEL_FILE")

print(deeptone.get_available_models())

Thread Safety Notice: Deeptone instances are not thread-safe. If you intend to use one in a multi-threaded application, ensure that only a single thread uses an instance at a time, either by using an instance pool or by guarding all method invocations with a lock.
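The lock-based option can be sketched as a small wrapper. This is a minimal illustration, not part of the Deeptone SDK: `LockedProxy` is a hypothetical helper name, and it assumes that serializing each method call with a single lock is enough for your use case.

```python
import threading


class LockedProxy:
    """Hypothetical helper: serialize every method call on a wrapped object."""

    def __init__(self, target):
        self._target = target
        self._lock = threading.Lock()

    def __getattr__(self, name):
        # Only reached for names not defined on the proxy itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def guarded(*args, **kwargs):
            # Hold the lock for the duration of the call only.
            with self._lock:
                return attr(*args, **kwargs)

        return guarded
```

With such a wrapper, `LockedProxy(deeptone)` could be shared between threads in place of the raw instance. Note that the lock covers only the call itself: a generator returned by a method is iterated outside the lock, so streaming results would still need to be consumed by one thread at a time.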

File Processing

The File Processing functionality allows you to extract insights from your audio files.

You can use it by invoking the process_file method:

deeptone.process_file(filename='my-file.wav',
                      models=['speech', 'arousal'],
                      output_period=1024,
                      include_summary=True,
                      include_transitions=True)

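The following is a sample of the output this method returns. Note that the sample shows results for a gender model, not for the models requested in the call above: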

{
  "channels": {
    "0": {
      "time_series": [
        { "timestamp": 0, "gender": { "result": "female", "confidence": 0.6418 } },
        { "timestamp": 1024, "gender": { "result": "female", "confidence": 0.9002 } },
        { "timestamp": 2048, "gender": { "result": "female", "confidence": 0.4725 } },
        { "timestamp": 3072, "gender": { "result": "female", "confidence": 0.4679 } }
      ],
      "summary": {
        "gender": { "male_fraction": 0, "female_fraction": 0.8548, "unknown_fraction": 0.1452 }
      },
      "transitions": {
        "gender": [
          { "timestamp_start": 0, "timestamp_end": 320, "result": "unknown", "confidence": 0.0151 },
          { "timestamp_start": 320, "timestamp_end": 2880, "result": "female", "confidence": 0.8075 },
          { "timestamp_start": 2880, "timestamp_end": 3136, "result": "unknown", "confidence": 0.0771 },
          { "timestamp_start": 3136, "timestamp_end": 3968, "result": "female", "confidence": 0.4931 }
        ]
      }
    }
  }
}

By default, only the time_series key is returned. If you also want the summary and transitions, set the include_summary and include_transitions flags to True, respectively.
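To show how the transitions relate to the summary, here is a small sketch that totals the duration of each label from a transitions list. `label_durations` is a hypothetical helper, not an SDK function. It assumes the result has the shape of the sample output, that timestamps are in milliseconds, and that the call was made with include_transitions=True.

```python
def label_durations(result, model, channel="0"):
    """Total duration per label, summed over a model's transitions."""
    totals = {}
    for span in result["channels"][channel]["transitions"][model]:
        length = span["timestamp_end"] - span["timestamp_start"]
        totals[span["result"]] = totals.get(span["result"], 0) + length
    return totals


# Transitions copied from the sample output; other keys are omitted.
sample = {
    "channels": {
        "0": {
            "transitions": {
                "gender": [
                    {"timestamp_start": 0, "timestamp_end": 320, "result": "unknown", "confidence": 0.0151},
                    {"timestamp_start": 320, "timestamp_end": 2880, "result": "female", "confidence": 0.8075},
                    {"timestamp_start": 2880, "timestamp_end": 3136, "result": "unknown", "confidence": 0.0771},
                    {"timestamp_start": 3136, "timestamp_end": 3968, "result": "female", "confidence": 0.4931},
                ]
            }
        }
    }
}

totals = label_durations(sample, "gender")
print(totals)  # {'unknown': 576, 'female': 3392}
```

For this sample, 3392 of 3968 milliseconds are labelled female, which agrees with the female_fraction of 0.8548 in the summary.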

You can find information on the output of each model on the Models page.

Refer to the File Processing section of the Deeptone documentation for more details on this functionality.

Real-Time Processing

The Real-Time Processing functionality allows you to extract real-time insights from an audio stream.

You can use it by invoking the process_stream method:

deeptone.process_stream(input_generator=my_generator,
                        models=['speech', 'arousal'],
                        output_period=1024)

The method receives an input_generator, which should be a Python Generator that periodically yields byte arrays containing raw audio data.
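As an illustration of what such a generator might look like, the sketch below yields fixed-size byte chunks from any binary file-like object. `chunked_audio` and the 4096-byte chunk size are assumptions made for this example. This page does not state the expected sample rate, encoding, or chunk size, so check the Real Time Processing documentation for the actual requirements.

```python
import io


def chunked_audio(stream, chunk_size=4096):
    """Yield successive byte chunks from a binary file-like object."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return  # end of stream
        yield chunk


# Stand-in for a real audio source (a file, socket, or microphone buffer).
fake_audio = io.BytesIO(b"\x00" * 10)
my_generator = chunked_audio(fake_audio, chunk_size=4)
print([len(chunk) for chunk in my_generator])  # [4, 4, 2]
```

A generator built this way could then be passed as input_generator.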

It returns a Generator that yields results for every output_period milliseconds of audio data, with the following format:

{
    "timestamp": 0,
    "results": {
        "gender": {
            "value": "female",
            "confidence": 0.6255
        },
        "arousal": {
            "value": "high",
            "confidence": 0.9431
        }
    }
}
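Because the results arrive one at a time, they can be filtered as they are produced. The following sketch keeps only the values of one model that meet a confidence threshold. `confident_values` and the default threshold of 0.5 are hypothetical choices for illustration, and the code assumes each yielded item has the format shown above.

```python
def confident_values(results, model, min_confidence=0.5):
    """Yield (timestamp, value) pairs for one model above a confidence threshold."""
    for item in results:
        entry = item["results"].get(model)
        if entry is not None and entry["confidence"] >= min_confidence:
            yield item["timestamp"], entry["value"]


# The first item is copied from the sample above; the second is invented for illustration.
sample_results = [
    {"timestamp": 0, "results": {
        "gender": {"value": "female", "confidence": 0.6255},
        "arousal": {"value": "high", "confidence": 0.9431}}},
    {"timestamp": 1024, "results": {
        "gender": {"value": "female", "confidence": 0.3},
        "arousal": {"value": "high", "confidence": 0.2}}},
]

print(list(confident_values(sample_results, "arousal", min_confidence=0.9)))  # [(0, 'high')]
```

In real use, the first argument would be the generator returned by process_stream instead of a list.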

You can find information on the output of each model on the Models page.

Refer to the Real Time Processing section of the Deeptone documentation for more details on this functionality.