API Reference

class deeptone.Deeptone(license_key: str, prediction_engine=None)[source]

Entry point for the Deeptone SDK. Once this class is initialized, it provides access to the Deeptone Deep Learning models, which allow you to extract insights from your audio files.

Three processing modes are supported:

  • File Processing: This mode allows you to provide a file to Deeptone, which returns a time-series analysis, alongside a summary and a list of transitions for the entire file.

  • Audio Bytes Processing: This mode allows you to provide audio bytes to Deeptone. The output will be the same as in the File Processing case.

  • Stream Processing: This mode allows you to provide a real-time audio stream, resulting in a continuous analysis that periodically generates insights as the stream progresses.

Performance Considerations:

Initializing the Deep Learning models that power Deeptone is a time-consuming operation, so constructing this class can be costly. We therefore recommend that instances be long-lived.

Thread Safety Considerations:

Instances of Deeptone are thread-safe. However, the actual inference process runs within a critical section, so performance may be limited when a single instance is shared across multiple threads. If performance is a critical requirement, ensure each thread has its own Deeptone instance (usage of a pool is recommended).

Raises

LicenseManagerError – When the License Key is invalid or cannot be validated
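
A minimal sketch of creating a long-lived instance. The license key below is a placeholder:

from deeptone import Deeptone

# Placeholder key; an invalid key raises LicenseManagerError.
engine = Deeptone(license_key="YOUR_LICENSE_KEY")

# Instances are costly to create, so keep them long-lived and, in
# multi-threaded code, give each thread its own instance (e.g. a pool).
print(engine.get_available_models())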

get_available_models() → set[source]

Retrieve the names of all available models

Returns

The names of the available models

Return type

set

is_model_available(model_name: str) → bool[source]

Check if a model with the given name is available

Parameters

model_name (str) – Model name to validate

Returns

True if the model name provided is available

Return type

bool
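
A short sketch using both discovery helpers. It assumes an initialized engine (see the constructor example above), and "gender" is an assumed model name:

available = engine.get_available_models()
print(available)  # e.g. {"gender", "arousal", ...}

if engine.is_model_available("gender"):
    print("The 'gender' model can be requested")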

process_audio_bytes(data: numpy.ndarray, models: list, output_period: int, include_summary: bool = False, include_transitions: bool = False, include_raw_values: bool = False, rate_in: int = None, use_chunking: bool = False, volume_threshold: float = 0.005) → dict[source]

Analyse audio data with the list of requested models.

This method can be used to generate timestamped predictions directly from audio bytes provided as a numpy array, rather than an audio file.

Parameters
  • data (np.ndarray) – Data to analyse

  • models (list) – List of models to use for the audio analysis

  • output_period (int) – How often (in milliseconds) the output of the model should be returned

  • include_summary (bool, optional) – Should the summary be included

  • include_transitions (bool, optional) – Should the file transitions be returned

  • include_raw_values (bool, optional) – Should raw model outputs be included

  • rate_in (int, optional) – Sample rate of the original audio (in Hz). Should only be specified if the rate differs from the recommended one (16000).

  • use_chunking (bool, optional) – Should data be chunked before making predictions. Chunking is only recommended in case of very large data arrays, to avoid memory issues.

  • volume_threshold (float, optional) – Threshold below which input data will be considered as no sound. Should be a number between 0 and 1, where 0 will treat all data as sound and 1 will treat all data as no sound.

Returns

A dictionary containing timestamped results and summary/transitions/raw values, if applicable.

If include_summary is set to True, the output will contain a summary for the entire data array.

If include_transitions is set to True, the transitions output groups the raw model output (1 prediction every 64 ms) into phases where the predicted classification remains the same.

If include_raw_values is set to True, all possible classes with their respective probabilities will be returned in the output, in addition to the most likely one.

Example

{
    "time_series": [
        {
            "timestamp" : 100,
            "results": {
                "gender": {
                    "result": "female",
                    "confidence": 0.6255
                },
                "another_model": {
                    "result: <>,
                    "confidence": <confidence>
                },
            }
        },
        {
            "timestamp" : 105,
            "results:
            {
                "gender": {...},
                "another_model": {...}
            }
        }
    ]
}

Return type

dict
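
A sketch of analysing raw samples. It assumes an initialized engine; the zero-filled array is a stand-in for real 16 kHz audio (with all-zero input, everything falls below volume_threshold), and "gender" is an assumed model name:

import numpy as np

# Two seconds of silence at 16 kHz; replace with real samples.
data = np.zeros(2 * 16000, dtype=np.float32)

result = engine.process_audio_bytes(
    data=data,
    models=["gender"],
    output_period=1024,   # emit aggregated results every 1024 ms
    include_summary=True,
)
for entry in result["time_series"]:
    print(entry["timestamp"], entry["results"]["gender"]["result"])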

process_audio_chunk(data: numpy.ndarray, models: list, include_raw_values: bool = False, volume_threshold: float = 0.005, context_samples: int = 0) → dict[source]

Analyse an audio chunk with the list of requested models.

This method should be used when a single prediction is needed for the whole chunk. For reliable predictions, the duration of the audio should be at least the size of the receptive field of the requested model (approximately 2 s for most models). For more info on receptive fields, check Models.

Parameters
  • data (np.ndarray) – Data to analyse, representing audio data sampled at 16kHz

  • models (list) – List of models to use for the audio analysis

  • include_raw_values (bool, optional) – Should raw model outputs be included

  • volume_threshold (float) – Threshold below which input data will be considered as no sound. Should be a number between 0 and 1, where 0 will treat all data as sound and 1 will treat all data as no sound.

  • context_samples (int) – Number of samples that are used as context (receptive field) and whose predictions should be removed from the final result. Defaults to 0, so that nothing is removed.

Returns

A dictionary with the results from each model.

Refer to Models for details on the outputs for each individual model.

Example

{
    "results": {
        "gender": {
            "result": "female",
            "confidence": 0.6255
        },
        "arousal": {
            "result": "high",
            "confidence": 0.9431
        }
    },
    "raw": {
        "gender": {
            "female": 0.8,
            "male": 0.2,
        },
        "arousal": {
            "high": 0.9245,
            "neutral": 0.0245,
            "low": 0.01
        },
    }
}

Return type

dict

Raises

ModelNotFoundError – if any of the models are invalid
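
A sketch of a single-chunk prediction. It assumes an initialized engine; the zero-filled chunk stands in for roughly 2 s of real 16 kHz audio, and "gender" is an assumed model name:

import numpy as np

chunk = np.zeros(2 * 16000, dtype=np.float32)  # ~2 s, the typical receptive field

out = engine.process_audio_chunk(
    data=chunk,
    models=["gender"],
    include_raw_values=True,
)
print(out["results"]["gender"])  # e.g. {"result": "female", "confidence": 0.62}
print(out["raw"]["gender"])      # class probabilities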

process_file(filename: str, models: list, output_period: int, channel: Optional[int] = None, include_summary: bool = False, include_transitions: bool = False, include_raw_values: bool = False, use_chunking: bool = False, volume_threshold: float = 0.005) → dict[source]

Analyse a WAV File with the list of requested models.

Parameters
  • filename (str) – Path to the file to analyse

  • models (list) – List of models to use for the audio analysis

  • output_period (int) – How often (in milliseconds) the output of the models should be returned. The provided value must be a positive multiple of 64.

  • channel (int, optional) – The channel to analyse. If no channel is provided, all channels will be analysed

  • include_summary (bool, optional) – Should the file summary be returned

  • include_transitions (bool, optional) – Should the file transitions be returned

  • include_raw_values (bool, optional) – Should raw model outputs be included

  • use_chunking (bool, optional) – Should data be chunked before making predictions. Use this if the file being analyzed is large, to avoid issues with high memory consumption

  • volume_threshold (float, optional) – Threshold below which input data will be considered as no sound. Should be a number between 0 and 1, where 0 will treat all data as sound and 1 will treat all data as no sound. Defaults to 0.005, which should exclude very quiet fragments from analysis.

Returns

The results of the analysis for the requested channels.

For each channel, a Time Series will be returned, containing the aggregated results for each time window.

If include_summary is set to True, the output will contain a summary for the entire file.

If include_transitions is set to True, the transitions output groups the raw model output (1 prediction every 64 ms) into phases where the predicted classification remains the same.

If include_raw_values is set to True, all possible classes with their respective probabilities will be returned in the output, in addition to the most likely one.

Refer to Models for details on the outputs for each individual model.

Example

{
  "channels": {
    "0": {
      "time_series": [
        {
          "timestamp" : 0,
            "results": {
              "gender": {
                "result": "female",
                "confidence": 0.6255,
              },
              "arousal": {
                "result": "high",
                "confidence": 0.9245,
              },
            },
            "raw": {
              "gender": {
                "female": 0.8,
                "male": 0.2,
              },
              "arousal": {
                "high": 0.9245,
                "neutral": 0.0245,
                "low": 0.01
              },
            }
        },
      ],
      "summary": {
        "gender": {
          "high_fraction": 0.8451,
          "low_fraction": 0.1124,
          "neutral_fraction": 0.0425,
        },
        "arousal": {
          "high_fraction": 0.9451,
          "low_fraction": 0.0124,
          "neutral_fraction": 0.0425,
        }
      },
      "transitions": {
        "gender": [
          {
           "timestamp_start": 0,
           "timestamp_end": 1500,
           "result": "female",
           "confidence": 0.96
           },
          {
           "timestamp_start": 1500,
           "timestamp_end": 3420,
           "result": "male",
           "confidence": 0.87
          },
          ...
          {
           "timestamp_start": 8560,
           "timestamp_end": 10000,
           "result": "female",
           "confidence": 0.89
          }
        ],
        "arousal": [
          {
           "timestamp_start": 0,
           "timestamp_end": 2500,
           "result": "high",
           "confidence": 0.92
           },
          {
           "timestamp_start": 2500,
           "timestamp_end": 3420,
           "result": "low",
           "confidence": 0.85
          },
          ...
          {
           "timestamp_start": 7560,
           "timestamp_end": 10000,
           "result": "neutral",
           "confidence": 0.87
          }
        ]
      }
    }
  }
}

Return type

dict
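
A sketch of a typical file analysis. It assumes an initialized engine; "call.wav" is a placeholder path and the model names are assumptions:

analysis = engine.process_file(
    filename="call.wav",
    models=["gender", "arousal"],
    output_period=1024,   # must be a positive multiple of 64
    channel=0,            # omit to analyse every channel
    include_summary=True,
    include_transitions=True,
)
for entry in analysis["channels"]["0"]["time_series"]:
    print(entry["timestamp"], entry["results"])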

process_stream(input_generator, models: list, output_period: int, include_raw_values: bool = False, volume_threshold: float = 0.005)[source]

Analyse a real-time audio stream with the list of requested models.

Parameters
  • input_generator (generator) – Generator that yields byte arrays representing audio data sampled at 16kHz

  • models (list) – List of models to use for the audio analysis

  • output_period (int) – How often (in milliseconds) the returned generator should yield results. The provided value must be a positive multiple of 64.

  • include_raw_values (bool, optional) – Should raw model outputs be included

  • volume_threshold (float) – Threshold below which input data will be considered as no sound. Should be a number between 0 and 1, where 0 will treat all data as sound and 1 will treat all data as no sound.

Returns

A generator that yields aggregated results for every output_period milliseconds of audio data received from the input_generator.

Refer to Models for details on the outputs for each individual model.

Example

{
    "timestamp": 0,
    "results": {
        "gender": {
            "result": "female",
            "confidence": 0.6255
        },
        "arousal": {
            "result": "high",
            "confidence": 0.9431
        }
    },
    "raw": {
        "gender": {
            "female": 0.8,
            "male": 0.2,
        },
        "arousal": {
            "high": 0.9245,
            "neutral": 0.0245,
            "low": 0.01
        },
    }
}

Return type

generator

Raises
  • ModelNotFoundError – if any of the models are invalid

  • ValueError – if the output_period provided is invalid
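
A sketch of stream processing. It assumes an initialized engine; reading fixed-size blocks from a raw 16 kHz PCM file stands in for a live audio source, and "call.raw" is a placeholder path:

def audio_blocks(path, block_size=4096):
    # Yield byte arrays of 16 kHz audio, as process_stream expects.
    with open(path, "rb") as f:
        while block := f.read(block_size):
            yield block

for result in engine.process_stream(
    input_generator=audio_blocks("call.raw"),
    models=["gender"],
    output_period=1024,   # positive multiple of 64
):
    print(result["timestamp"], result["results"]["gender"]["result"])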