API Reference

class deeptone.Deeptone(license_key: str, prediction_engine=None)[source]

Entry point for the Deeptone SDK. Once this class is initialized, it provides access to the Deeptone Deep Learning models, which allow you to extract insights from your audio files.

Three processing modes are supported:

  • File Processing: This mode allows you to provide a file to Deeptone, which returns a time-series analysis, alongside a summary and a list of transitions for the entire file.

  • Audio Bytes Processing: This mode allows you to provide audio bytes to Deeptone. The output will be the same as in the File Processing case.

  • Stream Processing: This mode allows you to provide a real-time audio stream, resulting in a continuous analysis that periodically generates insights as the stream progresses.

Performance Considerations:

Initializing the Deep Learning models that power Deeptone is a time-consuming operation, so constructing this class can be costly. We therefore recommend that instances be long-lived.

Thread Safety Considerations:

Instances of Deeptone are thread-safe. However, the actual inference process runs within a critical section, so performance may be limited when a single instance is shared across multiple threads. If performance is a critical requirement, ensure each thread has its own Deeptone instance (usage of a pool is recommended).

Raises

LicenseManagerError – When the License Key is invalid or cannot be validated
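
A minimal sketch of creating a long-lived instance. The license key below is a placeholder:

from deeptone import Deeptone

# Placeholder key; an invalid key raises LicenseManagerError.
engine = Deeptone(license_key="YOUR_LICENSE_KEY")

# Instances are costly to create, so keep them long-lived and, in
# multi-threaded code, give each thread its own instance (e.g. a pool).
print(engine.get_available_models())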

get_available_models() → set[source]

Retrieve the names of all available models

Returns

The names of the available models

Return type

set

is_model_available(model_name: str) → bool[source]

Check if a model with the given name is available

Parameters

model_name (str) – Model name to validate

Returns

True if the model name provided is available

Return type

bool
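
A short sketch using both discovery helpers. It assumes an initialized engine (see the constructor example above), and "gender" is an assumed model name:

available = engine.get_available_models()
print(available)  # e.g. {"gender", "arousal", ...}

if engine.is_model_available("gender"):
    print("The 'gender' model can be requested")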

process_audio_bytes(data: numpy.ndarray, models: list, output_period: int, include_summary: bool = False, include_transitions: bool = False, include_raw_values: bool = False, rate_in: int = None, use_chunking: bool = False, volume_threshold: float = 0.005) → dict[source]

Analyse audio data with the list of requested models.

This method can be used to generate timestamped predictions directly from audio bytes provided as a numpy array, rather than an audio file.

Parameters
  • data (np.ndarray) – Data to analyse

  • models (list) – List of models to use for the audio analysis

  • output_period (int) – How often (in milliseconds) the output of the model should be returned

  • include_summary (bool, optional) – Should the summary be included

  • include_transitions (bool, optional) – Should the file transitions be returned

  • include_raw_values (bool, optional) – Should raw model outputs be included

  • rate_in (int, optional) – Sample rate of the original audio (in Hz). Should only be specified if the rate differs from the recommended one (16000).

  • use_chunking (bool, optional) – Should data be chunked before making predictions. Chunking is only recommended in case of very large data arrays, to avoid memory issues.

  • volume_threshold (float, optional) – Threshold below which input data will be considered as no sound. Should be a number between 0 and 1, where 0 will treat all data as sound and 1 will treat all data as no sound.

Returns

A dictionary containing timestamped results and summary/transitions/raw values, if applicable.

If include_summary is set to True, the output will contain a summary for the entire data array.

If include_transitions is set to True, the transitions output groups the raw model output (1 prediction every 64 ms) into phases where the predicted classification remains the same.

If include_raw_values is set to True, all possible classes with their respective probabilities will be returned in the output, in addition to the most likely one.

Example

{
    "time_series": [
        {
            "timestamp" : 100,
            "results": {
                "gender": {
                    "result": "female",
                    "confidence": 0.6255
                },
                "another_model": {
                    "result: <>,
                    "confidence": <confidence>
                },
            }
        },
        {
            "timestamp" : 105,
            "results:
            {
                "gender": {...},
                "another_model": {...}
            }
        }
    ]
}

Return type

dict
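
A sketch of analysing raw samples. It assumes an initialized engine; the zero-filled array is a stand-in for real 16 kHz audio (with all-zero input, everything falls below volume_threshold), and "gender" is an assumed model name:

import numpy as np

# Two seconds of silence at 16 kHz; replace with real samples.
data = np.zeros(2 * 16000, dtype=np.float32)

result = engine.process_audio_bytes(
    data=data,
    models=["gender"],
    output_period=1024,   # emit aggregated results every 1024 ms
    include_summary=True,
)
for entry in result["time_series"]:
    print(entry["timestamp"], entry["results"]["gender"]["result"])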

process_audio_chunk(data: numpy.ndarray, models: list, include_raw_values: bool = False, volume_threshold: float = 0.005, context_samples: int = 0) → dict[source]

Analyse an audio chunk with the list of requested models.

This method should be used when a single prediction is needed for the whole chunk. For reliable predictions, the duration of the audio should be at least the size of the receptive field of the requested model (approximately 2 s for most models). For more info on receptive fields, check Models.

Parameters
  • data (np.ndarray) – Data to analyse, representing audio data sampled at 16kHz

  • models (list) – List of models to use for the audio analysis

  • include_raw_values (bool, optional) – Should raw model outputs be included

  • volume_threshold (float) – Threshold below which input data will be considered as no sound. Should be a number between 0 and 1, where 0 will treat all data as sound and 1 will treat all data as no sound.

  • context_samples (int) – Number of samples that are used as context (receptive field) and whose predictions should be removed from the final result. Defaults to 0, so that nothing is removed.

Returns

A dictionary with the results from each model.

Refer to Models for details on the outputs for each individual model.

Example

{
    "results": {
        "gender": {
            "result": "female",
            "confidence": 0.6255
        },
        "arousal": {
            "result": "high",
            "confidence": 0.9431
        }
    },
    "raw": {
        "gender": {
            "female": 0.8,
            "male": 0.2,
        },
        "arousal": {
            "high": 0.9245,
            "neutral": 0.0245,
            "low": 0.01
        },
    }
}

Return type

dict

Raises

ModelNotFoundError – if any of the models are invalid
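
A sketch of a single-chunk prediction. It assumes an initialized engine; the zero-filled chunk stands in for roughly 2 s of real 16 kHz audio, and "gender" is an assumed model name:

import numpy as np

chunk = np.zeros(2 * 16000, dtype=np.float32)  # ~2 s, the typical receptive field

out = engine.process_audio_chunk(
    data=chunk,
    models=["gender"],
    include_raw_values=True,
)
print(out["results"]["gender"])  # e.g. {"result": "female", "confidence": 0.62}
print(out["raw"]["gender"])      # class probabilities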

process_file(filename: str, models: list, output_period: int, channel: Optional[int] = None, include_summary: bool = False, include_transitions: bool = False, include_raw_values: bool = False, use_chunking: bool = False, volume_threshold: float = 0.005) → dict[source]

Analyse a WAV File with the list of requested models.

Parameters
  • filename (str) – Path to the file to analyse

  • models (list) – List of models to use for the audio analysis

  • output_period (int) – How often (in milliseconds) the output of the models should be returned. The provided value must be a positive multiple of 64.

  • channel (int, optional) – The channel to analyse. If no channel is provided, all channels will be analysed

  • include_summary (bool, optional) – Should the file summary be returned

  • include_transitions (bool, optional) – Should the file transitions be returned

  • include_raw_values (bool, optional) – Should raw model outputs be included

  • use_chunking (bool, optional) – Should data be chunked before making predictions. Use this if the file being analyzed is large, to avoid issues with high memory consumption

  • volume_threshold (float, optional) – Threshold below which input data will be considered as no sound. Should be a number between 0 and 1, where 0 will treat all data as sound and 1 will treat all data as no sound. Defaults to 0.005, which should exclude very quiet fragments from analysis.

Returns

The results of the analysis for the requested channels.

For each channel, a Time Series will be returned, containing the aggregated results for each time window.

If include_summary is set to True, the output will contain a summary for the entire file.

If include_transitions is set to True, the transitions output groups the raw model output (1 prediction every 64 ms) into phases where the predicted classification remains the same.

If include_raw_values is set to True, all possible classes with their respective probabilities will be returned in the output, in addition to the most likely one.

Refer to Models for details on the outputs for each individual model.

Example

{
  "channels": {
    "0": {
      "time_series": [
        {
          "timestamp" : 0,
            "results": {
              "gender": {
                "result": "female",
                "confidence": 0.6255,
              },
              "arousal": {
                "result": "high",
                "confidence": 0.9245,
              },
            },
            "raw": {
              "gender": {
                "female": 0.8,
                "male": 0.2,
              },
              "arousal": {
                "high": 0.9245,
                "neutral": 0.0245,
                "low": 0.01
              },
            }
        },
      ],
      "summary": {
        "gender": {
          "high_fraction": 0.8451,
          "low_fraction": 0.1124,
          "neutral_fraction": 0.0425,
        },
        "arousal": {
          "high_fraction": 0.9451,
          "low_fraction": 0.0124,
          "neutral_fraction": 0.0425,
        }
      },
      "transitions": {
        "gender": [
          {
           "timestamp_start": 0,
           "timestamp_end": 1500,
           "result": "female",
           "confidence": 0.96
           },
          {
           "timestamp_start": 1500,
           "timestamp_end": 3420,
           "result": "male",
           "confidence": 0.87
          },
          ...
          {
           "timestamp_start": 8560,
           "timestamp_end": 10000,
           "result": "female",
           "confidence": 0.89
          }
        ],
        "arousal": [
          {
           "timestamp_start": 0,
           "timestamp_end": 2500,
           "result": "high",
           "confidence": 0.92
           },
          {
           "timestamp_start": 2500,
           "timestamp_end": 3420,
           "result": "low",
           "confidence": 0.85
          },
          ...
          {
           "timestamp_start": 7560,
           "timestamp_end": 10000,
           "result": "neutral",
           "confidence": 0.87
          }
        ]
      }
    }
  }
}

Return type

dict
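
A sketch of a typical file analysis. It assumes an initialized engine; "call.wav" is a placeholder path and the model names are assumptions:

analysis = engine.process_file(
    filename="call.wav",
    models=["gender", "arousal"],
    output_period=1024,   # must be a positive multiple of 64
    channel=0,            # omit to analyse every channel
    include_summary=True,
    include_transitions=True,
)
for entry in analysis["channels"]["0"]["time_series"]:
    print(entry["timestamp"], entry["results"])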

process_stream(input_generator, models: list, output_period: int, include_raw_values: bool = False, volume_threshold: float = 0.005)[source]

Analyse a real-time audio stream with the list of requested models.

Parameters
  • input_generator (generator) – Generator that yields byte arrays representing audio data sampled at 16kHz

  • models (list) – List of models to use for the audio analysis

  • output_period (int) – How often (in milliseconds) the returned generator should yield results. The provided value must be a positive multiple of 64.

  • include_raw_values (bool, optional) – Should raw model outputs be included

  • volume_threshold (float) – Threshold below which input data will be considered as no sound. Should be a number between 0 and 1, where 0 will treat all data as sound and 1 will treat all data as no sound.

Returns

A generator that yields aggregated results for every output_period milliseconds of audio data received from the input_generator.

Refer to Models for details on the outputs for each individual model.

Example

{
    "timestamp": 0,
    "results": {
        "gender": {
            "result": "female",
            "confidence": 0.6255
        },
        "arousal": {
            "result": "high",
            "confidence": 0.9431
        }
    },
    "raw": {
        "gender": {
            "female": 0.8,
            "male": 0.2,
        },
        "arousal": {
            "high": 0.9245,
            "neutral": 0.0245,
            "low": 0.01
        },
    }
}

Return type

generator

Raises
  • ModelNotFoundError – if any of the models are invalid

  • ValueError – if the output_period provided is invalid
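
A sketch of stream processing. It assumes an initialized engine; reading fixed-size blocks from a raw 16 kHz PCM file stands in for a live audio source, and "call.raw" is a placeholder path:

def audio_blocks(path, block_size=4096):
    # Yield byte arrays of 16 kHz audio, as process_stream expects.
    with open(path, "rb") as f:
        while block := f.read(block_size):
            yield block

for result in engine.process_stream(
    input_generator=audio_blocks("call.raw"),
    models=["gender"],
    output_period=1024,   # positive multiple of 64
):
    print(result["timestamp"], result["results"]["gender"]["result"])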