DeepTone™'s File Processing functionality allows you to extract insights from your audio files.
## Working with stereo files
DeepTone™ processes each audio channel separately. If you provide a stereo file, you can specify a single channel to be processed; otherwise, all channels are processed separately.
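For intuition, a stereo file stores its samples interleaved, and per-channel processing means de-interleaving them into one stream per channel first. The sketch below is illustrative only; the `split_channels` helper and the sample values are not part of the SDK:

```python
def split_channels(samples, num_channels=2):
    """De-interleave [L0, R0, L1, R1, ...] into one list per channel."""
    return [samples[c::num_channels] for c in range(num_channels)]

# Six interleaved samples: left channel 1, 2, 3 and right channel 10, 20, 30.
stereo = [1, 10, 2, 20, 3, 30]
left, right = split_channels(stereo)
# Each channel would then be analysed independently.
```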
## Configuration options and outputs
The available configuration options and output types depend on the SDK language.
### Available configuration options
Several arguments can be passed to the `process_file` method:

- `filename` - the path to the file to be analysed (see Supported Audio Formats)
- `models` - the list of model names to use for the audio analysis (see all available models here)
- `output_period` - how often (in milliseconds, as a multiple of 64) the output of the models should be returned
- `channel` - optionally, a specific channel to analyse; otherwise all channels are analysed
- `include_summary` - optionally, whether the output should contain a summary of the analysis; defaults to `False`
- `include_transitions` - optionally, whether the output should contain the transitions of the analysis; defaults to `False`
- `include_raw_values` - optionally, whether the result should contain raw model outputs; defaults to `False`
- `use_chunking` - optionally, whether the data should be chunked before the analysis (recommended for large files to avoid memory issues)
- `volume_threshold` - optionally, a volume level different from the default (higher values result in more of the data being treated as silence)
- `voice_signatures` - optionally (only applies when using the SpeakerMap model), the voice signatures used to identify known speakers
- `include_voice_signatures` - optionally (only applies when using the SpeakerMap model), whether the result should contain the voice signatures of the found speakers. If a `voice_signatures` object was provided as well, it will be updated with the voice signatures of new speakers. See Create Voice Signatures to find out more about voice signatures.
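Because `output_period` must be a positive multiple of 64 ms, it can be worth checking a value before passing it along. The helper below is a hypothetical convenience sketch, not part of the SDK:

```python
def validate_output_period(output_period_ms: int) -> int:
    """Check the documented constraint on output_period:
    a positive multiple of 64 milliseconds."""
    if output_period_ms <= 0 or output_period_ms % 64 != 0:
        raise ValueError(
            f"output_period must be a positive multiple of 64 ms, "
            f"got {output_period_ms}"
        )
    return output_period_ms

validate_output_period(1024)  # fine: 16 windows of 64 ms
```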
There are several possible output types, depending on the parameters that you pass to the `process_file` method:

- a plain time series - the default output type, always returned
- a plain time series with raw model outputs - raw values are appended when `include_raw_values=True`
- a summary - appended to the results when `include_summary=True`
- a simplified time series - appended to the results when `include_transitions=True`
- voice signatures of speakers - appended to the results when `include_voice_signatures=True` (only applies when using the SpeakerMap model)
See below for examples of each of these outputs:

- plain time series (according to the specified `output_period`):
- plain time series with additional raw outputs:
- summary (showing the fraction of each class across the entire file):
- simplified time series (indicating transition points between alternating results):
- voice signatures of the detected speakers:
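To make the summary and the simplified time series concrete, here is how both could be derived from a plain time series. The `(timestamp, label)` shape is an assumption for illustration, not the SDK's exact schema:

```python
from collections import Counter

def summarise(series):
    """Fraction of each class across the whole series (cf. include_summary)."""
    counts = Counter(label for _, label in series)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def transitions(series):
    """Keep only the points where the label changes (cf. include_transitions)."""
    out = []
    for t, label in series:
        if not out or out[-1][1] != label:
            out.append((t, label))
    return out

# Hypothetical plain time series with a 64 ms output period.
series = [(0, "speech"), (64, "speech"), (128, "silence"), (192, "speech")]
```

With this input, `summarise` reports speech in 3 of 4 windows, and `transitions` collapses the two consecutive speech windows into a single entry.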
You can use the `process_file` method to process your audio files.
The returned object contains a time series with the analysis of the file, broken down by the provided output period:
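As a rough sketch of consuming such a result, the snippet below walks a per-channel time series. The field names and values here are assumptions for illustration, not the SDK's exact output schema:

```python
# Hypothetical result shape for illustration only; the real process_file
# output may differ. Timestamps advance by the configured output_period.
result = {
    "channel_0": {
        "time_series": [
            {"timestamp": 0, "speech": {"confidence": 0.91}},
            {"timestamp": 1024, "speech": {"confidence": 0.12}},
        ]
    }
}

for channel, data in result.items():
    for point in data["time_series"]:
        # Each point covers one output_period window of the file.
        print(channel, point["timestamp"], point["speech"]["confidence"])
```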
The output of the script would be something like:
For more example usage of `transitions`, head to the Speech detection recipes and the Arousal detection recipes sections. For example usage of the `raw` output to implement custom speech thresholds, head to Example 3 in the Speech model recipes.