File Processing

DeepTone™'s File Processing functionality allows you to extract insights from your audio files.

There are two ways to provide your audio files to the API.

  1. Send the URL of the WAV file you would like to process in the POST request. The DeepTone™ Cloud API will then download the file and process it.
  2. Upload the content of a local WAV file directly with the POST request.

For a code sample, see Example Usage.

Working with stereo files

DeepTone™ processes each audio channel separately. If you provide a stereo file, you can specify a single channel to be processed; otherwise, all channels are processed separately.

Sample data

You can download this sample audio file of a woman speaking and use it with the examples below. For a code sample, see Example Usage.

Supported formats

Currently, WAV files are supported. Ideally, the files should be 16-bit PCM with a sample rate of 16 kHz. If a different sample rate is provided, the file will be up- or down-sampled accordingly. Be aware, though, that files with a sample rate lower than recommended may lead to poorer analysis results.

If you're not sure whether your audio files meet these criteria, you can check them with the CLI tool SoX:

sox --i PATH_TO_YOUR_AUDIO_FILE

The result will be something similar to:

Input File : PATH_TO_YOUR_AUDIO_FILE
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:03.99 = 63840 samples ~ 299.25 CDDA sectors
File Size : 128k
Bit Rate : 256k
Sample Encoding: 16-bit Signed Integer PCM

SoX also allows you to convert your files in case they don't match our criteria by using the following command:

sox PATH_TO_YOUR_AUDIO_FILE -b 16 PATH_TO_OUTPUT_FILE rate 16k
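
If SoX is not available, the same properties can be read with Python's standard-library wave module. The following is a minimal sketch rather than part of the DeepTone™ tooling: the function names describe_wav and meets_recommendation are illustrative, and the wave module only reads PCM-encoded WAV files.

```python
import wave


def describe_wav(path):
    """Return the basic properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": sample_rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_seconds": wav.getnframes() / sample_rate,
        }


def meets_recommendation(info):
    """True when the file is 16-bit audio sampled at 16 kHz."""
    return info["sample_rate"] == 16000 and info["bits_per_sample"] == 16
```

A file that fails meets_recommendation can still be submitted, since the API resamples it, but converting it up front with the SoX command above keeps the input predictable.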

File size limitations

Currently, the maximum allowed file size on the Cloud API is 15MB. This limit does not apply to on-premise deployments of the DeepTone™ API. We are actively working on expanding the supported file types and formats. If you would like to process files larger than 15MB with the Cloud API, you can:

  • use the streaming API to handle arbitrarily long audio streams - see Example usage in the Real-time processing section
  • split the file into smaller chunks of up to 15MB each (approximately 7 min of PCM single-channel audio), send multiple requests and concatenate the results

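To split the file, you could use SoX again. The following command converts the input to 16-bit, 16 kHz audio, keeps only the first channel and writes consecutive chunks of 420 seconds (7 min) to numbered output files:

sox PATH_TO_YOUR_AUDIO_FILE -b 16 file_out.wav rate 16k remix 1 trim 0 420 : newfile : restart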
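
The 7-minute figure follows from the byte rate of 16-bit, 16 kHz single-channel PCM. The sketch below reproduces that arithmetic and shows one way to shift timestamps when concatenating per-chunk results. It rests on assumptions the text above does not state: that "15MB" means 15,000,000 bytes, that the WAV header takes 44 bytes, and that the timestamps of each chunk's results start at 0. All function names are illustrative.

```python
MAX_UPLOAD_BYTES = 15 * 1000 * 1000  # assumption: "15MB" = 15,000,000 bytes


def max_chunk_seconds(sample_rate=16000, bits_per_sample=16, channels=1,
                      limit_bytes=MAX_UPLOAD_BYTES, header_bytes=44):
    """Longest whole number of seconds of PCM audio that fits in the limit."""
    bytes_per_second = sample_rate * (bits_per_sample // 8) * channels
    return (limit_bytes - header_bytes) // bytes_per_second


def chunk_boundaries(total_seconds, chunk_seconds=420):
    """Return (start, end) pairs, in seconds, covering the whole file."""
    return [(start, min(start + chunk_seconds, total_seconds))
            for start in range(0, total_seconds, chunk_seconds)]


def merge_time_series(chunk_series, chunk_seconds=420):
    """Concatenate per-chunk time series, shifting timestamps (in ms).

    Assumes every chunk except the last is exactly chunk_seconds long and
    that each chunk's timestamps start at 0.
    """
    merged = []
    for index, series in enumerate(chunk_series):
        offset_ms = index * chunk_seconds * 1000
        for entry in series:
            merged.append({**entry, "timestamp": entry["timestamp"] + offset_ms})
    return merged
```

With the defaults, max_chunk_seconds() gives 468, so 420-second chunks stay under the limit with some margin.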

Configuration options and outputs

There are different configuration parameters and types of outputs which can be requested.

For a code sample, see Example Usage. For a detailed output specification, see Output Specification.

Available configuration parameters

The following parameters can be passed in a POST request to the /file-processing/jobs endpoint:

  • models - the list of model names to use for the audio analysis
  • output_period - how often (in milliseconds, a multiple of 64) the output of the models should be returned
  • channel - optional; the channel to analyse. If omitted, all channels are analysed
  • include_summary - optional; whether the output should contain a summary of the analysis. Defaults to false
  • include_transitions - optional; whether the output should contain the transitions of the analysis. Defaults to false
  • include_raw_values - optional; whether the result should contain raw model outputs. Defaults to false
  • volume_threshold - optional; a volume level to use instead of the default (higher values result in more of the data being treated as silence)
  • callback - optional; a callback URL that will be invoked once the results are ready. See the Callbacks section for details.
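
As an illustration of how these parameters combine into a query string, here is a minimal Python sketch. The helper name build_job_query is hypothetical, the client-side check on output_period simply mirrors the "multiple of 64" rule above, and the lowercase "true" values follow the curl commands in Example Usage. How the API itself validates parameters is not covered here.

```python
from urllib.parse import urlencode


def build_job_query(models, output_period, channel=None,
                    include_summary=False, include_transitions=False,
                    include_raw_values=False, volume_threshold=None,
                    callback=None):
    """Build the query string for a job-creation request."""
    if not models:
        raise ValueError("at least one model name is required")
    if output_period <= 0 or output_period % 64 != 0:
        raise ValueError("output_period must be a positive multiple of 64")

    params = {"models": ",".join(models), "output_period": output_period}
    if channel is not None:
        params["channel"] = channel
    flags = {
        "include_summary": include_summary,
        "include_transitions": include_transitions,
        "include_raw_values": include_raw_values,
    }
    for name, enabled in flags.items():
        if enabled:  # omitted flags fall back to the documented default
            params[name] = "true"
    if volume_threshold is not None:
        params["volume_threshold"] = volume_threshold
    if callback is not None:
        params["callback"] = callback
    # Keep commas literal so the model list reads like the curl examples.
    return urlencode(params, safe=",")
```

For example, build_job_query(["speech", "arousal"], 4096, channel=0, include_summary=True) produces models=speech,arousal&output_period=4096&channel=0&include_summary=true.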

Available Outputs

There are four possible output types, depending on the parameters that are set to true on the request:

  • a plain time series - default output type, returned always
  • a plain time series with raw model outputs - raw values are appended when include_raw_values=true
  • a summary - appended to the results when include_summary=true
  • a simplified time series - appended to the results when include_transitions=true

For a code sample, see Example Usage. For a detailed output specification, see Output Specification.

See below for examples of each of the four outputs:

  • plain time series (according to the specified output_period):
{
  "channels": {
    "0": {
      "time_series": [
        {
          "timestamp": 0,
          "results": {
            "gender": {
              "result": "male",
              "confidence": 0.92
            }
          }
        },
        {
          "timestamp": 1024,
          "results": {
            "gender": {
              "result": "male",
              "confidence": 0.86
            }
          }
        },
        {
          "timestamp": 2048,
          "results": {
            "gender": {
              "result": "male",
              "confidence": 0.85
            }
          }
        },
        ...
        {
          "timestamp": 29696,
          "results": {
            "gender": {
              "result": "silence",
              "confidence": 1.0
            }
          }
        }
      ]
    }
  }
}
  • plain time series with additional raw outputs:
{
  "channels": {
    "0": {
      "time_series": [
        {
          "timestamp": 0,
          "results": {
            "gender": {
              "result": "male",
              "confidence": 0.92
            }
          },
          "raw": {
            "gender": {
              "male": 0.92,
              "female": 0.08
            }
          }
        },
        {
          "timestamp": 1024,
          "results": {
            "gender": {
              "result": "male",
              "confidence": 0.86
            }
          },
          "raw": {
            "gender": {
              "male": 0.86,
              "female": 0.14
            }
          }
        },
        {
          "timestamp": 2048,
          "results": {
            "gender": {
              "result": "male",
              "confidence": 0.85
            }
          },
          "raw": {
            "gender": {
              "male": 0.85,
              "female": 0.15
            }
          }
        },
        ...
        {
          "timestamp": 29696,
          "results": {
            "gender": {
              "result": "silence",
              "confidence": 1.0
            }
          },
          "raw": {
            "gender": {
              "male": 0.12,
              "female": 0.88
            }
          }
        }
      ]
    }
  }
}
  • summary (showing fraction of each class across the entire file):
{
  "channels": {
    "0": {
      "time_series": [ ... ],
      "summary": {
        "gender": {
          "male_fraction": 0.7451,
          "female_fraction": 0.1024,
          "other_fraction": 0.112,
          "unknown_fraction": 0.0405,
          "silence_fraction": 0.0
        }
      }
    }
  }
}
  • simplified time series (indicating transition points between alternating results):
{
  "channels": {
    "0": {
      "time_series": [ ... ],
      "transitions": {
        "gender": [
          {
            "timestamp_start": 0,
            "timestamp_end": 1024,
            "result": "female",
            "confidence": 0.96
          },
          {
            "timestamp_start": 1024,
            "timestamp_end": 3072,
            "result": "male",
            "confidence": 0.87
          },
          ...
          {
            "timestamp_start": 8192,
            "timestamp_end": 12288,
            "result": "female",
            "confidence": 0.89
          }
        ]
      }
    }
  }
}
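
To make the relationship between the plain time series and the two derived outputs more concrete, the following Python sketch recomputes similar views on the client side from a parsed response. It is only an illustration: the function names are hypothetical, the way the API itself derives the summary and the transitions (including the confidence of each transition) is not specified in this section, and the sketch assumes each time series entry covers one output_period starting at its timestamp.

```python
from collections import Counter


def label_fractions(response, channel, model):
    """Fraction of time series entries carrying each result label."""
    series = response["channels"][channel]["time_series"]
    counts = Counter(entry["results"][model]["result"] for entry in series)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def collapse_segments(response, channel, model, output_period):
    """Merge consecutive entries with the same result into segments."""
    series = response["channels"][channel]["time_series"]
    segments = []
    for entry in series:
        label = entry["results"][model]["result"]
        end = entry["timestamp"] + output_period
        if segments and segments[-1]["result"] == label:
            # Same label as the previous entry: extend the open segment.
            segments[-1]["timestamp_end"] = end
        else:
            segments.append({
                "timestamp_start": entry["timestamp"],
                "timestamp_end": end,
                "result": label,
            })
    return segments
```

Note that channels are keyed by strings ("0", "1", ...) in the response, so the channel argument should be passed as a string.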

Callbacks

The callback parameter expects a valid URL that will be invoked when your job finishes processing. Once the results are ready, the API will send a POST request to this URL with a body that matches the one returned by the GET request to the /file-processing/jobs/{jobId} endpoint.

The API expects the callback endpoint to respond with a 2XX status code if the invocation is successful. If the invocation is unsuccessful (5XX status code), the API will retry invoking the endpoint up to 3 times. Because of this retry mechanism, the callback endpoint might receive the same notification more than once, which should be handled by the user. Any other status code will not trigger the retry mechanism. The callback endpoint should respond within 10 seconds of being invoked; otherwise the request will time out.
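
A callback receiver therefore needs to answer quickly and tolerate repeated notifications. The sketch below shows one possible shape using only Python's standard library. It is not an official client: the names are illustrative, duplicates are detected by hashing the request body (which assumes that retried notifications carry an identical body), and the actual processing is deferred to a queue so that the response is sent well within the 10-second window.

```python
import hashlib
import json
import queue
from http.server import BaseHTTPRequestHandler, HTTPServer

pending = queue.Queue()  # parsed payloads waiting for slower processing
_seen = set()            # digests of the bodies accepted so far


def register_notification(body, seen=None):
    """Return True the first time a body is seen and False for repeats."""
    seen = _seen if seen is None else seen
    digest = hashlib.sha256(body).hexdigest()
    if digest in seen:
        return False
    seen.add(digest)
    return True


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = json.loads(body)
        except ValueError:
            # A 4XX answer does not trigger the retry mechanism.
            self.send_response(400)
            self.end_headers()
            return
        if register_notification(body):
            pending.put(payload)  # defer heavy work and answer quickly
        self.send_response(204)   # any 2XX status signals success
        self.end_headers()


def serve(port=8000):
    """Run the receiver until interrupted (port 8000 is an arbitrary choice)."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

A separate worker would read from pending and do the actual processing. Answering with a 2XX status even for a duplicate stops further retries for a notification that has already been accepted.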

Example Usage

To process a file that is available at the URL https://docs.oto.ai/api/audio/sample_audio.wav, use the following curl commands:

Create job:

curl --request POST \
--url 'https://api.oto.ai/file-processing/jobs?models=speech,arousal&output_period=4096&channel=0&include_summary=true&include_transitions=true&include_raw_values=true&volume_threshold=0.0' \
--header 'content-type: application/json' \
--header 'x-api-key: REPLACE_KEY_VALUE' \
--data '{"url":"https://docs.oto.ai/api/audio/audio_sample.wav"}'

Get results:

curl --request GET \
--url https://api.oto.ai/file-processing/jobs/REPLACE_JOB_ID/results \
--header 'x-api-key: REPLACE_KEY_VALUE'

To process a local file use the following curl commands:

Create job:

curl --request POST \
--url 'https://api.oto.ai/file-processing/jobs?models=speech,arousal&output_period=4096&channel=0&include_summary=true&include_transitions=true&include_raw_values=true&volume_threshold=0.0' \
--header 'content-type: audio/wav' \
--header 'x-api-key: REPLACE_KEY_VALUE' \
--data-binary @path/to/local/file.wav

Get results:

curl --request GET \
--url https://api.oto.ai/file-processing/jobs/REPLACE_JOB_ID/results \
--header 'x-api-key: REPLACE_KEY_VALUE'
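
For readers who prefer Python to curl, the same requests can be sketched with the standard-library urllib.request module. The endpoint paths and headers are taken from the curl commands above; the function names are illustrative. The shape of the job-creation response is not shown in this section, so extracting the job ID from it is left to the caller.

```python
import json
import urllib.request

API_BASE = "https://api.oto.ai/file-processing"


def job_request_from_url(audio_url, query, api_key):
    """Build the POST request that asks the API to download a WAV file."""
    return urllib.request.Request(
        f"{API_BASE}/jobs?{query}",
        data=json.dumps({"url": audio_url}).encode("utf-8"),
        headers={"content-type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def job_request_from_file(path, query, api_key):
    """Build the POST request that uploads a local WAV file."""
    with open(path, "rb") as wav_file:
        data = wav_file.read()
    return urllib.request.Request(
        f"{API_BASE}/jobs?{query}",
        data=data,
        headers={"content-type": "audio/wav", "x-api-key": api_key},
        method="POST",
    )


def results_request(job_id, api_key):
    """Build the GET request that fetches the results of a job."""
    return urllib.request.Request(
        f"{API_BASE}/jobs/{job_id}/results",
        headers={"x-api-key": api_key},
        method="GET",
    )


def send(request, timeout=30):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

Building the requests separately from sending them makes it possible to inspect the URL, headers and body before any network call is made. The query argument is the same query string used in the curl commands, for example models=speech,arousal&output_period=4096.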