Supported Audio Formats

The DeepTone™ SDK currently supports PCM audio only, and for file processing it supports only the WAV file format. Ideally, the audio should be 16-bit PCM with a sample rate of 16 kHz. If a different sample rate is provided, the audio is up- or down-sampled accordingly. Note, however, that sample rates below the recommended 16 kHz may degrade the analysis results.

Supported audio file formats

When processing files, the WAV file format is supported*.

* Some audio file formats can contain audio in various codecs, so make sure the audio coding format of the file you are processing is supported as well. For example, a WAV file can contain PCM A-law or PCM mu-law audio, but the DeepTone™ SDK does not support these audio coding formats yet.

Supported audio coding formats

Most PCM audio coding formats are supported. When processing audio data directly (rather than from a file) by passing a NumPy array, the sample rate must be specified in the rate_in argument, and the array must have one of the following data types:

* signed 8-bit little-endian integer
* signed 16-bit little-endian integer
* signed 32-bit little-endian integer
* 32-bit floating-point little-endian
* 64-bit floating-point little-endian
* unsigned 8-bit little-endian integer
* unsigned 16-bit little-endian integer
* unsigned 32-bit little-endian integer
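
The data types listed above can be checked before audio is handed to the SDK. The following is a minimal sketch, assuming that the list maps onto the NumPy dtype strings shown in the code; the names SUPPORTED_DTYPES and is_supported are hypothetical helpers for illustration and are not part of the DeepTone™ SDK.

```python
import numpy as np

# Assumed mapping of the list above onto NumPy dtype strings.
# "<" marks little-endian; byte order does not apply to 8-bit types.
SUPPORTED_DTYPES = {
    np.dtype(code)
    for code in ("i1", "<i2", "<i4", "<f4", "<f8", "u1", "<u2", "<u4")
}


def is_supported(audio: np.ndarray) -> bool:
    """Return True if the array's dtype is one of the listed data types."""
    return audio.dtype in SUPPORTED_DTYPES


# One second of silence as 16-bit little-endian PCM at 16 kHz.
samples = np.zeros(16000, dtype="<i2")
rate_in = 16000  # sample rate to pass alongside the array

print(is_supported(samples))                   # 16-bit signed integer
print(is_supported(samples.astype(np.int64)))  # 64-bit integers are not listed
print(is_supported(samples.astype(">i2")))     # big-endian is not listed
```

An array with an unlisted dtype can usually be converted with astype, for example samples.astype("<i2"), provided the values fit into the target range.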

Check audio format

If you're not sure whether your audio files meet these criteria, you can check them with the CLI tool SoX:

sox --i PATH_TO_YOUR_AUDIO_FILE

The output will look similar to:

Input File : PATH_TO_YOUR_AUDIO_FILE
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:03.99 = 63840 samples ~ 299.25 CDDA sectors
File Size : 128k
Bit Rate : 256k
Sample Encoding: 16-bit Signed Integer PCM
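
The same check can be approximated from Python without SoX. The sketch below uses only the standard library wave module; describe_wav and is_recommended_format are hypothetical helper names, and the check covers only the recommended 16-bit, 16 kHz case described at the top of this page.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate, bits_per_sample) for a PCM WAV file."""
    # wave.open raises wave.Error for WAV files it cannot parse; this
    # includes files whose audio is A-law or mu-law encoded.
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getframerate(), wav.getsampwidth() * 8


def is_recommended_format(path):
    """Return True for a 16-bit PCM WAV file sampled at 16 kHz."""
    try:
        _, sample_rate, bits = describe_wav(path)
    except wave.Error:
        return False
    return sample_rate == 16000 and bits == 16
```

For a file like the one in the SoX output above, describe_wav would return (1, 16000, 16).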

SoX can also convert files that don't match these criteria:

sox PATH_TO_YOUR_AUDIO_FILE -b 16 PATH_TO_OUTPUT_FILE rate 16k

Alternatively, you can use ffmpeg to convert your audio to the right format:

ffmpeg -i PATH_TO_YOUR_AUDIO_FILE -acodec pcm_s16le -ar 16000 PATH_TO_OUTPUT_FILE.wav
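
When many files need converting, the ffmpeg command above can be driven from a short script. This is a sketch only: it assumes ffmpeg is installed and on the PATH, and build_ffmpeg_command and convert_directory are hypothetical helper names, not part of the SDK.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst):
    """Build the argument list for the ffmpeg conversion shown above."""
    return ["ffmpeg", "-i", str(src),
            "-acodec", "pcm_s16le", "-ar", "16000", str(dst)]


def convert_directory(in_dir, out_dir):
    """Convert every file in in_dir to a 16-bit, 16 kHz WAV file in out_dir."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(in_dir).iterdir()):
        if src.is_file():
            dst = out_path / (src.stem + ".wav")
            # check=True raises CalledProcessError if ffmpeg fails.
            subprocess.run(build_ffmpeg_command(src, dst), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.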