Changelog

0.9.0

The model bundle format was improved to make it easier to integrate new models. This change is not backwards compatible, so users who are updating from an older version need to use the new license key that was provided via email. Alternatively, users can request that their current license key be updated to the new model bundle format, in which case no code changes are required.
The DeepTone SDK can no longer be used together with the tensorflow module. For more information visit https://docs.oto.ai/sdk/troubleshooting#deeptone-sdk-and-tensorflow

Fixed a bug in the audio resampling function. This bug only affected users who processed audio with a sampling rate other than 16000 Hz.

Remove the tensorflow Python module dependency, which makes the DeepTone SDK package slimmer
Add language model support (early access on request)
Add speaker-map model support for file processing (alpha version, access on request)
Add Python 3.7+ support
Overall performance improvements

Fixed conversion of input data of type int16 to float32, such that the results are consistent for models with built-in normalization and models without normalization.
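
As an illustration of the kind of conversion this fix concerns, the following is a minimal sketch and not the SDK's actual implementation. It assumes that int16 PCM samples are scaled by 2**15 so that they land in the range [-1.0, 1.0); the helper name int16_to_float32 and the constant INT16_SCALE are hypothetical.

```python
import array

# Assumption: int16 samples are scaled by 2**15 to land in [-1.0, 1.0).
INT16_SCALE = 32768.0


def int16_to_float32(samples):
    """Hypothetical helper: convert 16-bit PCM samples to float32 values."""
    pcm = array.array("h", samples)  # "h" = signed 16-bit integers
    # "f" = 32-bit floats; each sample is divided by the scale factor.
    return array.array("f", (s / INT16_SCALE for s in pcm))


converted = int16_to_float32([-32768, 0, 16384, 32767])
```

Applying one fixed scale factor before any model sees the data is one way to keep results consistent whether or not a model normalizes its input internally.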

The SpeechRT model is now the default VAD model
Performance improvements when using more than one model
Use optimized model bundle format to increase load and inference performance
Comes with improved versions of the Speech, SpeechRT, Arousal and Gender models
New default value for the volume threshold (0.005). This default value is more robust in common scenarios.
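
To give a feel for what a volume threshold of 0.005 means, here is a rough sketch under stated assumptions. It assumes that volume is measured as the root-mean-square amplitude of float samples in [-1.0, 1.0]; the changelog does not say how the SDK measures volume, and the helpers rms and is_silent are hypothetical.

```python
import math

VOLUME_THRESHOLD = 0.005  # default value named in the changelog


def rms(samples):
    """Root-mean-square amplitude of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_silent(samples, threshold=VOLUME_THRESHOLD):
    """Hypothetical helper: a chunk counts as silence if its RMS is below the threshold."""
    return rms(samples) < threshold


quiet = [0.001, -0.002, 0.001, -0.001]  # faint background noise
loud = [0.1, -0.2, 0.15, -0.05]         # clearly audible signal
```

With these inputs, is_silent(quiet) is true and is_silent(loud) is false, since the quiet chunk has an RMS of roughly 0.0013.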

Fixed a bug in stream processing so that the receptive field of the model is accounted for correctly
The chunking method is now actually applied when use_chunking=True

Add SpeechRT model for low-latency speech predictions (decision latency <100ms)
Add new processing methods, process_audio_bytes and process_audio_chunk, which are more suitable for analysing bytes and NumPy arrays directly
Make the SDK thread-safe

The output of the process_file function changed to align with the process_stream function. For more information on the new output structure see https://docs.oto.ai/sdk/output-specification.

Add the ability to retrieve raw model outputs to allow for customisation. For example usage, see the SDK documentation.
Add silence detection to all models
Optimise performance when using more than one model

Initial release with the Speech, Gender and Arousal models.