Changelog¶
0.14.1¶
Improvement¶
Clip LUFS and SoundPower model output to avoid -inf values
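The reason for the clipping can be sketched as follows. A decibel value is a logarithm of signal power, so a completely silent frame (power 0) maps to -inf. The function names and the -100.0 floor below are illustrative assumptions, not the SDK's implementation or its actual clipping value.

```python
import math

# Hypothetical floor in dB; the value the SDK actually clips to is not stated here.
FLOOR_DB = -100.0


def power_db(samples):
    """Return 10*log10 of the mean square of the samples, or -inf for silence."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        # log10(0) is undefined; -inf is the limiting value.
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def clipped_power_db(samples, floor_db=FLOOR_DB):
    """Same as power_db, but never lower than floor_db."""
    return max(floor_db, power_db(samples))


silence = [0.0] * 160
tone = [0.5, -0.5] * 80

print(power_db(silence))          # -inf
print(clipped_power_db(silence))  # -100.0
print(clipped_power_db(tone))     # about -6.02
```

A finite floor keeps downstream code (averaging, JSON serialization, plotting) from having to special-case infinite values.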
0.14.0¶
Features¶
Added a LUFS model to compute the instantaneous loudness following the standard ITU-R BS.1770.
Improvement¶
Added SonarQube integration to inspect the code quality.
0.13.0¶
Breaking Changes¶
Add support for Python 3.10
Drop support for Python 3.7
Improvements¶
Update TensorFlow to 2.8.0
Support arm-64 (M1)
Update numpy to 1.22.3 and scipy to 1.8.0
0.12.2¶
Improvements¶
Support older Linux distros (Ubuntu 18, Debian 10, …)
0.12.1¶
Vulnerability Fixes¶
Updated numpy to 1.21.4 to fix vulnerability CVE-2021-33430. This also required an update of scipy to 1.7.3
0.12.0¶
Breaking Changes¶
Changes the expected input specification of the SpeakerMap model. This change is not backwards compatible, so users that are updating from an older version will need to use the new license-key that was provided via email. Alternatively, it is possible to request that the current license-key is updated to the new model bundle format such that no code changes are required.
Features¶
Updates the Gender and UnderageSpeaker models for improved performance in environments with reverberation.
Bug Fixes¶
Updates the dependency version constraints for the NumPy and SciPy dependencies. Previously the version constraints allowed the installation of incompatible NumPy and SciPy versions.
0.11.1¶
Features¶
Updates the SoundPower class ranges to more adequately reflect real audio recordings (class names remain unaltered).
Bug Fixes¶
Adds the missing reliability_threshold to the SoundPower and AudioEvent models. Its absence caused the analysis to crash when processing files with at least one of these models requested in combination with use_chunking set to True.
0.11.0¶
Features¶
Updates the Arousal model for improved performance in environments with disruptive background noise.
Updates the Gender model receptive field to 3130ms.
Adds new AudioEvent model that is able to identify various categories of human-produced sounds.
Adds new SoundPower model that is able to measure the intensity level of the audio in dB.
Adds new UnderageSpeaker model that is able to classify speakers as adult/children.
Updates tensorflow to 2.4.0.
0.10.0¶
Features¶
Updates to the SpeakerMap model for improved performance
Adds VoiceSignatures functionality which can be combined with the SpeakerMap model to facilitate:
identification of speakers across multiple files
pre-training to improve identification of known speakers
Expands supported datatypes
Until now, only audio in signed 16-bit integer and 32-bit float was supported. The SDK now supports all little-endian float types, the signed little-endian integer types (except s24le) and unsigned 8-bit data. Supported formats, named as per the PCM spec: pcm_f32le, pcm_f64le, pcm_s16le, pcm_s32le and pcm_u8 (described in the ffmpeg docs).
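To make the format names concrete, here is a minimal sketch of how raw bytes in each of the listed formats could be decoded into floating-point samples using only the Python standard library. The decode_pcm helper, the PCM_FORMATS table and the scaling of integer samples to the range [-1.0, 1.0) are assumptions made for illustration. They follow common audio conventions and are not taken from the SDK.

```python
import struct

# PCM format name -> (struct type code, bytes per sample, converter to float).
# Integer scaling follows the common convention of dividing by the magnitude
# of the most negative value; this is an assumption, not SDK behaviour.
PCM_FORMATS = {
    "pcm_f32le": ("f", 4, float),
    "pcm_f64le": ("d", 8, float),
    "pcm_s16le": ("h", 2, lambda v: v / 32768.0),
    "pcm_s32le": ("i", 4, lambda v: v / 2147483648.0),
    "pcm_u8": ("B", 1, lambda v: (v - 128) / 128.0),
}


def decode_pcm(data, pcm_format):
    """Decode raw little-endian PCM bytes into a list of floats.

    Integer formats are scaled to [-1.0, 1.0); float formats pass through.
    """
    code, width, to_float = PCM_FORMATS[pcm_format]
    count = len(data) // width  # ignore a trailing partial sample
    values = struct.unpack("<%d%s" % (count, code), data[: count * width])
    return [to_float(v) for v in values]


raw = struct.pack("<3h", 0, 16384, -32768)
print(decode_pcm(raw, "pcm_s16le"))             # [0.0, 0.5, -1.0]
print(decode_pcm(bytes([128, 255]), "pcm_u8"))  # [0.0, 0.9921875]
```

Note that pcm_u8 is unsigned, so silence sits at 128 rather than 0, which is why its converter subtracts an offset before scaling.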
0.9.0¶
Breaking Changes¶
The model bundle format was improved to make it easier to integrate new models. This change is not backwards compatible, so users that are updating from an older version will need to use the new license-key that was provided via email. Alternatively, it is possible to request that the current license-key is updated to the new model bundle format such that no code changes are required.
The DeepTone SDK can no longer be used together with the tensorflow module. For more information visit https://docs.oto.ai/sdk/troubleshooting#deeptone-sdk-and-tensorflow
Bug Fixes¶
Fixed bug in the audio resampling function. This bug only affected users that processed audio with a sampling rate other than 16000 Hz.
Features¶
Remove the tensorflow Python module dependency, which makes the DeepTone SDK package slimmer
Add language model support (early access on request)
Add speaker-map model support for file processing (alpha version, access on request)
Add Python 3.7+ support
Overall performance improvements
0.8.0¶
Breaking Changes¶
The DeepTone python sdk is now integrated with our licensing API. The model bundle will be downloaded automatically, so the user does not need to provide the model bundle location anymore. The sdk can be initialized by providing only the license key:
engine = Deeptone(license_key="YOUR_KEY")
NOTE: A new license key will be required. License keys that worked for previous SDK versions will not work anymore.
The update process will be easier as well. The license keys are linked to a model bundle version. Once we have new improved model versions ready a new license key linked to the new model bundle version is distributed to the users. There are two options for users to update:
Exchange the current license key with the new one whenever you are ready to update to the new model bundle version. The SDK will then download and use the new model bundle.
A user can alternatively request that their current license key is upgraded to the new model bundle version. The SDK will then download and use the new model bundle the next time it is initialized.
The SpeechRT model now has three classes: speech, music, other
Fixed Bugs¶
Fixed conversion of input data of type int16 to float32, such that the results are consistent for models with built-in normalization and models without normalization.
Features¶
The SpeechRT model is now the default VAD model
Performance improvements when using more than one model
Use optimized model bundle format to increase load and inference performance
Comes with improved versions of the Speech, SpeechRT, Arousal and Gender models
New default value for the volume threshold (0.005). This default value is more robust in common scenarios.
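As a rough illustration of what a volume threshold of this size means, the sketch below flags a frame as too quiet when its RMS amplitude falls below 0.005. The is_below_threshold helper, the use of RMS as the volume measure and the assumption that samples are floats normalized to [-1.0, 1.0] are all hypothetical. The changelog does not state how the SDK applies the threshold internally.

```python
import math

VOLUME_THRESHOLD = 0.005  # default value named in this release


def is_below_threshold(samples, threshold=VOLUME_THRESHOLD):
    """Return True if the RMS amplitude of the frame is below the threshold.

    Assumes samples are floats normalized to [-1.0, 1.0]. Using RMS as the
    volume measure is an assumption made for this sketch.
    """
    if not samples:
        return True
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms < threshold


quiet = [0.001, -0.001] * 100  # RMS amplitude 0.001
louder = [0.1, -0.1] * 100     # RMS amplitude 0.1

print(is_below_threshold(quiet))   # True
print(is_below_threshold(louder))  # False
```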
0.7.0¶
Bug Fixes¶
Remove unnecessary tensorflow log lines
Features¶
Add Emotions model for emotions classification
Update to tensorflow 2.3
0.6.0¶
Bug Fixes¶
Fixed bug in stream processing to account correctly for receptive field of the model
When use_chunking=True, the chunking method is actually used
Features¶
Add SpeechRT model for low-latency speech predictions (decision latency <100ms)
Add new methods of processing - process_audio_bytes, process_audio_chunk - more suitable for analysing byte numpy arrays directly
Make the SDK thread-safe
0.5.0¶
Breaking Changes¶
The output of the process_file function changed to align with the process_stream function. For more information on the new output structure see https://docs.oto.ai/sdk/output-specification.
Bug Fixes¶
Performance bug in the output calculation
File processing results are now consistent with the streaming results
Typo in ‘GENDER_UNKNOWN’ constant
Features¶
Add ability to retrieve raw model outputs to allow for customisation. For example usage see here.
Add silence detection to all models
Optimize performance when using more than one model
0.4.0¶
Initial release with the Speech, Gender and Arousal models.