The Deeptone SDK can be used to process a real-time audio stream. The data fed in via the `input_generator` should be 16-bit PCM with a sample rate of 16 kHz.
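As a quick sanity check before streaming, the header of a WAV file can be inspected to confirm that it matches the format stated above. The following is a minimal sketch using only Python's standard `wave` module; the function name `is_supported_wav` is hypothetical and not part of the SDK.

```python
import wave

EXPECTED_SAMPLE_RATE = 16000  # 16 kHz, as stated above
EXPECTED_SAMPLE_WIDTH = 2     # bytes per sample, i.e. 16-bit PCM


def is_supported_wav(source):
    """Return True if the WAV header reports 16-bit samples at 16 kHz.

    `source` can be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return (
            wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH
            and wav.getframerate() == EXPECTED_SAMPLE_RATE
        )
```

Files that fail this check would need to be resampled or converted before their bytes are streamed.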
Configuration options and outputs
The available configuration options and output types depend on the SDK language.
Available configuration options
The following arguments can be passed:
- `input_generator`: a generator that yields byte arrays of correctly sampled audio data (16-bit PCM, 16 kHz)
- `models`: the list of model names to use for the audio analysis
- `output_period`: how often (in milliseconds, as a multiple of 64) the model outputs should be returned
- `include_raw_values`: optional; whether the result should contain raw model outputs
- `volume_threshold`: optional; a volume level to use instead of the default (higher values result in more of the data being treated as silence)
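To make the `output_period` constraint concrete, the sketch below checks that a value is a multiple of 64 ms and works out how many input bytes one period covers at 16 kHz, 16-bit PCM. It assumes mono input, which the text above does not state; the helper name and its `channels` parameter are illustrative only and are not SDK arguments.

```python
SAMPLE_RATE_HZ = 16000   # from the input requirements above
BYTES_PER_SAMPLE = 2     # 16-bit PCM
PERIOD_STEP_MS = 64      # output_period must be a multiple of this


def bytes_per_output_period(output_period_ms, channels=1):
    """Return how many input bytes one output period covers.

    Assumes mono input by default; `channels` is a parameter of this
    sketch only, not an SDK argument.
    """
    if output_period_ms <= 0 or output_period_ms % PERIOD_STEP_MS != 0:
        raise ValueError("output_period must be a positive multiple of 64 ms")
    samples = SAMPLE_RATE_HZ * output_period_ms // 1000
    return samples * BYTES_PER_SAMPLE * channels


print(bytes_per_output_period(64))    # 2048
print(bytes_per_output_period(1024))  # 32768
```

Under these assumptions, the smallest period of 64 ms corresponds to 1024 samples, or 2048 bytes of input.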
A generator is returned that yields one output per `output_period` milliseconds of the provided input, representing timestamped results from the requested models.
You can use the `process_stream` method to process a stream of audio. You will need to provide a valid generator that yields audio bytes. Below you will find two different examples, where we:
- open an audio file and stream bytes from that file, or
- stream bytes using a microphone as the input source
1. Streaming bytes from an audio file
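One way to build such a generator from a file is sketched below. It relies only on Python's standard `wave` module and assumes the file is a WAV file already in the required 16-bit, 16 kHz format; the name `wav_input_generator` and the default chunk size of 1024 frames are illustrative choices, not taken from the SDK.

```python
import wave


def wav_input_generator(source, chunk_frames=1024):
    """Yield successive chunks of raw PCM bytes from a WAV file.

    `source` is a file path or a binary file-like object. The WAV header
    is handled by the `wave` module, so only sample data is yielded.
    """
    with wave.open(source, "rb") as wav:
        while True:
            chunk = wav.readframes(chunk_frames)
            if not chunk:  # an empty bytes object signals end of file
                break
            yield chunk
```

The resulting generator could then be supplied as the `input_generator` argument of `process_stream`, together with the other arguments listed earlier.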
2. Streaming bytes from a microphone
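A microphone source can be wrapped in a generator in a similar way. The sketch below assumes the third-party PyAudio package for audio capture, which may differ from the library used in the original example; `chunked_reader` and `microphone_input_generator` are hypothetical helper names, and mono capture is an assumption.

```python
def chunked_reader(read_chunk, chunk_frames=1024, max_chunks=None):
    """Yield byte chunks by repeatedly calling `read_chunk(chunk_frames)`.

    Stops after `max_chunks` chunks (if given) or when `read_chunk`
    returns an empty bytes object.
    """
    count = 0
    while max_chunks is None or count < max_chunks:
        data = read_chunk(chunk_frames)
        if not data:
            break
        yield data
        count += 1


def microphone_input_generator(chunk_frames=1024, max_chunks=None):
    """Yield 16-bit, 16 kHz mono chunks from the default microphone."""
    import pyaudio  # third-party; imported lazily so chunked_reader works without it

    pa = pyaudio.PyAudio()
    stream = pa.open(
        format=pyaudio.paInt16,  # 16-bit PCM
        channels=1,              # mono (an assumption of this sketch)
        rate=16000,              # 16 kHz
        input=True,
        frames_per_buffer=chunk_frames,
    )
    try:
        yield from chunked_reader(stream.read, chunk_frames, max_chunks)
    finally:
        # Release the audio device even if the consumer stops early.
        stream.stop_stream()
        stream.close()
        pa.terminate()
```

Passing `max_chunks` bounds the recording length; leaving it as `None` keeps reading until the consumer stops iterating.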
You can find even more detailed recipes on using a microphone in the Gender model recipes section.
In either case, the returned object is a generator that yields one result per `output_period` milliseconds of input, as described above.
The output of the script would be something like:
For an example of using the `raw` output to implement custom speech thresholds, see Example 3 in the Speech detection recipes.