Real-Time Processing
The DeepTone™ Cloud API can be used to process a real-time audio stream.
Supported Formats
All data is sent to the streaming endpoint as binary WebSocket messages
whose payloads are the raw audio data. Any other payload causes an error to be returned and the connection
to be closed. You can stream audio to DeepTone in real time, and because the WebSocket protocol is full-duplex,
you keep receiving DeepTone responses while you are still uploading data. Currently, the stream endpoint
supports only 16-bit, little-endian, signed PCM WAV data.
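As a minimal sketch of that payload format, assuming your samples are held as floats in [-1.0, 1.0] (an assumption about your own pipeline, not an API requirement), the code below converts them to 16-bit signed little-endian PCM and splits the bytes into payloads for individual binary messages. The helper names and the 4096-byte chunk size are illustrative only. The output is headerless PCM; whether the endpoint also expects a WAV header at the start of the stream is not stated here.

```python
import math
import struct


def to_pcm16le(samples):
    """Convert float samples in [-1.0, 1.0] to 16-bit signed little-endian PCM bytes."""
    # Clip out-of-range values, then scale to the signed 16-bit range.
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    # "<" = little-endian, "h" = signed 16-bit integer
    return struct.pack(f"<{len(ints)}h", *ints)


def chunk_bytes(data, chunk_size):
    """Split PCM bytes into payloads, one per binary WebSocket message."""
    # Keep chunk_size even so that no 2-byte sample straddles two messages.
    if chunk_size <= 0 or chunk_size % 2:
        raise ValueError("chunk_size must be a positive, even number of bytes")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


if __name__ == "__main__":
    # One second of a 440 Hz tone at a hypothetical 16 kHz sample rate.
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
    payloads = chunk_bytes(to_pcm16le(tone), 4096)
    print(len(payloads))  # 32000 bytes in 4096-byte chunks -> 8 payloads
```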
If your audio uses a different sample rate, specify it with the sample_rate
request parameter.
The data will then be resampled, but the results may be less accurate.
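To find the value to pass as sample_rate, you can read the header of a WAV file with Python's standard wave module. The sketch below uses a hypothetical helper, wav_stream_params, that also rejects files whose samples are not 16-bit. WAV stores PCM little-endian and 16-bit WAV samples are signed, so the sample-width check covers the supported format; whether the endpoint accepts multi-channel audio is not stated here, so the channel count is only reported.

```python
import wave


def wav_stream_params(path):
    """Return (sample_rate, channels) of a WAV file after checking its sample format.

    WAV files store PCM little-endian, and 16-bit WAV samples are signed,
    so checking the sample width is enough to match the supported format.
    """
    with wave.open(path, "rb") as wav:
        width = wav.getsampwidth()
        if width != 2:
            raise ValueError(f"expected 16-bit samples, got {8 * width}-bit")
        return wav.getframerate(), wav.getnchannels()
```

For example, `rate, channels = wav_stream_params("input.wav")` gives the rate to send as sample_rate; `input.wav` is a placeholder file name.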
Configuration and Outputs
Several configuration parameters are available, and different types of output can be requested.
For code samples, see Example Usage. For the detailed output specification, see Output specification.
Available configuration parameters
The following parameters can be added to requests to the /stream
WebSocket endpoint:
- models: the list of model names to use for the audio analysis
- output_period: how often (in milliseconds, as a multiple of 64) the model outputs should be returned
- include_raw_values: optional; whether the result should include the raw model outputs
- volume_threshold: optional; a volume level to use instead of the default (higher values cause more of the data to be treated as silence)
- sample_rate: the sample rate of the audio being sent; in streaming mode, this must be specified
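A sketch of collecting these parameters into the /stream URL as query parameters. The host in STREAM_URL is a placeholder, and sending models as a single comma-separated value is an assumption; check the API reference for the real endpoint address, how lists are encoded, and any authentication parameters. The function enforces the documented rule that output_period is a multiple of 64 ms.

```python
from urllib.parse import urlencode

# Placeholder endpoint, for illustration only (not the real DeepTone host).
STREAM_URL = "wss://api.example.com/stream"


def build_stream_url(models, sample_rate, output_period=None,
                     include_raw_values=None, volume_threshold=None,
                     base_url=STREAM_URL):
    """Build a /stream URL from the configuration parameters (hypothetical encoding)."""
    if output_period is not None and (output_period <= 0 or output_period % 64):
        raise ValueError("output_period must be a positive multiple of 64 ms")
    # Assumption: models are joined into one comma-separated value.
    params = {"models": ",".join(models), "sample_rate": sample_rate}
    if output_period is not None:
        params["output_period"] = output_period
    if include_raw_values is not None:
        params["include_raw_values"] = str(include_raw_values).lower()
    if volume_threshold is not None:
        params["volume_threshold"] = volume_threshold
    return f"{base_url}?{urlencode(params)}"
```

For instance, `build_stream_url(["SpeechRT"], 16000, output_period=1024, volume_threshold=0.001)` yields a URL ending in `models=SpeechRT&sample_rate=16000&output_period=1024&volume_threshold=0.001`; the exact model identifier string is itself an assumption.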
Available Outputs
You will get one output response from the DeepTone™ Cloud API for every output_period
milliseconds of input, containing timestamped results from the requested models.
If include_raw_values is set to true, each result object also includes the raw model outputs.
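Because one response covers output_period milliseconds of input, the amount of audio behind each response follows directly from the sample rate. The worked sketch below assumes mono 16-bit audio by default; the function names are illustrative. Whether a trailing partial period also produces a response is not stated here, so complete_output_periods counts only whole periods.

```python
BYTES_PER_SAMPLE = 2  # 16-bit PCM


def bytes_per_output_period(sample_rate, output_period_ms, channels=1):
    """Bytes of 16-bit PCM input that one output_period covers (rounded down)."""
    samples_per_channel = sample_rate * output_period_ms // 1000
    return samples_per_channel * channels * BYTES_PER_SAMPLE


def complete_output_periods(duration_ms, output_period_ms):
    """Number of whole output periods contained in duration_ms of input."""
    return duration_ms // output_period_ms


if __name__ == "__main__":
    print(bytes_per_output_period(16000, 1024))   # 32768
    print(complete_output_periods(10_000, 1024))  # 9
```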
Example Usage
Streaming from a microphone
To stream your microphone input to the DeepTone™ Cloud API and get real-time results, you can use sox together with websocat.
The following command requests that the microphone input stream be processed by
the SpeechRT model with a volume_threshold of 0.001.
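As a Python alternative, the full-duplex behaviour described under Supported Formats can be sketched with asyncio: one task uploads audio while another handles responses on the same connection. This assumes a connection object with an async send() and async iteration over incoming messages, which connections from the third-party websockets package provide; microphone capture itself is not shown, and stream_full_duplex is a hypothetical helper name. How the server signals the end of a stream is not described here, so receiving simply stops when the connection closes.

```python
import asyncio


async def stream_full_duplex(ws, chunks, on_message):
    """Upload audio and handle responses concurrently over one WebSocket.

    ws: an open connection exposing `await ws.send(data)` and `async for`
        iteration over incoming messages, as connections from the
        third-party `websockets` package do (bytes are sent as binary messages).
    chunks: an async iterable of raw 16-bit PCM byte strings, e.g. fed by a
        microphone capture callback (capture itself is not shown).
    on_message: called with each response message as it arrives.
    """

    async def send_audio():
        async for chunk in chunks:
            await ws.send(chunk)  # bytes -> one binary WebSocket message

    async def receive_results():
        # Responses can arrive while send_audio() is still uploading;
        # iteration ends when the connection is closed.
        async for message in ws:
            on_message(message)

    # Run both directions at the same time; for a live microphone this
    # keeps going until the connection closes or the task is cancelled.
    await asyncio.gather(send_audio(), receive_results())
```

With the websockets package this might be used as `async with websockets.connect(url) as ws: await stream_full_duplex(ws, mic_chunks(), print)`, where `url` carries the request parameters and `mic_chunks` is a hypothetical async generator of PCM chunks.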
Streaming from a file
To stream from a file to the DeepTone™ Cloud API and get real-time results, you can use sox together with pv and websocat.
The following command requests that your file be processed by
the SpeechRT model with a volume_threshold of 0.001.
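In the shell route, pv presumably limits the upload to real-time speed. A Python sketch of the same idea reads PCM frames from a 16-bit WAV file with the standard wave module and paces them with asyncio.sleep against the event-loop clock. The helper name and the 64 ms default chunk (chosen to line up with the 64 ms granularity of output_period) are assumptions, not documented requirements. readframes() returns only sample data, so the WAV header is not sent.

```python
import asyncio
import wave


async def wav_chunks_realtime(path, chunk_ms=64, realtime=True):
    """Yield raw PCM chunks from a 16-bit WAV file at (roughly) playback speed."""
    loop = asyncio.get_running_loop()
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        rate = wav.getframerate()
        frames_per_chunk = max(1, rate * chunk_ms // 1000)
        start = loop.time()
        audio_seconds = 0.0  # duration of audio yielded so far
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            yield data
            audio_seconds += len(data) / (2 * wav.getnchannels() * rate)
            if realtime:
                # Wait until wall-clock time catches up with the audio sent,
                # the kind of rate limiting pv can do in a pipeline.
                delay = start + audio_seconds - loop.time()
                if delay > 0:
                    await asyncio.sleep(delay)
```

Each yielded chunk can be sent as one binary message; setting `realtime=False` uploads as fast as the connection allows.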