
Real Time Processing

The DeepTone™ Cloud API can be used to process a real time audio stream.

Supported Formats

All data is sent to the streaming endpoint as binary-type WebSocket messages whose payloads are the raw audio data; any other payload will cause an error to be returned and the connection to be closed. Because the protocol is full-duplex, you can stream audio to DeepTone in real time and still receive DeepTone responses while uploading data. Currently, only 16-bit, little-endian, signed PCM (WAV) data is supported by the stream endpoint. If your audio has a different sample rate, you need to specify sample_rate in the request parameters; the data will be re-sampled, but the results may be less accurate.
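Before streaming a file, it can help to confirm that its encoding matches what the endpoint expects. Below is a minimal sketch using Python's standard wave module; the 16 kHz rate is taken from the sox examples later on this page, not from a stated requirement, so treat the default as an assumption:

```python
import wave

def check_wav_format(path, expected_rate=16000):
    """Check that a WAV file is uncompressed 16-bit PCM at the expected rate."""
    with wave.open(path, "rb") as w:
        return (
            w.getcomptype() == "NONE"        # uncompressed PCM
            and w.getsampwidth() == 2        # 16-bit samples
            and w.getframerate() == expected_rate
        )
```

If the check fails, a tool such as sox can re-encode the file before streaming.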

Configuration and Outputs

There are different configuration parameters and types of outputs which can be requested.

For code samples, see Example Usage. For the detailed output specification, see Output specification.

Available configuration parameters

The following parameters can be added to requests to the /stream websocket endpoint:

  • models - the list of model names to use for the audio analysis
  • output_period - how often (in milliseconds, as a multiple of 64) the output of the models should be returned
  • include_raw_values - optional; whether the result should also contain the raw model outputs
  • volume_threshold - optional; overrides the default volume level below which audio is treated as silence (higher values will result in more of the data being treated as silence)
  • sample_rate - the sample rate of the audio being sent (see Supported Formats above)

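As a sketch, these parameters are passed in the query string of the /stream URL. The host is taken from the examples later on this page; whether models accepts a comma-separated list for multiple models is an assumption:

```python
from urllib.parse import urlencode

def build_stream_url(models, output_period=None, include_raw_values=None,
                     volume_threshold=None, sample_rate=None):
    """Assemble the wss:// URL for the /stream endpoint from the documented parameters."""
    # Assumption: multiple models are joined with commas.
    params = {"models": ",".join(models)}
    if output_period is not None:
        if output_period % 64 != 0:
            raise ValueError("output_period must be a multiple of 64 ms")
        params["output_period"] = output_period
    if include_raw_values is not None:
        params["include_raw_values"] = "true" if include_raw_values else "false"
    if volume_threshold is not None:
        params["volume_threshold"] = volume_threshold
    if sample_rate is not None:
        params["sample_rate"] = sample_rate
    return "wss://api.oto.ai/stream?" + urlencode(params)
```

For example, build_stream_url(["speech-rt"], volume_threshold=0.001) reproduces the URL used in the websocat commands below.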

Available Outputs

You will get one output response from the DeepTone™ Cloud API per output_period milliseconds of the provided input, representing timestamped results from the requested models.

{"timestamp" : 0, "results": {"gender": {"result": "female", "confidence": 0.6418}}}
{"timestamp" : 1024, "results": {"gender": {"result": "male", "confidence": 0.9012}}}
{"timestamp" : 2048, "results": {"gender": {"result": "male", "confidence": 0.7698}}}
{"timestamp" : 3072, "results": {"gender": {"result": "silence", "confidence": 1.0}}}
{"timestamp" : 4096, "results": {"gender": {"result": "female", "confidence": 0.9780}}}
{"timestamp" : 5120, "results": {"gender": {"result": "female", "confidence": 0.8991}}}

If include_raw_values is set to true, each response object will also include the raw model outputs:

{
  "timestamp": 1024,
  "results": {
    "gender": {
      "result": "male",
      "confidence": 0.7088
    }
  },
  "raw": {
    "gender": {
      "male": 0.1211,
      "female": 0.8789
    }
  }
}
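The raw section exposes per-class model outputs alongside the aggregated result. As a sketch, they can be read like this; note that in the example above they sum to 1, which suggests (but is not stated here) that they behave like class scores:

```python
import json

# The example response above, with raw per-class outputs included.
raw_response = """{
  "timestamp": 1024,
  "results": {"gender": {"result": "male", "confidence": 0.7088}},
  "raw": {"gender": {"male": 0.1211, "female": 0.8789}}
}"""

data = json.loads(raw_response)
raw_gender = data["raw"]["gender"]   # per-class raw outputs, e.g. {"male": ..., "female": ...}
```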

Example Usage

Streaming from a microphone

To stream your microphone input to the DeepTone™ Cloud API and get real-time results you can use sox together with websocat.

The following command requests that the microphone input stream be processed by the SpeechRT model with a volume_threshold of 0.001:

sox -q -d -t raw -b 16 -r 16000 -e signed -c 1 - | websocat "wss://api.oto.ai/stream?models=speech-rt&volume_threshold=0.001" -b -H "X-Api-Key: YOUR_API_KEY"

Streaming from a file

To stream from a file to the DeepTone™ Cloud API and get real-time results you can use sox together with pv and websocat.

The following command requests that your file be processed by the SpeechRT model with a volume_threshold of 0.001:

sox <YOUR_INPUT_FILE> -q -t raw -b 16 -r 16000 -e signed -c 1 - | pv -L 256k | websocat "wss://api.oto.ai/stream?models=speech-rt&volume_threshold=0.001" -b -H "X-Api-Key: YOUR_API_KEY" -n
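The pv rate limit controls how fast the file is pushed to the API. For reference, the real-time byte rate of the audio stream produced by the sox command above works out as follows (whether the endpoint requires real-time pacing is not stated here; pv -L 256k simply caps the transfer well above this rate):

```python
# Real-time byte rate of the raw audio stream produced by the sox command above.
sample_rate = 16000   # Hz, from sox -r 16000
sample_width = 2      # bytes per sample, from sox -b 16
channels = 1          # from sox -c 1

byte_rate = sample_rate * sample_width * channels   # bytes per second of audio
```

This gives 32,000 bytes per second, so the example's 256k limit streams the file several times faster than real time.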