Real Time Processing
The DeepTone™ Cloud API can be used to process a real-time audio stream.
Supported Formats
All data is sent to the streaming endpoint as binary-type WebSocket messages whose payloads are the raw audio data. Any other payload will cause an error to be returned and the connection to be closed. You can stream audio to DeepTone in real time, and since the protocol is full-duplex, you will receive DeepTone responses while still uploading data. Currently, the stream endpoint only supports 16-bit, little-endian, signed PCM WAV data.
If your audio has a sample rate other than the expected 16 kHz, you need to specify the sample_rate in the request parameters. The data will be re-sampled, but the results may be less accurate.
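As a minimal sketch of the wire protocol using the websocket_client package (the file name, chunk size, and 8 kHz rate are illustrative assumptions), each frame is a binary message carrying raw PCM bytes, and a non-default sample rate is declared up front via sample_rate:
import websocket  # pip install websocket_client

API_KEY = 'REPLACE_ME_WITH_YOUR_API_KEY'

# This hypothetical file contains raw 16-bit, little-endian, signed PCM
# recorded at 8 kHz, so the rate is declared via `sample_rate`.
ws = websocket.create_connection(
    "wss://api.oto.ai/stream?models=speech-rt&sample_rate=8000",
    header={'X-Api-Key': API_KEY},
)

with open('call_8khz.raw', 'rb') as f:
    while chunk := f.read(4096):
        # Binary-type message whose payload is the raw audio bytes;
        # any other payload triggers an error and closes the connection.
        ws.send_binary(chunk)

# Full duplex: responses can also be read while uploading is in progress.
print(ws.recv())
ws.close()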
Configuration and Outputs
There are different configuration parameters and types of outputs that can be requested.
For code samples, see Example Usage. For the detailed output specification, see Output specification.
Available configuration parameters
There are several possible parameters which can be added to requests to the /stream
websocket endpoint (a combined example request is sketched after the list):
- models - the list of model names to use for the audio analysis
- output_period - how often (in milliseconds, as a multiple of 64) the output of the models should be returned
- include_raw_values - optionally, whether the result should contain the raw model outputs
- volume_threshold - optionally, a volume threshold different from the default (higher values will result in more of the data being treated as silence)
- sample_rate - in streaming mode, the sample rate of the audio that is being sent (the user has to specify this)
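Putting these together, a request URL combining all parameters might look like the sketch below. All values are illustrative, and the comma-separated format for multiple models is an assumption based on the description of models as a list of names:
from urllib.parse import urlencode

# All values are illustrative; how multiple models are separated is an
# assumption based on the description of `models` as a list of names.
params = {
    'models': 'speech-rt,gender',
    'output_period': 512,          # milliseconds, a multiple of 64
    'include_raw_values': 'true',
    'volume_threshold': 0.001,
    'sample_rate': 44100,          # required when the audio is not 16 kHz
}
url = 'wss://api.oto.ai/stream?' + urlencode(params)
print(url)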
Available Outputs
You will get one output response from the DeepTone™ Cloud API per output_period milliseconds of the provided input, representing timestamped results from the requested models. For example:
{"timestamp" : 0, "results": {"gender": {"result": "female", "confidence": 0.6418}}}
{"timestamp" : 1024, "results": {"gender": {"result": "male", "confidence": 0.9012}}}
{"timestamp" : 2048, "results": {"gender": {"result": "male", "confidence": 0.7698}}}
{"timestamp" : 3072, "results": {"gender": {"result": "silence", "confidence": 1.0}}}
{"timestamp" : 4096, "results": {"gender": {"result": "female", "confidence": 0.9780}}}
{"timestamp" : 5120, "results": {"gender": {"result": "female", "confidence": 0.8991}}}
If include_raw_values is set to true, each result object will also include the raw model outputs:
{
  "timestamp": 1024,
  "results": {
    "gender": {
      "result": "male",
      "confidence": 0.7088
    }
  },
  "raw": {
    "gender": {
      "male": 0.1211,
      "female": 0.8789
    }
  }
}
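As a minimal sketch of consuming these responses, the handler below follows the field names in the examples above; the guard for the initial connection message mirrors the JavaScript example further down:
import json

def on_message(ws, message):
    payload = json.loads(message)
    # The first frame after connecting is a status message, not a result
    if payload.get('message_type') == 'connection':
        return
    timestamp = payload['timestamp']
    for model, output in payload['results'].items():
        print(f"{timestamp} ms  {model}: {output['result']} "
              f"(confidence {output['confidence']:.2f})")
    # Only present when include_raw_values=true was requested
    for model, scores in payload.get('raw', {}).items():
        print(f"{timestamp} ms  {model} raw values: {scores}")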
Example Usage
Streaming from a microphone
- Shell + sox, websocat
- Python
- HTML + JavaScript
To stream your microphone input to the DeepTone™ Cloud API and get real-time results you can use sox together with websocat. The following command will request the microphone input stream to be processed using the SpeechRT model with a volume_threshold of 0.001:
sox -q -d -t raw -b 16 -r 16000 -e signed -c 1 - | websocat "wss://api.oto.ai/stream?models=speech-rt&volume_threshold=0.001" -b -H "X-Api-Key: YOUR_API_KEY"
To stream your microphone input to the DeepTone™ Cloud API and get real-time results using Python you can use the following script. The script requires pyaudio and websocket_client to be installed.
Installing websocket_client
pip install websocket_client
Installing pyaudio
If you already have pyaudio installed in your environment or an alternative package to stream audio from a microphone, go straight to the code.
- Mac
- Windows
On Mac, you may have to install or overwrite portaudio before installing pyaudio:
brew install portaudio
then, inside your virtualenv:
pip install pyaudio
Reference: https://medium.com/@koji_kanao/when-cant-install-pyaudio-with-pip-190973840dbf
On Windows, you can install pyaudio with Python 3.7 from a wheel.
Download the wheel from https://www.lfd.uci.edu/~gohlke/pythonlibs/#pyaudio. Choose
PyAudio-0.2.11-cp37-cp37m-win32.whl if you use 32-bit Python, or
PyAudio-0.2.11-cp37-cp37m-win_amd64.whl for 64-bit.
Install from the wheel:
pip install <path_to_wheel>
Reference: https://stackoverflow.com/a/54999645
Example Script (microphone)
After pyaudio has been installed successfully, you can run the script:
import threading

# dependencies
import pyaudio
import websocket

# REPLACE WITH YOUR KEY
API_KEY = 'REPLACE_ME_WITH_YOUR_API_KEY'

# Frames read from the microphone per websocket message (1024 frames at
# 16 kHz is 64 ms of audio). Lower this number if you want a lower latency.
CHUNK_SIZE = 1024

# Define microphone input stream
pa = pyaudio.PyAudio()
stream = pa.open(
    format=pyaudio.paInt16,
    channels=1,
    rate=16000,
    input=True,
    frames_per_buffer=CHUNK_SIZE,
)

# What to do with results
def on_message(ws, message):
    print(message)

def on_error(ws, error):
    print(error)

def on_close(ws, close_status_code=None, close_msg=None):
    print("### closed ###")

# Once the websocket connection is established, start sending the microphone input stream
def on_open(ws):
    stream.start_stream()

    def run():
        while stream.is_active():
            data = stream.read(CHUNK_SIZE)
            ws.send(bytearray(data), websocket.ABNF.OPCODE_BINARY)
        ws.close()

    thread = threading.Thread(target=run)
    thread.start()

if __name__ == "__main__":
    # CHUNK_SIZE doubles as the output_period (in milliseconds) here
    ws = websocket.WebSocketApp(
        f"wss://api.oto.ai/stream?models=speech&output_period={CHUNK_SIZE}&volume_threshold=0.0",
        header={'X-API-KEY': API_KEY},
        on_message=on_message,
        on_error=on_error,
        on_close=on_close,
    )
    ws.on_open = on_open
    ws.run_forever()
You can download the complete Python example script here.
To stream your microphone input to the DeepTone™ Cloud API and get real-time results using JavaScript you can use the following example.
- Create an HTML file and paste the code below into it.
- Replace <YOUR_API_KEY> with your API key.
- Save the file.
- Open the file in your browser (for the best compatibility we recommend using Google Chrome for this example).
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <script src="https://www.WebRTC-Experiment.com/RecordRTC.js"></script>
    <script>
      const CONSTRAINTS = { audio: true, video: false };
      const API_KEY = '<YOUR_API_KEY>';
      let webSocket = null;
      let mediaRecorder = null;

      const startRecording = () => {
        document.getElementById('results').innerHTML = '';
        document.getElementById('message').innerHTML = '';
        navigator.mediaDevices
          .getUserMedia(CONSTRAINTS)
          .then((audioStream) => {
            // Browsers cannot set custom headers on WebSocket connections,
            // so the API key is passed as a query parameter instead.
            webSocket = new WebSocket(
              `wss://api.oto.ai/stream?models=speech-rt&output_period=512&api_key=${API_KEY}`
            );
            mediaRecorder = new RecordRTC(audioStream, {
              recorderType: RecordRTC.StereoAudioRecorder,
              type: 'audio',
              mimeType: 'audio/wav',
              desiredSampRate: 16000,
              numberOfAudioChannels: 1,
              timeSlice: 1000,
              disableLogs: true,
              ondataavailable: (e) => {
                if (e.size > 0 && webSocket.readyState === webSocket.OPEN) {
                  webSocket.send(e);
                }
              },
            });
            webSocket.onclose = function (event) {
              document.getElementById('message').innerHTML =
                'WebSocket closed!';
            };
            webSocket.onerror = function (event) {
              document.getElementById('message').innerHTML =
                'WebSocket error!';
            };
            webSocket.onmessage = (message) => {
              const parsedMessage = JSON.parse(message.data);
              if (parsedMessage['message_type'] === 'connection') {
                document.getElementById('message').innerHTML =
                  parsedMessage.message;
              } else {
                const prediction =
                  parsedMessage['results']['channels']['0']['results'][
                    'speech-rt'
                  ];
                const listItem = document.createElement('li');
                listItem.innerHTML = JSON.stringify(prediction);
                document
                  .getElementById('results')
                  .appendChild(listItem);
              }
            };
            mediaRecorder.startRecording();
          });
      };

      const stopRecording = () => {
        mediaRecorder.stopRecording(() => {
          webSocket.close(1000, 'Closing WebSocket...');
        });
      };
    </script>
    <title>DeepTone Streaming Example</title>
  </head>
  <body>
    <button onclick="startRecording()">Start Recording</button>
    <button onclick="stopRecording()">Stop Recording</button>
    <div>
      <div style="margin-top: 10px" id="message"></div>
      <ul id="results"></ul>
    </div>
  </body>
</html>
Streaming from a file
- Shell + sox, websocat
- Python
To stream from a file to the DeepTone™ Cloud API and get real-time results you can use sox together with pv and websocat. The following command will request your file to be processed using the SpeechRT model with a volume_threshold of 0.001:
sox <YOUR_INPUT_FILE> -q -t raw -b 16 -r 16000 -e signed -c 1 - | pv -L 256k | websocat "wss://api.oto.ai/stream?models=speech-rt&volume_threshold=0.001" -b -H "X-Api-Key: YOUR_API_KEY" -n
To stream from a file to the DeepTone™ Cloud API and get real-time results using Python you can use the following script. The script requires websocket_client to be installed.
Installing websocket_client
pip install websocket_client
Example Script (file)
After websocket_client has been installed successfully, you can run the script:
import time
import threading

from scipy.io import wavfile
import websocket

# REPLACE WITH YOUR KEY
API_KEY = 'REPLACE_ME_WITH_YOUR_API_KEY'
LOCAL_AUDIO_FILE = 'PATH/TO/LOCAL/FILE.wav'

# Samples sent to the API per websocket message (1024 samples at 16 kHz is
# 64 ms of audio). Lower this number if you want a lower latency.
CHUNK_SIZE = 1024

# The file is assumed to be 16 kHz mono 16-bit PCM; for other sample rates,
# add &sample_rate=<rate> to the request URL below.
rate, data = wavfile.read(LOCAL_AUDIO_FILE)

# What to do with results
def on_message(ws, message):
    print(message)

def on_error(ws, error):
    print(error)

def on_close(ws, close_status_code=None, close_msg=None):
    print("### closed ###")

# Once the websocket connection is established, start sending the file contents
def on_open(ws):
    def run():
        index = 0
        while index < len(data):
            ws.send(data[index:index + CHUNK_SIZE].tobytes(), websocket.ABNF.OPCODE_BINARY)
            index += CHUNK_SIZE
        # Give the API a moment to return the remaining results before closing
        time.sleep(5)
        ws.close()

    thread = threading.Thread(target=run)
    thread.start()

if __name__ == "__main__":
    # CHUNK_SIZE doubles as the output_period (in milliseconds) here
    ws = websocket.WebSocketApp(
        f"wss://api.oto.ai/stream?models=speech&output_period={CHUNK_SIZE}&volume_threshold=0.0",
        header={'X-API-KEY': API_KEY},
        on_message=on_message,
        on_error=on_error,
        on_close=on_close,
    )
    ws.on_open = on_open
    ws.run_forever()
You can download the complete Python example script here.