Audio Chunk Processing

DeepTone™'s Audio Chunk Processing functionality lets you extract insights directly from audio chunks provided as NumPy byte arrays. It analyses each chunk as a whole, returning one aggregated result per model rather than a time series, and structures its output similarly to Stream Processing.

Example Usage

You can use the process_audio_chunk method to process audio bytes directly and receive results aggregated over the entire chunk provided. The example below reads the bytes from an audio file, but they can come from any other source.

from scipy.io import wavfile

from deeptone import Deeptone

# Read in some audio bytes
_, audio_bytes = wavfile.read("PATH_TO_AUDIO_FILE")

# Initialise Deeptone
engine = Deeptone(license_key="...")

output = engine.process_audio_chunk(
    data=audio_bytes,
    models=[engine.models.Speech, engine.models.Gender],
    include_raw_values=True,
    volume_threshold=0.005
)

The returned object contains the results aggregated for the entire chunk:

# Inspect the result
print(output)

results_processed = output["results"]
results_raw = output["raw"]

print("\nChunk results:")
for model_name in results_processed.keys():
    result = results_processed[model_name]["result"]
    confidence = results_processed[model_name]["confidence"]
    print(f"Model: {model_name}, result: '{result}' with {confidence} confidence")

print("\nRaw chunk results:")
for model_name in results_raw.keys():
    results = results_raw[model_name]
    report_str = f"Model: {model_name}, raw results: "
    for label in results.keys():
        report_str += f"'{label}' with {results[label]} confidence, "
    print(report_str.rstrip(", "))

The output of the script would be something like:

Chunk results:
Model: gender, result: 'female' with 0.6255 confidence
Model: speech, result: 'speech' with 0.8141 confidence

Raw chunk results:
Model: gender, raw results: 'male' with 0.1872 confidence, 'female' with 0.8128 confidence
Model: speech, raw results: 'music' with 0.0172 confidence, 'other' with 0.1686 confidence, 'speech' with 0.8141 confidence

Raw output:

{
    "results": {
        "gender": {
            "result": "female",
            "confidence": 0.6255
        },
        "speech": {
            "result": "speech",
            "confidence": 0.8141
        }
    },
    "raw": {
        "gender": {
            "male": 0.1872,
            "female": 0.8128
        },
        "speech": {
            "music": 0.0172,
            "other": 0.1686,
            "speech": 0.8141
        }
    }
}
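
Because the output is a plain nested dictionary, it can be post-processed with ordinary Python. The following is a minimal sketch, not part of the DeepTone SDK: the helper names confident_results and top_raw_labels are hypothetical, and the sample dictionary is copied from the raw output shown above so the snippet runs without a license key. It also assumes that the "raw" key may be absent when raw values are not requested, which the example above does not confirm.

```python
# Sample output copied from the example above; in practice this would be the
# dictionary returned by engine.process_audio_chunk(...).
output = {
    "results": {
        "gender": {"result": "female", "confidence": 0.6255},
        "speech": {"result": "speech", "confidence": 0.8141},
    },
    "raw": {
        "gender": {"male": 0.1872, "female": 0.8128},
        "speech": {"music": 0.0172, "other": 0.1686, "speech": 0.8141},
    },
}


def confident_results(output, min_confidence):
    """Hypothetical helper: keep only models whose confidence is at least min_confidence."""
    return {
        name: entry["result"]
        for name, entry in output["results"].items()
        if entry["confidence"] >= min_confidence
    }


def top_raw_labels(output):
    """Hypothetical helper: return (label, value) with the highest raw value per model."""
    # Assumption: "raw" may be missing if raw values were not requested.
    return {
        name: max(scores.items(), key=lambda item: item[1])
        for name, scores in output.get("raw", {}).items()
    }


print(confident_results(output, min_confidence=0.7))
print(top_raw_labels(output))
```

With the sample values above, only the speech model passes a 0.7 confidence cut-off, while the top raw labels are 'female' for gender and 'speech' for speech.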