Language Model Recipes
Overview
The Language model can be used to predict the language that is being spoken in a given audio snippet.
Early Access
The Language Model is currently available in Early Access. If you wish to try it out, contact Support at support@oto.ai.
The following languages are currently supported:
- English (
en
) - French (
fr
) - German (
de
) - Italian (
it
) - Spanish (
es
)
Pre-requisites
- DeepTone™ SDK and License Key
- Audio File(s) you want to process
Sample data
You can download this sample audio file which contains speech in English, French and German.
Default Summary - Example 1
Replace the values for VALID_LICENSE_KEY
and FILE_TO_PROCESS
with the correct values before running the example.
The summary
output presents us with the fraction of the audio which falls in a particular class. In the case below we
are interested in determining what the predominant language was in the given file.
from deeptone import Deeptone
from deeptone.deeptone import LANGUAGE_NO_SPEECH
import time
# Set the required constants
VALID_LICENSE_KEY = None
FILE_TO_PROCESS = None
assert not None in (VALID_LICENSE_KEY, FILE_TO_PROCESS), "Set the required constants"
# Initialise Deeptone
engine = Deeptone(license_key=VALID_LICENSE_KEY)
output = engine.process_file(
filename=FILE_TO_PROCESS,
models=[engine.models.Language],
output_period=1024,
channel=0,
use_chunking=True,
include_summary=True,
include_transitions=True,
volume_threshold=0
)
language_summary = output["channels"]["0"]["summary"]["language"]
predominant_fraction = None
predominant_fraction_value = 0.0
for key in language_summary:
if key != f"{LANGUAGE_NO_SPEECH}_fraction" and language_summary[key] > predominant_fraction_value:
predominant_fraction = key
predominant_fraction_value = language_summary[key]
print(f"The predominant language is {predominant_fraction[:-9]}, spoken {round(predominant_fraction_value * 100, 2)}% of the time")