Language Model Recipes

Overview

The Language model can be used to predict the language that is being spoken in a given audio snippet.

Early Access

The Language Model is currently available in Early Access. If you wish to try it out, contact Support at support@oto.ai.

The following languages are currently supported:

  • English (en)
  • French (fr)
  • German (de)
  • Italian (it)
  • Spanish (es)

Pre-requisites

  • DeepTone™ SDK and License Key
  • Audio File(s) you want to process

Sample data

You can download this sample audio file which contains speech in English, French and German.

Default Summary - Example 1

Replace the values for VALID_LICENSE_KEY and FILE_TO_PROCESS with the correct values before running the example.

The summary output presents us with the fraction of the audio which falls in a particular class. In the case below we are interested in determining what the predominant language was in the given file.

from deeptone import Deeptone
from deeptone.deeptone import LANGUAGE_NO_SPEECH
import time
# Set the required constants
VALID_LICENSE_KEY = None
FILE_TO_PROCESS = None
assert not None in (VALID_LICENSE_KEY, FILE_TO_PROCESS), "Set the required constants"
# Initialise Deeptone
engine = Deeptone(license_key=VALID_LICENSE_KEY)
output = engine.process_file(
filename=FILE_TO_PROCESS,
models=[engine.models.Language],
output_period=1024,
channel=0,
use_chunking=True,
include_summary=True,
include_transitions=True,
volume_threshold=0
)
language_summary = output["channels"]["0"]["summary"]["language"]
predominant_fraction = None
predominant_fraction_value = 0.0
for key in language_summary:
if key != f"{LANGUAGE_NO_SPEECH}_fraction" and language_summary[key] > predominant_fraction_value:
predominant_fraction = key
predominant_fraction_value = language_summary[key]
print(f"The predominant language is {predominant_fraction[:-9]}, spoken {round(predominant_fraction_value * 100, 2)}% of the time")