
LUFS Model

The LUFS model classifies audio as "very_faint", "faint", "moderate", "loud" or "painful" based on its perceived loudness, measured in LUFS (Loudness Units relative to Full Scale):

  • very_faint - Sound that is barely above the threshold of human hearing (-50 LUFS and below)
  • faint - Audible but very quiet sound, e.g. a whisper (-50 LUFS to -30 LUFS)
  • moderate - Clearly audible sound, e.g. human conversation (-30 LUFS to -10 LUFS)
  • loud - Very loud sound, e.g. power tools, alarm clocks, loud music (-10 LUFS to 0 LUFS)
  • painful - Uncomfortable, painful or dangerous intensity, e.g. jet planes, fireworks, jackhammers (0 LUFS and above)
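The band list above can be expressed as a simple threshold function. This is a minimal sketch, not the model's actual implementation: the name classify_lufs is hypothetical, and since the documentation only pins down the outer boundaries (-50 LUFS is very_faint, 0 LUFS is painful), the handling of exactly -30 LUFS and -10 LUFS is an assumption.

```python
def classify_lufs(level: float) -> str:
    """Map a loudness value in LUFS to one of the five documented classes.

    Hypothetical helper. Assumption: -50 LUFS and below is "very_faint" and
    0 LUFS and above is "painful" (as documented); the inner boundaries
    -30 and -10 are assumed to belong to the louder class.
    """
    if level <= -50.0:
        return "very_faint"
    if level < -30.0:
        return "faint"
    if level < -10.0:
        return "moderate"
    if level < 0.0:
        return "loud"
    return "painful"


print(classify_lufs(-25.234))  # moderate
```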

The model always reports a confidence of 1.0, because its result comes from a deterministic calculation rather than a probabilistic prediction.

To detect speech in a normal conversation, we recommend looking for frames classified as moderate (an intensity_level between -30 LUFS and -10 LUFS).

Specification

Receptive field: 512 ms
Result type: result ∈ ["very_faint", "faint", "moderate", "loud", "painful"]

Time-series

The time-series result is an iterable whose elements have the following structure:

{
  "timestamp": 0,
  "results": {
    "lufs": {
      "result": "moderate",
      "confidence": 1.0
    }
  }
}
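Following the speech recommendation above, a client can scan the time series for moderate frames. This sketch assumes the time series arrives as a JSON array of elements shaped like the example; the helper name timestamps_with_result and the timestamps and results in response_text are hypothetical.

```python
import json

# Hypothetical time-series response; timestamps and results are made up.
response_text = """
[
  {"timestamp": 0,   "results": {"lufs": {"result": "moderate", "confidence": 1.0}}},
  {"timestamp": 64,  "results": {"lufs": {"result": "moderate", "confidence": 1.0}}},
  {"timestamp": 128, "results": {"lufs": {"result": "faint",    "confidence": 1.0}}}
]
"""


def timestamps_with_result(elements, wanted="moderate"):
    """Return the timestamps of elements whose LUFS result equals `wanted`."""
    return [
        element["timestamp"]
        for element in elements
        if element["results"]["lufs"]["result"] == wanted
    ]


elements = json.loads(response_text)
print(timestamps_with_result(elements))  # [0, 64]
```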

Time-series with raw values

If raw values are requested, they are added to each time-series element. For this model the only raw value is intensity_level, the calculated loudness in LUFS.

{
  "timestamp": 0,
  "results": {
    "lufs": {
      "result": "moderate",
      "confidence": 1.0
    }
  },
  "raw": {
    "lufs": {
      "intensity_level": -25.234
    }
  }
}
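With raw values enabled, each element carries the exact loudness next to its class, so it is easy to pull out a (timestamp, intensity_level) series or find the loudest frame. A minimal sketch: the helper raw_levels is hypothetical, and only the first element below matches the documented example; the second is invented for illustration.

```python
def raw_levels(elements):
    """Pair each timestamp with its raw LUFS intensity level.

    Assumes raw values were requested, so every element has a
    "raw" -> "lufs" -> "intensity_level" entry.
    """
    return [
        (element["timestamp"], element["raw"]["lufs"]["intensity_level"])
        for element in elements
    ]


# The first element is the documented example; the second is hypothetical.
elements = [
    {"timestamp": 0,
     "results": {"lufs": {"result": "moderate", "confidence": 1.0}},
     "raw": {"lufs": {"intensity_level": -25.234}}},
    {"timestamp": 256,
     "results": {"lufs": {"result": "faint", "confidence": 1.0}},
     "raw": {"lufs": {"intensity_level": -41.7}}},
]

levels = raw_levels(elements)
# The loudest frame is the one with the highest (least negative) LUFS value.
loudest_timestamp, loudest_level = max(levels, key=lambda pair: pair[1])
print(levels)  # [(0, -25.234), (256, -41.7)]
print(loudest_timestamp, loudest_level)  # 0 -25.234
```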

Summary

If a summary is requested, the following is returned:

{
  "lufs": {
    "very_faint_fraction": 0.0,
    "faint_fraction": 0.2903,
    "moderate_fraction": 0.7097,
    "loud_fraction": 0.0,
    "painful_fraction": 0.0
  }
}

where x_fraction is the fraction of the input's duration (a value between 0 and 1) during which class x was identified. The five fractions sum to 1.
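To show how the summary relates to the time series, the sketch below derives the same <class>_fraction keys from a list of per-frame results. It is an approximation under a stated assumption, not the service's own computation: it assumes every time-series element covers the same amount of audio, so the share of elements equals the share of time. The function summarize and the input list are hypothetical.

```python
from collections import Counter

CLASSES = ("very_faint", "faint", "moderate", "loud", "painful")


def summarize(results):
    """Compute <class>_fraction values from a sequence of per-frame results.

    Assumes each element represents an equal slice of the input, so the
    fraction of elements with a class equals the fraction of time.
    """
    counts = Counter(results)
    total = len(results)
    return {
        f"{name}_fraction": (counts[name] / total if total else 0.0)
        for name in CLASSES
    }


summary = summarize(["moderate", "moderate", "faint", "moderate"])
print(summary)
```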

Transitions

If transitions are requested, a time series of transition elements like the ones below is returned:

[
  {
    "timestamp_start": 0,
    "timestamp_end": 256,
    "result": "moderate",
    "confidence": 1.0
  },
  {
    "timestamp_start": 256,
    "timestamp_end": 448,
    "result": "faint",
    "confidence": 1.0
  }
]

The example above means that the first 256 ms of the audio were classified as moderate, a loudness typical of normal human conversation, and that the audio between 256 ms and 448 ms was faint.
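Because each transition element spans [timestamp_start, timestamp_end), the time spent in each class can be totalled directly. This sketch uses the two transition elements from the example; the helper name duration_per_class is hypothetical.

```python
from collections import defaultdict

# The two transition elements from the example (timestamps in ms).
transitions = [
    {"timestamp_start": 0, "timestamp_end": 256, "result": "moderate", "confidence": 1.0},
    {"timestamp_start": 256, "timestamp_end": 448, "result": "faint", "confidence": 1.0},
]


def duration_per_class(transitions):
    """Sum timestamp_end - timestamp_start for each result class."""
    totals = defaultdict(int)
    for segment in transitions:
        totals[segment["result"]] += segment["timestamp_end"] - segment["timestamp_start"]
    return dict(totals)


print(duration_per_class(transitions))  # {'moderate': 256, 'faint': 192}
```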