LUFS Model
The LUFS model classifies audio as "very_faint", "faint", "moderate", "loud" or "painful" based on its perceived loudness relative to full scale:
very_faint
- Sound that is barely above the threshold of hearing for humans (-50 LUFS and below)
faint
- Audible sound that is very low, e.g. a whisper (-50 LUFS to -30 LUFS)
moderate
- Clear audible sounds, e.g. human conversation (-30 LUFS to -10 LUFS)
loud
- Intensity level corresponding to very loud sounds, e.g. power tools, alarm clocks, loud music (-10 LUFS to 0 LUFS)
painful
- Intensity levels that are uncomfortable and painful/dangerous, e.g. jet planes, fireworks, jackhammers (0 LUFS)
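The class boundaries listed above can be sketched as a small lookup. This is an illustration only, not the model's implementation: `classify_lufs` is a hypothetical helper, and the handling of values that fall exactly on -30 LUFS and -10 LUFS is an assumption, since the listed ranges share those endpoints.

```python
def classify_lufs(level: float) -> str:
    """Map a loudness value in LUFS to one of the five class labels."""
    if level <= -50.0:   # "-50 LUFS and below"
        return "very_faint"
    if level < -30.0:    # assumption: -30 LUFS itself counts as "moderate"
        return "faint"
    if level < -10.0:    # assumption: -10 LUFS itself counts as "loud"
        return "moderate"
    if level < 0.0:      # 0 LUFS is listed under "painful"
        return "loud"
    return "painful"
```

Under these assumptions, a level of -40 LUFS maps to "faint" and a level of -5 LUFS maps to "loud".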
The confidence values that the model produces are always 1, since they are based on a deterministic calculation.
When trying to detect speech in a normal conversation, we recommend looking for intensity_level values in the moderate range.
Specification
Receptive Field | Result Type |
---|---|
512ms | result ∈ ["very_faint", "faint", "moderate", "loud", "painful"] |
Time-series
The time-series result will be an iterable with elements that contain the following information:
{
"timestamp": 0,
"results": {
"lufs": {
"result": "moderate",
"confidence": 1.0
}
}
}
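Assuming the time-series has been loaded into a list of dictionaries shaped like the element above, the elements in a given class can be picked out as follows. `timestamps_with_result` is a hypothetical helper, and the sample timestamps are made up for illustration.

```python
from typing import Any, Dict, Iterable, List


def timestamps_with_result(
    series: Iterable[Dict[str, Any]], wanted: str = "moderate"
) -> List[int]:
    """Return the timestamps of all elements whose LUFS result equals `wanted`."""
    return [
        element["timestamp"]
        for element in series
        if element["results"]["lufs"]["result"] == wanted
    ]


# Hypothetical sample data; real timestamps depend on the analysed audio.
sample_series = [
    {"timestamp": 0, "results": {"lufs": {"result": "moderate", "confidence": 1.0}}},
    {"timestamp": 100, "results": {"lufs": {"result": "faint", "confidence": 1.0}}},
    {"timestamp": 200, "results": {"lufs": {"result": "moderate", "confidence": 1.0}}},
]
moderate_times = timestamps_with_result(sample_series)
```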
Time-series with raw values
If raw values are requested, they are added to the time-series results. Note that for this particular model the only raw value is intensity_level, which is the calculated intensity level in LUFS.
{
"timestamp": 0,
"results": {
"lufs": {
"result": "moderate",
"confidence": 1.0
}
},
"raw": {
"lufs": {
"intensity_level": -25.234
}
}
}
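As a sketch of how the raw values might be consumed, the snippet below pairs each timestamp with its intensity_level and skips elements that carry no "raw" entry. `raw_levels` is a hypothetical helper, and the second sample element is invented for illustration.

```python
from typing import Any, Dict, Iterable, List, Tuple


def raw_levels(series: Iterable[Dict[str, Any]]) -> List[Tuple[int, float]]:
    """Return (timestamp, intensity_level) pairs for elements that include raw values."""
    pairs = []
    for element in series:
        raw = element.get("raw", {}).get("lufs")
        if raw is not None:
            pairs.append((element["timestamp"], raw["intensity_level"]))
    return pairs


sample_series = [
    {
        "timestamp": 0,
        "results": {"lufs": {"result": "moderate", "confidence": 1.0}},
        "raw": {"lufs": {"intensity_level": -25.234}},
    },
    # Hypothetical element without raw values, e.g. if they were not requested.
    {"timestamp": 100, "results": {"lufs": {"result": "faint", "confidence": 1.0}}},
]
levels = raw_levels(sample_series)
```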
Summary
If a summary is requested, the following will be returned:
{
"lufs": {
"very_faint_fraction": 0.0,
"faint_fraction": 0.2903,
"moderate_fraction": 0.7097,
"loud_fraction": 0.0,
"painful_fraction": 0.0
}
}
where x_fraction represents the fraction of time (between 0 and 1) that class x was identified over the duration of the input.
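A summary of this shape can be approximated from a list of per-element class labels. The sketch below assumes every element covers the same amount of time, so that a class fraction is simply its count divided by the total. `summarize` is a hypothetical helper, not part of the API, and the service may compute its summary differently.

```python
from collections import Counter
from typing import Dict, Iterable

CLASSES = ("very_faint", "faint", "moderate", "loud", "painful")


def summarize(labels: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of elements per class, rounded to four decimals."""
    labels = list(labels)
    counts = Counter(labels)
    total = len(labels)
    if total == 0:
        # No input: report zero for every class instead of dividing by zero.
        return {f"{name}_fraction": 0.0 for name in CLASSES}
    return {f"{name}_fraction": round(counts[name] / total, 4) for name in CLASSES}


# Hypothetical input: 9 faint and 22 moderate elements out of 31.
summary = summarize(["faint"] * 9 + ["moderate"] * 22)
```

With this made-up input, the fractions come out as 0.2903 for faint and 0.7097 for moderate.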
Transitions
If transitions are requested, a time-series of transition elements like the ones shown below will be returned.
{
"timestamp_start": 0,
"timestamp_end": 256,
"result": "moderate",
"confidence": 1.0
},
{
"timestamp_start": 256,
"timestamp_end": 448,
"result": "faint",
"confidence": 1.0
},
The example above means that the first 256ms of the audio snippet had a moderate intensity level, corresponding to normal human conversation, and that between 256ms and 448ms the level was faint.
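Transition elements lend themselves to simple duration arithmetic. The sketch below sums the time spent in each class, assuming the timestamps are in milliseconds as in the explanation above. `duration_per_class` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def duration_per_class(transitions: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Sum (timestamp_end - timestamp_start) for each result class."""
    totals: Dict[str, int] = defaultdict(int)
    for transition in transitions:
        span = transition["timestamp_end"] - transition["timestamp_start"]
        totals[transition["result"]] += span
    return dict(totals)


# The two transition elements from the example above.
transitions = [
    {"timestamp_start": 0, "timestamp_end": 256, "result": "moderate", "confidence": 1.0},
    {"timestamp_start": 256, "timestamp_end": 448, "result": "faint", "confidence": 1.0},
]
durations = duration_per_class(transitions)
```

For the two example elements this gives 256ms of moderate audio and 192ms of faint audio.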