Anonymisation for Mobile SDKs
GDPR interpretation of DeepTone embeddings
Identity embeddings by themselves do not allow the identification of a natural person. However, when combined with the Identity model and additional voice data from some individuals, they may make it possible to match the embeddings to those individuals. Hence the identity embeddings constitute pseudonymised data.
Identity anonymisation protocol
Identity embeddings can be anonymised by subtracting their mean value, computed over the time dimension. The embeddings of each speaker tend to cluster around a specific location in the DeepTone™ identity space, so subtracting the average removes that speaker-specific offset, which is a form of anonymisation. Concretely, processing a file with DeepTone returns an array with a time dimension and 128 feature dimensions.
- Python

import numpy as np
y = deeptone.process(file)
print(y.shape)  # (n_predictions, 128)
# Subtract the per-feature mean computed over the time axis
y_anon = y - np.mean(y, axis=0)

- Swift/iOS

let data: DeeptoneOutput = try! deeptone.loadAudioFile(filePath: audioFile!)
// Accumulate the per-feature sum over all time steps
var sum = [Float](repeating: 0.0, count: 128)
data.identity.forEach { array in
    for (index, element) in array.enumerated() {
        sum[index] += element
    }
}
// Per-feature mean over time
let mean = sum.map { $0 / Float(data.identity.count) }
// Subtract the mean from every embedding
let anonymizedEmbedding = data.identity.map { array in
    array.enumerated().map { (index, element) in
        element - mean[index]
    }
}

- Android

val data = deeptoneSDK.loadAudioFile(audioFileDescriptor)
// Accumulate the per-feature sum over all time steps
val sum = FloatArray(128)
data!!.identity.forEach { array ->
    array.forEachIndexed { idx, element ->
        sum[idx] += element
    }
}
// Per-feature mean over time
val mean = sum.map { it / data.identity.count() }
// Subtract the mean from every embedding
val anonymizedEmbedding = data.identity.map { array ->
    array.mapIndexed { idx, element ->
        element - mean[idx]
    }
}
Note that this process is irreversible and destroys some of the information captured by the embeddings. Depending on your use case, you may wish to run experiments to analyse the impact of this information loss on your models and, if necessary, use a different method to anonymise the embeddings.
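To illustrate why subtracting the time average removes speaker-specific information, the following minimal sketch applies the procedure to synthetic data. The arrays are random stand-ins for DeepTone output, not real embeddings, and the helper names `anonymise` and `centroid_distance` are hypothetical and not part of the SDK.

```python
import numpy as np


def anonymise(y: np.ndarray) -> np.ndarray:
    """Subtract the per-feature mean over the time axis (axis 0)."""
    return y - np.mean(y, axis=0, keepdims=True)


def centroid_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Euclidean distance between the time-averaged embeddings of two files."""
    return float(np.linalg.norm(u.mean(axis=0) - v.mean(axis=0)))


rng = np.random.default_rng(0)

# Toy data (assumption): each "speaker" clusters around its own centre
# in a 128-dimensional space, with small variation over time.
centre_a = rng.normal(size=128)
centre_b = rng.normal(size=128)
y_a = centre_a + 0.1 * rng.normal(size=(200, 128))
y_b = centre_b + 0.1 * rng.normal(size=(300, 128))

before = centroid_distance(y_a, y_b)
after = centroid_distance(anonymise(y_a), anonymise(y_b))
print(f"centroid distance before: {before:.3f}, after: {after:.3e}")
```

On this toy data the distance between the two speaker centroids collapses to numerical zero, while the per-feature variance within each file is unchanged. Running a similar comparison on your own embeddings and downstream models is one way to gauge the effect of the information loss.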
anonymizedIdentity (iOS) / identityv2 (Android) model
The lifted identity model was designed using a different modelling approach: it generates embeddings in an unbounded latent space, whereas the vanilla identity model projects its embeddings onto the surface of a hypersphere. We recommend using the same anonymisation procedure for this model.
Multi-speaker scenario
The above approach works only if a single speaker is present in the audio file. If there are multiple speakers, we recommend first performing speaker separation and then applying the procedure to each speaker separately. If that is not possible, we recommend computing a moving average of the identity embeddings with a window of around 3 seconds and subtracting it from the original identity embeddings. Note that this approach is more aggressive and tends to produce a stronger level of anonymisation, at the cost of leaving less information available for later processing.
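Both multi-speaker options can be sketched as follows. This is a minimal illustration under stated assumptions: `labels` is a hypothetical per-prediction speaker assignment produced by whatever speaker separation you use, and `predictions_per_second` is a placeholder for the output rate of your model, which you should take from your own configuration. Neither helper is part of the DeepTone SDK.

```python
import numpy as np


def anonymise_per_speaker(y, labels):
    """Subtract each speaker's own mean from that speaker's embeddings.

    `labels` (hypothetical input) holds one speaker id per prediction.
    """
    y = np.asarray(y, dtype=float)
    labels = np.asarray(labels)
    out = np.empty_like(y)
    for speaker in np.unique(labels):
        mask = labels == speaker
        out[mask] = y[mask] - y[mask].mean(axis=0)
    return out


def anonymise_moving_average(y, window):
    """Subtract a centred moving average spanning about `window` predictions.

    The window is truncated at the start and end of the file.
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    half = window // 2
    # Prefix sums: csum[k] is the sum of y[0:k], so any window sum is a difference.
    csum = np.vstack([np.zeros((1, y.shape[1])), np.cumsum(y, axis=0)])
    start = np.clip(np.arange(n) - half, 0, n)
    stop = np.clip(np.arange(n) + half + 1, 0, n)
    counts = (stop - start)[:, None]
    moving_mean = (csum[stop] - csum[start]) / counts
    return y - moving_mean


predictions_per_second = 10           # assumption: replace with your model's rate
window = 3 * predictions_per_second   # roughly 3 seconds of predictions

rng = np.random.default_rng(0)
y = rng.normal(size=(100, 128))       # random stand-in for identity embeddings
labels = np.repeat([0, 1], 50)        # toy labels: two speakers, 50 predictions each

y_sep = anonymise_per_speaker(y, labels)
y_ma = anonymise_moving_average(y, window)
print(y_sep.shape, y_ma.shape)
```

The moving-average variant needs no speaker labels, but it also removes any variation slower than the window, which is why it discards more information than the per-speaker variant.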
Further assistance
We would be happy to discuss specific use cases in more detail at support@oto.ai.