Lexical Information

1 source - 4 claims

Lexical features extracted from transcripts complement the acoustic features. BERT Base was selected as the text backbone because larger or more heavily pretrained alternatives did not meaningfully improve downstream performance. Privacy constraints motivated transferring text-derived information into deployable audio-only models. The overall approach therefore uses both audio and text during training but releases an audio-only model.
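
The train-with-text, deploy-audio-only idea can be illustrated with a toy sketch. The source does not say how text-derived information is transferred, so this sketch assumes a teacher-student (distillation) setup: a teacher that sees audio and text features produces targets, and a student that sees only audio features is fitted to them. Every name, weight, feature dimension, and the synthetic data are hypothetical and chosen only for illustration. This is not the authors' implementation, and no BERT model is involved.

```python
import random

def dot(w, x):
    """Plain dot product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))

def make_dataset(n, seed=0):
    """Toy data (assumption): 3 audio features and 1 text feature per example.
    The text feature is correlated with the first audio feature, so part of
    the text information is recoverable from audio alone."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        audio = [rng.gauss(0, 1) for _ in range(3)]
        text = [audio[0] + 0.1 * rng.gauss(0, 1)]
        data.append((audio, text))
    return data

# Hypothetical fixed teacher weights; a real teacher would itself be trained.
TEACHER_W_AUDIO = [0.5, -0.3, 0.2]
TEACHER_W_TEXT = [1.0]

def teacher_predict(audio, text):
    """Teacher uses BOTH modalities; only available at training time."""
    return dot(TEACHER_W_AUDIO, audio) + dot(TEACHER_W_TEXT, text)

def student_predict(w, audio):
    """Student uses audio only; this is the part that would be released."""
    return dot(w, audio)

def train_student(data, epochs=200, lr=0.05):
    """Fit the audio-only student to the teacher's outputs by
    full-batch gradient descent on mean squared error."""
    w = [0.0, 0.0, 0.0]
    n = len(data)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for audio, text in data:
            target = teacher_predict(audio, text)  # text is used only here
            err = student_predict(w, audio) - target
            for i in range(3):
                grad[i] += 2.0 * err * audio[i] / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def mean_sq_gap(w, data):
    """Mean squared difference between student and teacher outputs."""
    return sum(
        (student_predict(w, a) - teacher_predict(a, t)) ** 2 for a, t in data
    ) / len(data)

if __name__ == "__main__":
    train = make_dataset(200, seed=0)
    w = train_student(train)
    print("student weights:", w)
    print("gap before:", mean_sq_gap([0.0, 0.0, 0.0], train))
    print("gap after: ", mean_sq_gap(w, train))
```

In this sketch, transcripts are touched only inside `train_student`, via the teacher's targets. `student_predict` takes audio features alone, which mirrors the privacy motivation above: nothing text-derived needs to be available at deployment.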