A multimodal model for predicting feedback position and type during conversation

This study investigates conversational feedback, that is, a listener’s reaction in response to a speaker, a phenomenon which occurs in all natural interactions. Feedback depends on the main speaker’s productions and in return supports the elaboration of the interaction. As a consequence, feedback production has a direct impact on the quality of the interaction. This paper examines all types of feedback, from generic to specific feedback, the latter of which has received less attention in the literature. We also present a fine-grained labeling system introducing two sub-types of specific feedback: positive/negative and given/new. Following a literature review on linguistic and machine learning perspectives highlighting the main issues in feedback prediction, we present a model based on a set of multimodal features which predicts the possible position of feedback and its type. This computational model makes it possible to precisely identify the different features in the speaker’s production (morpho-syntactic, prosodic and mimo-gestural) which play a role in triggering feedback from the listener; the model also evaluates their relative importance. The main contribution of this study is twofold: we sought to improve (1) the model’s performance in comparison with other approaches relying on a small set of features, and (2) the model’s interpretability, in particular by investigating feature importance. By integrating all the different modalities as well as high-level features, our model is uniquely positioned to be applied to French corpora.

Keywords

Feedback; Multimodality; Linguistic interaction; Statistical model; Corpus study

Domains

Linguistics; Modeling and simulation

Main file

1-s2.0-S0167639324000384-main.pdf (2.69 MB)

Origin: Publication funded by an institution
License: CC BY-NC - Attribution - NonCommercial

Contributor: Auriane Boudin

https://hal.science/hal-04551398

Submitted on: Thursday, April 18, 2024, 15:04:31

Last modified on: Thursday, April 25, 2024, 10:17:32

Dates and versions

hal-04551398, version 1 (18-04-2024)

License

Attribution - NonCommercial

Identifiers

HAL Id: hal-04551398, version 1
DOI: 10.1016/j.specom.2024.103066

Cite

Auriane Boudin, Roxane Bertrand, Stéphane Rauzy, Magalie Ochs, Philippe Blache. A multimodal model for predicting feedback position and type during conversation. Speech Communication, 2024, 159, pp.103066. ⟨10.1016/j.specom.2024.103066⟩. ⟨hal-04551398⟩


Collections

UNIV-TLN CNRS UNIV-AMU LPL-AIX TDS-MACS ILCB LIS-LAB ANR INCIAM
