Journal article, Speech Communication, Year: 2024

A multimodal model for predicting feedback position and type during conversation

Abstract

This study investigates conversational feedback, that is, a listener’s reaction in response to a speaker, a phenomenon which occurs in all natural interactions. Feedback depends on the main speaker’s productions and in return supports the elaboration of the interaction. As a consequence, feedback production has a direct impact on the quality of the interaction. This paper examines all types of feedback, from generic to specific feedback, the latter of which has received less attention in the literature. We also present a fine-grained labeling system introducing two sub-types of specific feedback: positive/negative and given/new. Following a literature review on linguistic and machine learning perspectives highlighting the main issues in feedback prediction, we present a model based on a set of multimodal features which predicts the possible position of feedback and its type. This computational model makes it possible to precisely identify the different features in the speaker’s production (morpho-syntactic, prosodic and mimo-gestural) which play a role in triggering feedback from the listener; the model also evaluates their relative importance. The main contribution of this study is twofold: we sought to improve (1) the model’s performance in comparison with other approaches relying on a small set of features, and (2) the model’s interpretability, in particular by investigating feature importance. By integrating all the different modalities as well as high-level features, our model is uniquely positioned to be applied to French corpora.
Main file
1-s2.0-S0167639324000384-main.pdf (2.69 MB)
Origin: Publication funded by an institution
License: CC BY-NC (Attribution, NonCommercial)

Dates and versions

hal-04551398, version 1 (18-04-2024)

Cite

Auriane Boudin, Roxane Bertrand, Stéphane Rauzy, Magalie Ochs, Philippe Blache. A multimodal model for predicting feedback position and type during conversation. Speech Communication, 2024, 159, pp.103066. ⟨10.1016/j.specom.2024.103066⟩. ⟨hal-04551398⟩