Graph integration of structured, semistructured and unstructured data for data journalism - INRIA - Institut National de Recherche en Informatique et en Automatique
Preprint, Working Paper. Year: 2021

Graph integration of structured, semistructured and unstructured data for data journalism

Abstract

Digital data is a gold mine for modern journalism. However, the datasets that interest journalists are extremely heterogeneous, ranging from highly structured (relational databases) to semi-structured (JSON, XML, HTML), graphs (e.g., RDF), and text. Journalists (and other classes of users lacking advanced IT expertise, such as most non-governmental organizations or small public administrations) need to be able to make sense of such heterogeneous corpora, even if they lack the ability to define and deploy custom extract-transform-load workflows, especially for dynamically varying sets of data sources. We describe a complete approach for integrating dynamic sets of heterogeneous datasets, of the kinds described above, into graphs: the challenges we faced in making such graphs useful and in allowing their integration to scale, and the solutions we proposed for these problems. Our approach is implemented within the ConnectionLens system; we validate it through a set of experiments.
Main file
INFOSYS-S-20-00785-3.pdf (1.99 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03150441, version 1 (23-02-2021)
hal-03150441, version 2 (08-09-2021)

Identifiers

  • HAL Id: hal-03150441, version 1

Cite

Angelos Christos Anadiotis, Oana Balalau, Catarina Conceicao, Helena Galhardas, Mhd Yamen Haddad, et al. Graph integration of structured, semistructured and unstructured data for data journalism. 2021. ⟨hal-03150441v1⟩