LILLIE: Information extraction and database integration using linguistics and learning-based algorithms

URI: https://www.openarchives.gr/aggregator-openarchives/edm/dias/000058-100034
This item is provided by the institution:
Technical University of Crete

Repository:
Institutional Repository Technical University of Crete

Title

LILLIE: Information extraction and database integration using linguistics and learning-based algorithms (EN)

Creator

Παπαδοπουλος Δημητριος (EL)

Stockinger Kurt (EN)

Braschler Martin (EN)

Papadopoulos Dimitrios (EN)

Smith Ellery (EN)

Type

journalArticle

Partial document
Publication in journal (EN)

Article
Scientific article (EN)

Issued

2022

Year

2022 (EN)

Description

Querying both structured and unstructured data via a single common query interface such as SQL or natural language has been a long-standing research goal. Moreover, as methods for extracting information from unstructured data become ever more powerful, the desire to integrate the output of such extraction processes with “clean”, structured data grows. We are convinced that for successful integration into databases, such extracted information in the form of “triples” needs both (1) to be of high quality and (2) to have the necessary generality to link up with varying forms of structured data. Our approach breaks new ground by combining these two aspects, which have so far usually been treated in isolation. The cornerstone of our work is a novel, generic method for extracting open information triples from unstructured text, using a combination of linguistics and learning-based extraction methods to balance precision and recall. Our system, called LILLIE (LInked Linguistics and Learning-Based Information Extractor), uses dependency tree modification rules to refine triples from a high-recall learning-based engine, and combines them with syntactic triples from a high-precision engine to increase effectiveness. In addition, our system features several augmentations, which modify the generality and the degree of granularity of the output triples. Even though our focus is on addressing both quality and generality simultaneously, our new method substantially outperforms current state-of-the-art systems on the two widely used CaRB and Re-OIE16 benchmark sets for information extraction. We have made our code publicly available to facilitate further research. (EN)
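The idea in the description of merging triples from a high-precision engine with triples from a high-recall engine, and then querying them alongside structured data through SQL, can be illustrated with a minimal sketch. This is not the LILLIE implementation: the `Triple` type, the `normalize`, `combine` and `join_with_structured` helpers, the `company` table and all sample values are hypothetical, and the dependency tree refinement step is not reproduced here. The merge simply keeps the high-precision copy when both engines yield the same normalized triple.

```python
import sqlite3
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical (subject, relation, object) triple."""
    subject: str
    relation: str
    obj: str


def normalize(t):
    """Lowercase and collapse whitespace so near-identical triples compare equal."""
    return Triple(*(" ".join(part.lower().split()) for part in t))


def combine(high_precision, high_recall):
    """Union both outputs; on duplicates the high-precision source wins."""
    merged = {}
    for source, triples in (("precision", high_precision), ("recall", high_recall)):
        for t in triples:
            # setdefault keeps the first source seen, i.e. "precision".
            merged.setdefault(normalize(t), source)
    return merged


def join_with_structured(merged):
    """Load triples into SQLite and join them with an assumed structured table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE company (name TEXT PRIMARY KEY, country TEXT)")
    con.execute("INSERT INTO company VALUES ('acme', 'CH')")
    con.execute(
        "CREATE TABLE triple (subject TEXT, relation TEXT, object TEXT, source TEXT)"
    )
    con.executemany(
        "INSERT INTO triple VALUES (?, ?, ?, ?)",
        [(*t, source) for t, source in merged.items()],
    )
    rows = con.execute(
        "SELECT t.subject, t.relation, t.object, c.country "
        "FROM triple t JOIN company c ON c.name = t.subject "
        "ORDER BY t.relation"
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    precise = [Triple("Acme", "acquired", "Widget Corp")]
    recall = [
        Triple("acme", "acquired", "widget  corp"),
        Triple("Acme", "was founded in", "1990"),
    ]
    for row in join_with_structured(combine(precise, recall)):
        print(row)
```

Matching on a normalized subject string is the simplest possible link between extracted triples and a structured table; a real system would need a more robust entity-linking step.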

Subject

Data integration (EN)

Machine learning for database systems (EN)

Information extraction (EN)

Journal

Information Systems (EN)

Language

English

Publisher

Elsevier (EN)

School / Department / Institute

Πολυτεχνείο Κρήτης (EL)

Technical University of Crete (EN)

Provider

Technical University of Crete

Repository / collection

Institutional Repository Technical University of Crete

Subcollections

School of Electrical and Computer Engineering - Diploma Works

Distributed Multimedia Information Systems and Applications Laboratory - Diploma Works
