Extracting Structured Information from Greek Legislation Data




Extracting Structured Information from Greek Legislation Data (EN)

Michailidis, Alexios (EL)

Dr. Berberidis, Christos (EL)
Assist. Prof. Peristeras, Vassilios (EN)

masterThesis

2023-04-05T09:09:45Z
2023-04-05
2023-03-09


Customers nowadays have difficulty finding relevant information because of information overload, mainly on the World Wide Web. The amount of data kept on the Internet has grown at an exponential rate in recent years. Furthermore, most information is released in an unstructured format, which makes it difficult to extract knowledge efficiently. At the same time, there is a strong demand for transparency, especially in the public sector, so extracting structured information is necessary. Beyond transparency, structured data can be analyzed further to yield new information and insights. Although machine learning and NLP approaches, such as named-entity recognition and relation extraction, have recently shown promising results in information extraction research, most current work has focused on English-language material. The purpose of this study is to extract structured information from Greek legislation texts, which are published in PDF format with no metadata, a practice that deviates considerably from Tim Berners-Lee's idea of linked open data. More specifically, we first create a named-entity recognition model to extract the entities from the documents. We then fine-tune a transformer-based model to detect the relationships between these entities. Finally, we combine the two into a pipeline that extracts structured information from plain-text input. In conclusion, the present research contributes to existing studies of information extraction by proposing an approach that relies on transformer-based models to derive entity relationships from text and to provide structured information that can be studied further. In particular, to the best of our knowledge, the presented approach is the first initiative to extract relationships from Greek legal documents, and it might contribute to the country's desired Open Government agenda. (EN)
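
The two-stage structure described in the abstract (entity extraction followed by relation detection over entity pairs) can be illustrated with a minimal sketch. This is not the thesis implementation: the regular-expression tagger and the keyword rule below are toy stand-ins for the trained named-entity recognition model and the fine-tuned transformer-based classifier, and every name in it (`Entity`, `Relation`, `toy_ner`, `toy_relation_classifier`, `extract`, the `LAW`/`ORG` entity types, the `ISSUED` relation label) is a hypothetical choice made for illustration. The sample sentence is in English only to keep the example readable.

```python
import re
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Entity:
    text: str
    label: str
    start: int
    end: int


@dataclass(frozen=True)
class Relation:
    head: Entity
    tail: Entity
    label: str


# Hypothetical patterns standing in for a trained NER model.
TOY_PATTERNS = {
    "LAW": re.compile(r"Law \d+/\d{4}"),
    "ORG": re.compile(r"Ministry of [A-Z][a-z]+"),
}


def toy_ner(text: str) -> List[Entity]:
    """Stage 1 stand-in: return entity spans in order of appearance."""
    entities = [
        Entity(m.group(), label, m.start(), m.end())
        for label, pattern in TOY_PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(entities, key=lambda e: e.start)


def toy_relation_classifier(text: str, head: Entity, tail: Entity) -> Optional[str]:
    """Stage 2 stand-in: a keyword rule where a fine-tuned classifier would go."""
    if head.label == "ORG" and tail.label == "LAW":
        # Text between the two spans; empty if the tail precedes the head.
        between = text[head.end:tail.start]
        if "issued" in between:
            return "ISSUED"
    return None


def extract(
    text: str,
    ner: Callable[[str], List[Entity]] = toy_ner,
    classify: Callable[[str, Entity, Entity], Optional[str]] = toy_relation_classifier,
) -> List[Relation]:
    """Pipeline: run NER, then classify every ordered pair of entities."""
    entities = ner(text)
    relations = []
    for head, tail in permutations(entities, 2):
        label = classify(text, head, tail)
        if label is not None:
            relations.append(Relation(head, tail, label))
    return relations


sample = "The Ministry of Finance issued Law 4172/2013."
triples = [(r.head.text, r.label, r.tail.text) for r in extract(sample)]
print(triples)  # [('Ministry of Finance', 'ISSUED', 'Law 4172/2013')]
```

Because `extract` takes both stages as arguments, either stand-in could in principle be replaced by a real model without changing the pipeline code; classifying every ordered pair is the simplest option and grows quadratically with the number of entities in the input.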


Extracting structured information (EL)
Greek legislation data (EL)

English language

School of Science and Technology, MSc in Data Science
IHU (EN)

Default License



