Αυτόματη αναγνώριση της κυπριακής διαλέκτου στα μέσα κοινωνικής δικτύωσης (Automatic identification of the Cypriot dialect in social media)

URI: https://www.openarchives.gr/aggregator-openarchives/edm/pergamos/000005-uoadl%3A2918569

This item is provided by the institution :
/aggregator-openarchives/portal/institutions/uoa

Repository :
Pergamos Digital Library

Title

Αυτόματη αναγνώριση της κυπριακής διαλέκτου στα μέσα κοινωνικής δικτύωσης (EL)

Automatic identification of the Cypriot dialect in social media (EN)

Creator

Λιάκου Κωνσταντίνα (EL)

Liakou Konstantina (EN)

Type

born_digital_postgraduate_thesis

Διπλωματική Εργασία (EL)

Postgraduate Thesis (EN)

Thesis
Master thesis (EN)

Date

2020

Year

2020 (EN)

Description

Στην παρούσα διπλωματική εργασία ερευνάται το ζήτημα της αυτόματης αναγνώρισης της Κυπριακής διαλέκτου στα μέσα κοινωνικής δικτύωσης και πιο συγκεκριμένα σε σχόλια του Facebook. Επιχειρήθηκε η δημιουργία ενός υπολογιστικού μοντέλου αναγνώρισης διαλέκτου και για τη διαδικασία ταξινόμησης των κειμένων σε δύο γλωσσικές κατηγορίες, την Κυπριακή διάλεκτο και την Κοινή Νέα Ελληνική, χρησιμοποιήθηκαν τρεις αλγόριθμοι: ο Multinomial Naïve Bayes, η Μηχανή Διανυσμάτων Υποστήριξης (Linear Support Vector Machine), και η Λογιστική Παλινδρόμηση. Τα δεδομένα για την κατηγοριοποίηση αποτελούνται από ένα σώμα κειμένων στα οποία γίνεται χρήση της Κυπριακής διαλέκτου και ένα σώμα κειμένων στην Κοινή Νέα Ελληνική, τα οποία προέρχονται από σχόλια αναρτημένα στο Facebook. Τα δεδομένα χωρίστηκαν σε σύνολο εκπαίδευσης (training set) και σύνολο δοκιμής (test set). Έπειτα, τα αλγοριθμικά μοντέλα εκπαιδεύτηκαν με την εισαγωγή του συνόλου εκπαίδευσης. Η τελική πρόβλεψη πραγματοποιήθηκε με τη χρήση του συνόλου δοκιμής. Τα αποτελέσματα ανέδειξαν τον αλγόριθμο Multinomial Naïve Bayes ως τον ταξινομητή με την καλύτερη επίδοση, καθώς στη διαδικασία αναγνώρισης της Κυπριακής διαλέκτου στα κείμενα με βάση τις συχνότερες λέξεις και τα συχνότερα διγράμματα, κατόρθωσε ακρίβεια 88,87%. Αποδείχτηκε έτσι, ότι θα ήταν εφικτή η δημιουργία ενός συστήματος αυτόματης αναγνώρισης της Κυπριακής διαλέκτου και διάκρισής της από την Κοινή Νέα Ελληνική. (EL)

This Master's thesis investigates the automatic identification of the Cypriot dialect in social media, specifically in Facebook comments. It sets out to build a computational dialect identification model. Three algorithms were used to classify texts into two language categories, the Cypriot dialect and Standard Modern Greek: Multinomial Naïve Bayes, a linear Support Vector Machine, and Logistic Regression. The data comprise a corpus of texts written in the Cypriot dialect and a corpus of texts in Standard Modern Greek, both drawn from comments posted on Facebook. The data were split into a training set and a test set. The models were trained on the training set, and the final predictions were made on the test set. Multinomial Naïve Bayes emerged as the best-performing classifier: when identifying the Cypriot dialect on the basis of the most frequent words and bigrams, it achieved an accuracy of 88.87%. The results thus show that a system for automatically identifying the Cypriot dialect and distinguishing it from Standard Modern Greek would be feasible. (EN)
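The workflow described in the abstract (frequency-based word and bigram features, a training/test split, and a Multinomial Naïve Bayes classifier scored by accuracy) can be illustrated with a minimal sketch. This is not the thesis's implementation. The class `TinyMultinomialNB`, the helper names, the `top_k` cutoff, the reading of "bigrams" as word bigrams, and the six toy sentences are all assumptions made up for illustration and are not taken from the thesis corpus.

```python
import math
from collections import Counter


def features(text):
    """Word unigrams plus word bigrams (assumed reading of 'bigrams')."""
    tokens = text.lower().split()
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


class TinyMultinomialNB:
    """Hypothetical from-scratch Multinomial Naive Bayes with add-alpha smoothing."""

    def __init__(self, alpha=1.0, top_k=1000):
        self.alpha = alpha    # Laplace smoothing constant
        self.top_k = top_k    # keep only the top_k most frequent features

    def fit(self, texts, labels):
        totals, per_class = Counter(), {}
        for text, label in zip(texts, labels):
            feats = features(text)
            totals.update(feats)
            per_class.setdefault(label, Counter()).update(feats)
        self.vocab = {f for f, _ in totals.most_common(self.top_k)}
        self.log_prior, self.log_like = {}, {}
        for label, counts in per_class.items():
            self.log_prior[label] = math.log(labels.count(label) / len(labels))
            denom = sum(counts[f] for f in self.vocab) + self.alpha * len(self.vocab)
            self.log_like[label] = {
                f: math.log((counts[f] + self.alpha) / denom) for f in self.vocab
            }
        return self

    def predict(self, text):
        # Features outside the retained vocabulary are ignored.
        feats = [f for f in features(text) if f in self.vocab]
        scores = {
            label: self.log_prior[label] + sum(self.log_like[label][f] for f in feats)
            for label in self.log_prior
        }
        return max(scores, key=scores.get)


def accuracy(model, texts, labels):
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


# Toy sentences invented for this sketch (CY = Cypriot, SMG = Standard Modern Greek).
train_texts = [
    "εν πολλά ωραίο τζαι μου αρέσκει",
    "ίντα που κάμνεις τωρά",
    "έννεν τζαι τόσο δύσκολο",
    "είναι πολύ ωραίο και μου αρέσει",
    "τι κάνεις τώρα",
    "δεν είναι και τόσο δύσκολο",
]
train_labels = ["CY", "CY", "CY", "SMG", "SMG", "SMG"]
test_texts = ["εν ωραίο τζαι εύκολο", "είναι ωραίο και εύκολο"]
test_labels = ["CY", "SMG"]

model = TinyMultinomialNB().fit(train_texts, train_labels)
print(f"accuracy on toy test set: {accuracy(model, test_texts, test_labels):.2%}")
```

With a real corpus, the toy lists would be replaced by the labelled comments. A library implementation of the three classifiers named in the abstract would normally be preferable to this hand-rolled version.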

Scientific field

Γλώσσα – Λογοτεχνία

Humanities and the Arts ▶ Languages and literature
Language and Linguistics (EN)

Subject

Γλώσσα – Λογοτεχνία (EL)

Language – Literature (EN)

Language

Greek

School / Department / Institute

School of Philosophy » Department of Philology » Postgraduate Programme "Korais" (Modern Greek Philology) » Linguistics track

Library and Information Center » Library of the School of Philosophy

National and Kapodistrian University of Athens ▶ School of Philosophy
Department of Philology

Rights

https://creativecommons.org/licenses/by-nc/4.0/

Provider

University of Athens

Repository / collection

Pergamos Digital Library

Subcollections

Postgraduate Thesis
