Assessment and comparison of existing methods and datasets for sentiment analysis of Greek texts


URI: https://www.openarchives.gr/aggregator-openarchives/edm/polynoe/000125-11400_2411

This item is provided by the institution:
University of West Attica

Repository:
Institutional Repository Polynoe


Title

Assessment and comparison of existing methods and datasets for sentiment analysis of Greek texts

Creator

Φραγκής, Νικόλαος

Contributor

Tselenti, Panagiota

Μαστοροκώστας, Πάρις

School of Engineering

Artificial Intelligence and Visual Computing

Kesidis, Anastasios

Department of Surveying and Geoinformatics Engineering

Department of Informatics and Computer Engineering

Type

Postgraduate diploma thesis

Thesis
Master thesis (EN)

Issued

2022-06-30

Created

2022-07-07T12:17:21Z

Year

2022 (EN)

Description

Sentiment analysis is a well-known field of Natural Language Processing concerned with text classification. A vast number of papers, especially for the English language, present state-of-the-art results on many different datasets using a variety of classification models. The aim of this work is to compare machine learning models on datasets in both Greek and English. To this end, we used the well-known IMDb dataset from Stanford University, which is very often used to evaluate new text classification models, and an equivalent new dataset that we created in Greek from the Athinorama website. Our experiments used the following models: Logistic Regression, Support Vector Machine, Naïve Bayes, Decision Trees, XGBoost, Convolutional Neural Network, Long Short-Term Memory, Gated Recurrent Units, and Bidirectional Encoder Representations from Transformers (BERT). The first five models were combined with TF-IDF vectorization, while the rest were combined with word embeddings. The results show that the best classifier for sentiment analysis in both English and Greek is the pretrained BERT model. The difference in language does not seem to have a significant impact on the results, whereas the quality, size, and level of pre-processing of the data appear to play a significant role in classification. We undertook this work because of the lack of research on the Greek language; our contribution is the Athinorama Light dataset, which could play a significant role in future work on Greek-language classification.
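The TF-IDF vectorization mentioned in the description can be illustrated with a minimal, standard-library sketch. It uses the plain textbook weighting (term frequency times log of inverse document frequency); the function name `tfidf`, the whitespace tokenizer, and the toy reviews are assumptions made for illustration and are not taken from the thesis, whose exact vectorizer settings are not stated in this record. Library implementations typically add smoothing and normalization, so their weights will differ.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (textbook TF-IDF)."""
    # Hypothetical tokenizer: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # tf = relative frequency in this document, idf = log(N / df).
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Toy reviews, invented for illustration only.
reviews = ["great film great cast", "boring film", "great fun"]
vectors = tfidf(reviews)
```

Terms that occur in every document receive weight zero under this variant, which is why a classifier trained on such vectors leans on the more distinctive words in each review.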

Scientific field

Natural Sciences
Computer and Information Sciences (EN)

Subject

Sentiment analysis

Machine learning

Language

English

Publisher

Université de Limoges

University of West Attica

School / Department / Institute

SCHOOL OF ENGINEERING - Department of Informatics and Computer Engineering - Postgraduate diploma theses - Artificial Intelligence and Visual Computing

University of West Attica ▶ School of Engineering
Department of Informatics and Computer Engineering

Rights

Attribution - NonCommercial - ShareAlike 4.0 International

Provider

University of West Attica

Repository / collection

Institutional Repository Polynoe

Subcollections

SCHOOL OF ENGINEERING

Department of Informatics and Computer Engineering

Postgraduate diploma theses - Artificial Intelligence and Visual Computing

*Institutions are responsible for keeping their URLs functional (digital file, item page in repository site)
