Sentiment Analysis on English and Greek Twitter Data towards vaccinations





Dontaki, Chrysoula

Tjortjis, Christos
Koukaras, Paraskevas
Tzafilkou, Aikaterini

Master's thesis

2023-04-11T12:02:33Z
2023-04-11
2023-03-09


This dissertation focuses on Twitter sentiment analysis of COVID-19 vaccines in English and Greek. It was written as part of the MSc in Data Science at the International Hellenic University. The COVID-19 pandemic, caused by the coronavirus SARS-CoV-2, originated in China in December 2019 [1]. The virus has infected and killed thousands of people, and the World Health Organization (WHO) has declared the COVID-19 outbreak a global pandemic [2]. A worldwide vaccination campaign can bring this pandemic to an end. However, vaccines have traditionally been met with public fear and hesitancy. During the lockdowns imposed in many countries, people spent hours every day on social media platforms sharing their opinions and expressing their feelings. As a result, Twitter has become a valuable resource for gathering information about people’s emotions towards SARS-CoV-2 vaccination. Governments and health experts need to extract useful knowledge from naturally written text. This helps them understand people’s beliefs and design effective campaigns to increase vaccine acceptance. Therefore, sentiment analysis can yield valuable findings. Here, sentiment analysis means classifying opinions towards vaccines as “positive”, “negative” or “neutral”. More precisely, the goal of this study is to classify people as being in favor of or against vaccination. It also determines people’s preferences among the three vaccines (Pfizer, Moderna, AstraZeneca) that are available today. This task can be automated with Machine Learning (ML) and Natural Language Processing (NLP). Twitter data were retrieved in batches at different points in time over a seven-month period using the Python programming language. After data preprocessing, sentiment analysis was conducted with the TextBlob, Valence Aware Dictionary and sEntiment Reasoner (VADER), AFINN and NRC tools.
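Lexicon-based tools such as AFINN assign each known word a valence score and sum the scores over a text. The total is then mapped to a “positive”, “negative” or “neutral” label. The sketch below illustrates that idea and is not the thesis's implementation. The lexicon TOY_LEXICON and the functions lexicon_score and label are hypothetical. The scores are illustrative and not taken from the real AFINN word list. The sign-based thresholding rule is also an assumption, since the abstract does not state which cutoffs were used.

```python
import re

# Hypothetical AFINN-style lexicon: words mapped to integer valence scores.
# These entries are illustrative only, not taken from the real AFINN list.
TOY_LEXICON = {
    "safe": 2,
    "effective": 2,
    "grateful": 3,
    "fear": -2,
    "dangerous": -2,
    "scam": -3,
}


def lexicon_score(text: str, lexicon: dict[str, int] = TOY_LEXICON) -> int:
    """Hypothetical helper: sum the valence of every lexicon word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


def label(score: float) -> str:
    """Hypothetical helper: map a score to a class (assumed sign-based rule)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tweets = [
    "Got my second dose today, grateful and it feels safe!",
    "This vaccine is a dangerous scam",
    "Vaccination centre opens at 9am",
]
for tweet in tweets:
    print(label(lexicon_score(tweet)), "|", tweet)
```

Tools such as TextBlob and VADER return continuous polarity scores rather than integer sums. In practice, a small neutral band can therefore replace the strict sign test. One example is the ±0.05 cutoff on VADER's compound score that its authors recommend.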
The tweets were represented graphically, and a performance analysis was carried out with state-of-the-art classifiers: Logistic Regression, Decision Tree (DT), Random Forest, XGBoost and Support Vector Machine (SVM). Our results indicate that for the English ‘summer’ tweets, with TextBlob as the sentiment analysis tool, DT is the ML algorithm with the highest accuracy (97.99%) and F1-score (97.98%). In the autumn period, DT again demonstrates the best performance, with an accuracy of 97.94%, a slight decrease of 0.05 percentage points. The classification results for the Greek dataset show that the algorithms distinguish positive, negative and neutral tweets better in Greek than in English. DT again performed best, with 99.89% accuracy and a 99.88% F1-score. In the autumn period, the accuracy of DT improved by 0.03 percentage points, reaching 99.92%.
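Accuracy is the share of tweets whose predicted label matches the reference label. The F1-score is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The abstract does not say how per-class F1-scores were averaged over the three sentiment classes. The minimal sketch below therefore assumes macro averaging, an unweighted mean over classes, and uses made-up labels rather than data from the thesis. The function names accuracy and macro_f1 are illustrative. The sketch also shows why the change in DT accuracy on the English tweets, from 97.99% to 97.94%, is a difference of 0.05 percentage points rather than 0.05%.

```python
from collections import Counter


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1-scores (macro averaging is an assumption)."""
    classes = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)
    scores = []
    for c in classes:
        precision = true_pos[c] / pred_count[c] if pred_count[c] else 0.0
        recall = true_pos[c] / true_count[c] if true_count[c] else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up reference labels (e.g. from a lexicon tool) and classifier predictions.
y_true = ["positive", "negative", "neutral", "positive", "neutral", "negative"]
y_pred = ["positive", "negative", "neutral", "neutral", "neutral", "negative"]

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")

# Difference between two reported accuracies, expressed in percentage points.
summer_acc, autumn_acc = 97.99, 97.94
print(f"change = {autumn_acc - summer_acc:+.2f} percentage points")
```

Weighted averaging, which weights each class's F1 by its number of reference tweets, is a common alternative. It would give different values when the sentiment classes are imbalanced.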


COVID-19
Coronavirus
Pandemic
Twitter
Sentiment analysis

English language

School of Science and Technology, MSc in Data Science
International Hellenic University (IHU)

Default License



