Efficient algorithms and novel big data management techniques and their applications in ubiquitous computing

URI: https://www.openarchives.gr/aggregator-openarchives/edm/phdtheses/000040-10442_55392

This item is provided by the institution:
National Documentation Centre (EKT)

Repository:
National Archive of PhD Theses | ΕΚΤ NA.Ph.D.

Title

Efficient algorithms and novel big data management techniques and their applications in ubiquitous computing

Αποτελεσματικοί αλγόριθμοι και βελτιωμένες τεχνικές διαχείρισης μεγάλου όγκου δεδομένων και εφαρμογές τους στο διάχυτο υπολογισμό

Creator

Vonitsanos, Gerasimos

Βονιτσάνος, Γεράσιμος

Type

PhD Thesis

Thesis
PhD thesis (EN)

Date

2022

Year

2022 (EN)

Description

Data mining plays a significant role in computer science and data analysis. With the ever-increasing production and storage of data from sources such as databases, mobile devices, social media, and sensors, the need for effective tools that can extract knowledge and patterns from these data has become more pressing than ever. The data mining process aims to discover connections, patterns, and information in large, complex datasets, whether structured or unstructured. More specifically, it enables the detection of trends, key characteristics, interactions, and correlations. Data mining is applied in many fields, such as business analytics, bioinformatics, financial forecasting, social analysis, pattern recognition, and sentiment analysis. On the technical side, data mining algorithms include clustering, classification, rule generation, and anomaly detection algorithms, among others. Data analysis tools and visualization software are also used to present analytical views and information. The thesis focuses on the application of data mining in various fields. It includes a study of techniques and the application of Apache Spark to the challenges of processing heterogeneous and semi-structured data, and it analyzes how platforms such as Apache Spark can be used effectively for data modeling and management. It also focuses on the application of Apache Spark to opinion mining for the management of cultural data. The main approach is to exploit the analysis of large datasets using Spark Streaming to retrieve valuable information and opinions in real time.

The thesis explores the crucial role of data mining in computer science and data analysis, particularly in the context of the Fourth Industrial Revolution. With the exponential increase in data generated by sources such as databases, mobile devices, and social media, the demand for effective data mining tools has become imperative. Data mining aims to unveil patterns and insights in large, complex datasets, whether structured or unstructured, enabling the identification of often hidden trends and interactions. The thesis covers diverse domains where data mining is applied, ranging from business analysis and bioinformatics to financial forecasting and sentiment analysis. It delves into clustering, classification, and anomaly detection algorithms, and uses data analysis tools and visualization techniques to present findings. One key focus of the research is the application of data mining techniques using Apache Spark, specifically addressing the challenges posed by heterogeneous and semi-structured data. The architecture of Apache Spark is leveraged for data management and analysis. Real-time information retrieval from cultural content is emphasized through extensive dataset analysis, leading to customized content for users and improved engagement. Apache Spark is adopted to process and analyze massive data volumes efficiently, with its streaming architecture used to manage data streams. The study validates the proposed approach on Twitter data, employing Apache Spark streaming for real-time cultural content analysis. The thesis further explores the Collaborative Filtering (CF) technique for recommendation systems, extending its application to higher-order systems using GeoSpark. This technique enhances understanding of user behavior by gathering inputs from varying distances. Another significant aspect of the research is the use of GeoSpark for managing and analyzing spatiotemporal data. By employing methods such as Decision Trees and Random Forests, the study aims to extract insights from spatiotemporal data while focusing on privacy management. The thesis also investigates the preprocessing of documents for analysis, using the Term Frequency-Inverse Document Frequency (TF-IDF) approach to create representative vectors. Furthermore, it presents predictive modeling of stock movements and explores the integration of emotional information from Twitter using Apache Spark. Individual chapters delve into diverse applications such as community detection algorithms, protein structure prediction, and the analysis of genetic variations. The application of data mining to movie recommendations and to understanding cryptocurrency sentiment through Twitter data is also discussed.

Scientific field

Engineering and Technology ▶ Electrical Engineering, Electronic Engineering, Information Engineering ▶ Media Technology

Natural Sciences ▶ Computer and Information Sciences

Subject

Data structures

Algorithms

Big data

Machine learning

Media Technology

Electrical Engineering, Electronic Engineering, Information Engineering

Engineering and Technology

Language

English

Publisher

Πανεπιστήμιο Πατρών

University of Patras

School / Department / Institute

Πανεπιστήμιο Πατρών. Σχολή Πολυτεχνική. Τμήμα Μηχανικών Ηλεκτρονικών Υπολογιστών και Πληροφορικής

University of Patras ▶ School of Engineering ▶ Department of Computer Engineering and Informatics

Provider

National Documentation Centre (EKT)

Repository / collection

National Archive of PhD Theses

Subcollections

ΕΑΔΔ collection
