Execution of Recursive Queries in Apache Spark

Execution of Recursive Queries in Apache Spark

URI: https://www.openarchives.gr/aggregator-openarchives/edm/elocus/000018-dlib_4_8_0_metadata-dlib-1491372475-223855-9689.tkl
RDF/XML JSON-LD

This item is provided by the institution :
University of Crete

Repository :
E-Locus Institutional Repository

see the original item page
in the repository's web site and access all digital files if the item^*

Title

Εκτέλεση αναδρομικών ερωτημάτων στο Apache Spark

Execution of Recursive Queries in Apache Spark

Creator

Κατσογριδάκης, Παύλος Σ.

Contributor

Φατούρου, Παναγιώτα

Πρατικάκης, Πολύβιος

Μπίλας, Άγγελος

Type

text

Τύπος Εργασίας--Μεταπτυχιακές εργασίες ειδίκευσης

Thesis
Master thesis (EN)

Date

2017-03-17

Year

2017 (EN)

Description

Τα περιβάλλοντα MapReduce επιτρέπουν την επεξεργασία τεράστιου όγκου δεδομένων με το να περιορίζουν το προγραμματιστικό μοντέλο σε τελεστές map και reduce. Αυτό το επίπεδο αφαίρεσης απλοποιεί πολλά δύσκολα προβλήματα που προκύπτουν στα κατανεμημένα συστήματα, όπως το συγχρονισμό και την ανοχή σε σφάλματα, και τα κρύβουν από τον προγραμματιστή. Παρ' όλα αυτά, υπάρχουν αλγόριθμοι οι οποίοι δεν μπορούν να εκφραστούν εύκολα σε MapReduce, όπως οι αναδρομικοί αλγόριθμοι. Στην εργασία αυτή επεκτείναμε το Apache Spark (ένα σύστημα χρόνου εκτέλεσης MapReduce), ώστε να υποστηρίζει αναδρομικούς αλγορίθμους. Οι αναδρομικοί αλγόριθμοι MapReduce δημιουργούν μεγάλο αριθμό εργασιών, οι οποίες δυσκολεύουν το πρόβλημα της χρονοδρομολόγησης. Γι αυτό εισάγουμε ένα νέο παράλληλο και πιο ελαφρύ αλγόριθμο χρονοδρομολόγησης. Ο αλγόριθμος αυτός είναι κατάλληλος για χρονοδρομολόγηση ενός μεγάλου αριθμού από εργασίες οι οποίες παίρνουν πολύ λίγο χρόνο. Υλοποιήσαμε τον παραπάνω αλγόριθμο και βρήκαμε ότι απλοποιεί την έκφραση αναδρομικών ερωτημάτων, και παράλληλα μπορεί να πετύχει μέχρι 2,5 φορές καλύτερο χρόνο από τον ήδη υπάρχων αλγόριθμο του Spark σε κάποια είδη εργασιών. (EL)

MapReduce environments offer great scalability by restricting the programming model to only map and reduce operators. This abstraction simplifies many difficult problems occurring in generic distributed computations like fault tolerance and synchronization, hiding them from the programmer. There are, however, algorithms that cannot be easily or efficiently expressed in MapReduce, such as recursive functions. In this work we extend the Apache Spark runtime so that it can support recursive queries. Those queries produce a very large number of tasks, making scheduling a difficult and time consuming problem. To tackle this problem we also introd uce a new parallel and more lightweight scheduling mechanism, ideal for scheduling a very large set of tiny tasks. We implemented the aforementioned scheduler and found that it simplifies the code for recursive computation and can perform up to 2.5 times faster than the default Spark scheduler for certain kinds of benchmarks. (EN)

Scientific field

Natural Sciences
Computer and Information Sciences (EN)

Language

English

School / Department / Institute

Univerisity of Crete ▶ School of Sciences & Engineering
Computer Science Department

Provider

University of Crete

Repository / collection

E-Locus Institutional Repository

Subcollections

Σχολή/Τμήμα--Σχολή Θετικών και Τεχνολογικών Επιστημών--Τμήμα Επιστήμης Υπολογιστών--Μεταπτυχιακές εργασίες ειδίκευσης

Elocus

*Institutions are responsible for keeping their URLs functional (digital file, item page in repository site)

Execution of Recursive Queries in Apache Spark

Βοηθείστε μας να κάνουμε καλύτερο το OpenArchives.gr.