This item is provided by the institution :

Repository :
IHU Repository

2017 (EN)
Plagiarism Detection in Text Collections (EN)

Kalampokis, Panagiotis (EN)

School of Science and Technology, MSc in Mobile and Web Computing (EL)
Papadopoulos, Apostolos (EN)
Gatzianas, Marios (EN)
Evangelidis, Georgios (EN)

The main purpose of this dissertation was to find an efficient way to compare the documents of a large text corpus against one another and to identify which of them have been subject to plagiarism. We settled on the MinHash algorithm, which is widely used for large data sets. MinHash makes extensive use of hash functions to reduce the dimensionality of the “useful” part of each document that is kept after preprocessing, and, combined with the LSH (locality-sensitive hashing) technique, estimates how likely two documents are to resemble each other. (EN)
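
As an illustration of the approach the abstract describes, the following is a minimal sketch of MinHash signatures with LSH banding. It is not taken from the dissertation: the word-level shingling, the seeded BLAKE2b hash family, the function names (`shingles`, `minhash_signature`, `estimate_jaccard`, `lsh_candidates`) and the parameters `k`, `num_hashes` and `bands` are all assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (assumed preprocessing step)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _seeded_hash(seed, item):
    """One member of a hash family: a 64-bit BLAKE2b digest of 'seed:item'."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=128):
    """For each hash function, keep the minimum hash value over the set."""
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    return [min(_seeded_hash(seed, s) for s in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def lsh_candidates(signatures, bands=32):
    """Return pairs of document ids whose signatures agree on at least one band."""
    if not signatures:
        return set()
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

In this sketch only the pairs returned by `lsh_candidates` would then be checked with `estimate_jaccard` (or an exact comparison), which avoids comparing every document with every other one. More bands with fewer rows each make the candidate filter more permissive; fewer bands with more rows make it stricter.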


Plagiarism Detection (EN)
Big Data (EN)
Data Mining (EN)
MinHash (EN)

Διεθνές Πανεπιστήμιο της Ελλάδος (EL)
International Hellenic University (EN)


