Plagiarism Detection in Text Collections


Institution: International Hellenic University
Repository: IHU Repository

Semantic enrichment/homogenization by EKT

2017 (EN)
Plagiarism Detection in Text Collections (EN)

Kalampokis, Panagiotis (EN)

School of Science and Technology, MSc in Mobile and Web Computing (EL)
Papadopoulos, Apostolos (EN)
Gatzianas, Marios (EN)
Evangelidis, Georgios (EN)

The main purpose of this dissertation was to find an efficient way to compare the documents of a large text corpus against one another and to identify which of them contain plagiarized material. We settled on the MinHash algorithm, which is the one most commonly used for big data sets. MinHash makes extensive use of hash functions to reduce the dimensionality of the "useful" part of each document that is kept after preprocessing and, together with the LSH (locality-sensitive hashing) technique, estimates the probability that two documents resemble each other. (EN)
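
The abstract does not give implementation details, so the following is only a minimal sketch of how MinHash signatures and LSH banding are commonly combined, not the dissertation's own code. The word-level shingles, the MD5-based string hash, the linear hash family modulo a Mersenne prime, the band/row settings, and all function names are assumptions made for illustration.

```python
import hashlib
import random

# Assumption: a Mersenne prime comfortably larger than the 32-bit shingle hashes.
PRIME = (1 << 61) - 1


def shingles(text, k=3):
    """Return the set of k-word shingles of a lowercased text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def stable_hash(s):
    """Deterministic 32-bit hash (the built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest()[:4], "big")


def make_hash_params(num_hashes, seed=42):
    """Draw (a, b) pairs for hash functions of the form (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(shingle_set, params):
    """Signature = minimum hashed shingle value under each hash function."""
    hashed = [stable_hash(s) for s in shingle_set]
    return [min((a * h + b) % PRIME for h in hashed) for a, b in params]


def estimate_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidate_pairs(signatures, bands, rows):
    """Band the signatures; documents sharing any band become candidates."""
    buckets = {}
    for doc_id, sig in signatures.items():
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets.setdefault(key, []).append(doc_id)
    pairs = set()
    for ids in buckets.values():
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                pairs.add(tuple(sorted((ids[i], ids[j]))))
    return pairs
```

In such a setup, each document would be shingled and reduced to a signature once, `lsh_candidate_pairs` would narrow the corpus down to likely matches, and `estimate_similarity` would then be applied only to those candidate pairs instead of to every pair of documents.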


Plagiarism Detection (EN)
Big Data (EN)
Data Mining (EN)
MinHash (EN)

Διεθνές Πανεπιστήμιο της Ελλάδος (EL)
International Hellenic University (EN)


