Pre-training for video action recognition with automatically generated datasets





Σβέζεντσεβ, Νταβίντ (EL)
Svezentsev, Ntavint (EN)

ntua (EL)
Rontogiannis, Athanasios (EN)
Potamianos, Gerasimos (EN)
Maragos, Petros (EN)

bachelorThesis

2023-12-06T08:06:56Z
2023-07-12


In recent years, the computer vision community has shown growing interest in synthetic data. For the image modality, existing work has proposed learning visual representations by pre-training on synthetic samples produced by various generative processes instead of real data. This approach is advantageous because it avoids issues associated with real data: collection and labeling costs, copyright, privacy, and human bias. Desirable properties of synthetic images have been carefully investigated, and as a result the performance gap between real and synthetic images has narrowed significantly. The present work extends this approach to the video domain and applies it to the task of action recognition. Because of the added temporal dimension, video is notably more complex than images. Using fractal geometry and other generative processes, we therefore present methods to automatically produce large-scale datasets of short synthetic video clips. The approach is applicable to both supervised and self-supervised learning. To narrow the domain gap, we manually inspect real video samples and identify their key properties, such as periodic motion, random background, and camera displacement. These properties are then carefully emulated during pre-training. Through thorough ablations, we determine which properties strengthen downstream results and offer general guidelines for pre-training with synthetic videos. The proposed approach is evaluated on the small-scale action recognition datasets HMDB51 and UCF101 as well as four other video benchmarks. Compared to standard Kinetics pre-training, our results come close and are even superior on some of the benchmarks. (EN)
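
To make the idea of a fractal-based generative process concrete, the following is a minimal sketch and not the pipeline used in the thesis. It assumes an iterated function system (IFS) rendered with the chaos game, and it uses a sinusoidal offset of the affine maps as a stand-in for the "periodic motion" property mentioned in the abstract. The function names `sample_ifs` and `render_clip`, and all parameter values (frame count, period, resolution, amplitude), are hypothetical choices made for illustration only.

```python
import numpy as np

def sample_ifs(rng, n_maps=3):
    """Sample contractive affine maps x -> A @ x + b, plus a unit motion direction d."""
    maps = []
    for _ in range(n_maps):
        A = rng.uniform(-1.0, 1.0, size=(2, 2))
        A *= 0.7 / np.linalg.norm(A, 2)   # force spectral norm 0.7, so the map contracts
        b = rng.uniform(-0.5, 0.5, size=2)
        d = rng.normal(size=2)
        d /= np.linalg.norm(d)            # unit direction of the periodic shift
        maps.append((A, b, d))
    return maps

def render_clip(seed=0, n_frames=16, period=8, size=64, n_points=2000, amp=0.2):
    """Render a binary clip of shape (n_frames, size, size) that repeats every `period` frames."""
    rng = np.random.default_rng(seed)
    maps = sample_ifs(rng)
    # The same map sequence is reused for every frame, so frames differ
    # only through the time-dependent shift (assumed design, for exact periodicity).
    choices = rng.integers(len(maps), size=n_points)
    # All iterates satisfy |x| <= (max|b| + |amp|) / (1 - 0.7); add a small margin.
    bound = (np.sqrt(0.5) + abs(amp)) / 0.3 + 0.1
    clip = np.zeros((n_frames, size, size), dtype=np.uint8)
    for t in range(n_frames):
        shift = amp * np.sin(2.0 * np.pi * (t % period) / period)
        x = np.zeros(2)
        for k in choices:
            A, b, d = maps[k]
            x = A @ x + b + shift * d
            # Map coordinates in (-bound, bound) to pixel indices in [0, size - 1).
            col, row = ((x + bound) / (2.0 * bound) * (size - 1)).astype(int)
            clip[t, row, col] = 255
    return clip

clip = render_clip()
```

Each distinct seed yields a different attractor, so a seed could serve as a class label for supervised pre-training; that labeling scheme is likewise an assumption of this sketch and is not stated in the abstract.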


Machine Learning (EL)
Action Recognition (EL)
Fractal Geometry (EL)
Synthetic Data (EL)
Computer Vision (EL)
Action Recognition (EN)
Computer Vision (EN)
Deep Learning (EN)
Synthetic Data (EN)
Fractal Geometry (EN)

English

Computer Vision, Speech Communication and Signal Processing Group (EL)
National Technical University of Athens. School of Electrical and Computer Engineering. Division of Signals, Control and Robotics (EL)

Creative Commons Attribution 3.0 Greece
http://creativecommons.org/licenses/by/3.0/gr/



