Transparent and accessible audio processing: hybrid CNN-LSTM deep learning techniques for vocal separation in music

This item is provided by the institution:
University of Macedonia

Repository:
Psepheda - Digital Library and Institutional Repository




Transparency and accessibility in audio processing: hybrid CNN-LSTM deep learning techniques for vocal separation in music (EL)
Transparent and accessible audio processing: hybrid CNN-LSTM deep learning techniques for vocal separation in music (EN)

Vasileios Tsavalias (EN)

Eftychios Protopapadakis, Konstantinos Giannoutakis, Dimitrios Hristu-Varsakelis
Eftychios Protopapadakis (EN)

Bachelor's thesis (EN)
Text (EN)

2024-11-25T09:06:51Z
2024


The library holds a copy of the thesis in electronic form only. (EL)
Bachelor's thesis--University of Macedonia, Thessaloniki, 2024. (EL)
Submitted by ΒΑΣΙΛΕΙΟΣ-ΕΦΡΑΙΜ ΤΣΑΒΑΛΙΑΣ ([email protected]) on 2024-11-18T06:08:04Z No. of bitstreams: 3 license_rdf: 701 bytes, checksum: 42fd4ad1e89814f5e4a476b409eb708c (MD5) cnn_lstm_hybrid_vocal_seperation_thesis_vasilis_tsavalias.pdf: 2792135 bytes, checksum: c16da2c7e62d3e4a03da4563dbf367b9 (MD5) ics21083_vasilis_tsavalias_DissertationExaminations.docx: 71941 bytes, checksum: 93d222b69d06f7d18bc812aec1a0cd68 (MD5) (EN)
Approved for entry into archive by Κυριακή Μπαλτά ([email protected]) on 2024-11-25T09:06:51Z (GMT) No. of bitstreams: 3 license_rdf: 701 bytes, checksum: 42fd4ad1e89814f5e4a476b409eb708c (MD5) cnn_lstm_hybrid_vocal_seperation_thesis_vasilis_tsavalias.pdf: 2792135 bytes, checksum: c16da2c7e62d3e4a03da4563dbf367b9 (MD5) 01506 ΤΣΑΒΑΛΙΑΣ ΑΝΑΛΥΤΙΚΗ ΟΛΟΚΛΗΡΩΣΗΣ.pdf: 1062009 bytes, checksum: b1e78ea35ec006db14e363412ecf179d (MD5) (EN)
Rejected by Κυριακή Μπαλτά ([email protected]) on 2024-10-24T06:43:10Z (GMT), reason: Please upload the grading report or a detailed transcript showing the bachelor's thesis grade. (EN)
Rejected by Κυριακή Μπαλτά ([email protected]) on 2024-11-18T10:35:53Z (GMT), reason: Please upload a scanned copy of the grading report or of the detailed-transcript excerpt showing the bachelor's thesis entry. (EN)
Submitted by ΒΑΣΙΛΕΙΟΣ-ΕΦΡΑΙΜ ΤΣΑΒΑΛΙΑΣ ([email protected]) on 2024-11-20T17:16:44Z No. of bitstreams: 3 license_rdf: 701 bytes, checksum: 42fd4ad1e89814f5e4a476b409eb708c (MD5) cnn_lstm_hybrid_vocal_seperation_thesis_vasilis_tsavalias.pdf: 2792135 bytes, checksum: c16da2c7e62d3e4a03da4563dbf367b9 (MD5) 01506 ΤΣΑΒΑΛΙΑΣ ΑΝΑΛΥΤΙΚΗ ΟΛΟΚΛΗΡΩΣΗΣ.pdf: 1062009 bytes, checksum: b1e78ea35ec006db14e363412ecf179d (MD5) (EN)
Made available in DSpace on 2024-11-25T09:06:51Z (GMT). No. of bitstreams: 3 license_rdf: 701 bytes, checksum: 42fd4ad1e89814f5e4a476b409eb708c (MD5) cnn_lstm_hybrid_vocal_seperation_thesis_vasilis_tsavalias.pdf: 2792135 bytes, checksum: c16da2c7e62d3e4a03da4563dbf367b9 (MD5) 01506 ΤΣΑΒΑΛΙΑΣ ΑΝΑΛΥΤΙΚΗ ΟΛΟΚΛΗΡΩΣΗΣ.pdf: 1062009 bytes, checksum: b1e78ea35ec006db14e363412ecf179d (MD5) Previous issue date: 2024-09-19 (EN)
Submitted by ΒΑΣΙΛΕΙΟΣ-ΕΦΡΑΙΜ ΤΣΑΒΑΛΙΑΣ ([email protected]) on 2024-10-23T17:48:07Z No. of bitstreams: 2 license_rdf: 914 bytes, checksum: 24013099e9e6abb1575dc6ce0855efd5 (MD5) cnn_lstm_hybrid_vocal_seperation_thesis_vasilis_tsavalias.pdf: 2792135 bytes, checksum: c16da2c7e62d3e4a03da4563dbf367b9 (MD5) (EN)
Vocal separation from audio mixtures is a complex and important task in audio signal processing, with significant applications in music remixing, track creation, and music information retrieval. This study presents a hybrid deep learning model that combines Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks to isolate vocals from complex audio mixtures. The research uses the MUSDB18 dataset, a recognized benchmark for music source separation. Extensive data preprocessing was performed: the audio files were converted into spectrograms with the Short-Time Fourier Transform (STFT), and data augmentation techniques such as pitch shifting and time stretching were applied to improve the model's generalization. The CNN-LSTM model was trained with the Adam optimizer, using Mean Squared Error (MSE) as the loss function. Hyperparameters were optimized with Optuna, which systematically tuned the critical parameters to maximize model performance. Post-processing techniques, including mask refinement and dynamic range compression, were applied to improve the quality of the separated vocals. To facilitate deployment in resource-constrained environments, dynamic quantization was applied using the Open Neural Network Exchange (ONNX) format; this reduced the model size and improved inference speed while maintaining accuracy. This research advances music source separation by providing an easy-to-follow pipeline for vocal isolation. The methods and results presented offer useful insights for real-time audio processing applications and support broader adoption across industries. (EN)
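The spectrogram-masking step summarized in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes that the network predicts a soft time-frequency mask over the STFT magnitude spectrogram; the abstract implies a mask through "mask refinement" but does not specify its type. Here the trained CNN-LSTM is replaced by an ideal ratio mask computed from known sources, and `refine_mask` with its `floor` threshold is a hypothetical refinement rule, not the thesis's method. The frame size, hop length, and toy tones are likewise illustrative assumptions.

```python
import cmath
import math


def hann(n_fft):
    """Periodic Hann window of length n_fft."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n_fft) for i in range(n_fft)]


def stft_magnitude(signal, n_fft=16, hop=8):
    """Magnitude spectrogram: one list of n_fft // 2 + 1 bin magnitudes per frame.

    Uses a naive DFT per frame for readability; a real pipeline would use an FFT.
    """
    window = hann(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(n_fft)]
        frames.append([
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft)))
            for k in range(n_fft // 2 + 1)
        ])
    return frames


def refine_mask(mask, floor=0.1):
    """Hypothetical refinement: clip to [0, 1] and zero out low-confidence bins."""
    return [[0.0 if m < floor else min(m, 1.0) for m in row] for row in mask]


def apply_mask(mask, mix_mag):
    """Element-wise product: estimated vocal magnitude = mask * mixture magnitude."""
    return [[m * x for m, x in zip(mrow, xrow)] for mrow, xrow in zip(mask, mix_mag)]


def mse(pred, target):
    """Mean squared error over all time-frequency bins."""
    sq = [(p - t) ** 2 for prow, trow in zip(pred, target) for p, t in zip(prow, trow)]
    return sum(sq) / len(sq)


# Toy example: a "vocal" tone on bin 2 and an "accompaniment" tone on bin 5.
N_FFT = 16
n_samples = 64
vocals = [math.sin(2 * math.pi * 2 * n / N_FFT) for n in range(n_samples)]
accomp = [0.8 * math.sin(2 * math.pi * 5 * n / N_FFT) for n in range(n_samples)]
mixture = [v + a for v, a in zip(vocals, accomp)]

V = stft_magnitude(vocals, N_FFT)
A = stft_magnitude(accomp, N_FFT)
X = stft_magnitude(mixture, N_FFT)

# Stand-in for the network output: an ideal ratio mask computed from the true sources.
ideal_mask = [[v / (v + a + 1e-8) for v, a in zip(vr, ar)] for vr, ar in zip(V, A)]
estimate = apply_mask(refine_mask(ideal_mask), X)

print(len(X), len(X[0]))
print(mse(X, V), mse(estimate, V))
```

In a real pipeline, the naive DFT would be replaced by an FFT-based STFT, the ideal mask by the trained model's output, and the vocal waveform recovered with an inverse STFT, typically reusing the mixture phase. The MSE computed here is the kind of spectrogram-domain loss the abstract names for training.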


Deep learning (EN)
Music source separation (EN)
Convolutional Neural Networks (CNNs) (EN)
Vocal separation (EN)
Optuna (EN)
Long Short-Term Memory (LSTM) (EN)
Dynamic quantization (EN)
Spectrograms (EN)

University of Macedonia (EL)

Department of Applied Informatics (University Education) (EL)

CC0 1.0 Universal (EL)
http://creativecommons.org/publicdomain/zero/1.0/
All rights reserved. Reprinting, storing, and distributing for non-profit purposes, educational purposes, or research purposes is permitted, provided the source is cited and this message is retained. The views and conclusions contained in this document express the author's opinions and should not be interpreted as representing the official positions of the University of Macedonia. (EN)



