The library holds a copy of the thesis in electronic form only.
(EL)
Bachelor's thesis--University of Macedonia, Thessaloniki, 2024.
(EL)
Submitted by ΒΑΣΙΛΕΙΟΣ-ΕΦΡΑΙΜ ΤΣΑΒΑΛΙΑΣ (
[email protected]) on 2024-11-18T06:08:04Z
No. of bitstreams: 3
license_rdf: 701 bytes, checksum: 42fd4ad1e89814f5e4a476b409eb708c (MD5)
cnn_lstm_hybrid_vocal_seperation_thesis_vasilis_tsavalias.pdf: 2792135 bytes, checksum: c16da2c7e62d3e4a03da4563dbf367b9 (MD5)
ics21083_vasilis_tsavalias_DissertationExaminations.docx: 71941 bytes, checksum: 93d222b69d06f7d18bc812aec1a0cd68 (MD5)
(EN)
Approved for entry into archive by Κυριακή Μπαλτά (
[email protected]) on 2024-11-25T09:06:51Z (GMT) No. of bitstreams: 3
license_rdf: 701 bytes, checksum: 42fd4ad1e89814f5e4a476b409eb708c (MD5)
cnn_lstm_hybrid_vocal_seperation_thesis_vasilis_tsavalias.pdf: 2792135 bytes, checksum: c16da2c7e62d3e4a03da4563dbf367b9 (MD5)
01506 ΤΣΑΒΑΛΙΑΣ ΑΝΑΛΥΤΙΚΗ ΟΛΟΚΛΗΡΩΣΗΣ.pdf: 1062009 bytes, checksum: b1e78ea35ec006db14e363412ecf179d (MD5)
(EN)
Rejected by Κυριακή Μπαλτά (
[email protected]), reason: Upload the grading record or a transcript showing the thesis grade on 2024-10-24T06:43:10Z (GMT)
(EN)
Rejected by Κυριακή Μπαλτά (
[email protected]), reason: Upload a scanned copy of the grading record or the transcript excerpt showing the thesis entry on 2024-11-18T10:35:53Z (GMT)
(EN)
Submitted by ΒΑΣΙΛΕΙΟΣ-ΕΦΡΑΙΜ ΤΣΑΒΑΛΙΑΣ (
[email protected]) on 2024-11-20T17:16:44Z
No. of bitstreams: 3
license_rdf: 701 bytes, checksum: 42fd4ad1e89814f5e4a476b409eb708c (MD5)
cnn_lstm_hybrid_vocal_seperation_thesis_vasilis_tsavalias.pdf: 2792135 bytes, checksum: c16da2c7e62d3e4a03da4563dbf367b9 (MD5)
01506 ΤΣΑΒΑΛΙΑΣ ΑΝΑΛΥΤΙΚΗ ΟΛΟΚΛΗΡΩΣΗΣ.pdf: 1062009 bytes, checksum: b1e78ea35ec006db14e363412ecf179d (MD5)
(EN)
Made available in DSpace on 2024-11-25T09:06:51Z (GMT). No. of bitstreams: 3
license_rdf: 701 bytes, checksum: 42fd4ad1e89814f5e4a476b409eb708c (MD5)
cnn_lstm_hybrid_vocal_seperation_thesis_vasilis_tsavalias.pdf: 2792135 bytes, checksum: c16da2c7e62d3e4a03da4563dbf367b9 (MD5)
01506 ΤΣΑΒΑΛΙΑΣ ΑΝΑΛΥΤΙΚΗ ΟΛΟΚΛΗΡΩΣΗΣ.pdf: 1062009 bytes, checksum: b1e78ea35ec006db14e363412ecf179d (MD5)
Previous issue date: 2024-09-19
(EN)
Submitted by ΒΑΣΙΛΕΙΟΣ-ΕΦΡΑΙΜ ΤΣΑΒΑΛΙΑΣ (
[email protected]) on 2024-10-23T17:48:07Z
No. of bitstreams: 2
license_rdf: 914 bytes, checksum: 24013099e9e6abb1575dc6ce0855efd5 (MD5)
cnn_lstm_hybrid_vocal_seperation_thesis_vasilis_tsavalias.pdf: 2792135 bytes, checksum: c16da2c7e62d3e4a03da4563dbf367b9 (MD5)
(EN)
Vocal separation from audio mixtures is a complex and critical task within
audio signal processing, with significant applications in music remixing, track
creation, and music information retrieval. This study presents a hybrid deep
learning model combining Convolutional Neural Networks (CNNs) and Long
Short-Term Memory (LSTM) networks, designed to effectively isolate vocals
from intricate audio mixtures.
The research leverages the MUSDB18 dataset, a recognized benchmark
for music source separation. Extensive data preprocessing was employed, including the conversion of audio files into spectrograms via Short-Time Fourier
Transform (STFT) and data augmentation techniques such as pitch shifting
and time stretching to enhance the model’s generalization.
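The STFT step described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the thesis code: the function name `stft_magnitude`, the 1024-sample Hann window, the 256-sample hop, and the synthetic 440 Hz test tone are all assumptions chosen for the example. Pitch shifting and time stretching are not shown.

```python
import numpy as np


def stft_magnitude(signal, n_fft=1024, hop=256):
    """Return the magnitude spectrogram, shape (n_fft // 2 + 1, n_frames).

    n_fft and hop are illustrative defaults, not values from the thesis.
    """
    window = np.hanning(n_fft)
    # Zero-pad the tail so the last partial frame is kept.
    n_frames = 1 + max(0, int(np.ceil((len(signal) - n_fft) / hop)))
    padded = np.zeros(n_fft + (n_frames - 1) * hop, dtype=np.float64)
    padded[: len(signal)] = signal
    frames = np.stack(
        [padded[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # rfft keeps only the non-negative frequencies of a real-valued signal.
    return np.abs(np.fft.rfft(frames, axis=1)).T


# Usage: one second of a 440 Hz tone at an assumed 44.1 kHz sample rate.
sr = 44100
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
spec = stft_magnitude(tone)
peak_bin = int(spec.mean(axis=1).argmax())  # frequency bin with most energy
```

Each column of `spec` is one time frame and each row one frequency bin of width `sr / n_fft`, so the strongest bin for the test tone should lie within one bin of 440 Hz.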
The CNN-LSTM model was trained using the Adam optimizer with Mean
Squared Error (MSE) as the loss function. Hyperparameter optimization
was conducted using Optuna, systematically tuning critical parameters to
maximize model performance. Post-processing techniques, including mask
refinement and dynamic range compression, were applied to improve the
quality of the separated vocals.
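As a rough illustration of the MSE loss and of one possible form of mask refinement, the sketch below builds a ratio mask from a predicted vocal magnitude and applies it to the mixture STFT. The abstract does not say whether the network predicts a mask or a magnitude, nor how the mask is refined, so the names `mse` and `refine_mask`, the `floor` threshold, and the stand-in prediction are all assumptions. Dynamic range compression is not shown.

```python
import numpy as np


def mse(pred, target):
    """Mean squared error between two magnitude spectrograms."""
    return float(np.mean((pred - target) ** 2))


def refine_mask(pred_vocal_mag, mix_mag, floor=0.1, eps=1e-8):
    """Ratio mask clipped to [0, 1]; entries below `floor` are zeroed.

    The threshold step is a hypothetical refinement, not the thesis method.
    """
    mask = np.clip(pred_vocal_mag / (mix_mag + eps), 0.0, 1.0)
    mask[mask < floor] = 0.0
    return mask


# Synthetic complex mixture STFT standing in for real data.
rng = np.random.default_rng(0)
mix_stft = rng.normal(size=(513, 20)) + 1j * rng.normal(size=(513, 20))
mix_mag = np.abs(mix_stft)
pred_vocal_mag = 0.5 * mix_mag  # stand-in for a network output

mask = refine_mask(pred_vocal_mag, mix_mag)
vocal_stft = mask * mix_stft  # masking reuses the mixture phase
loss = mse(np.abs(vocal_stft), pred_vocal_mag)
```

Multiplying the complex mixture by a real-valued mask keeps the mixture phase, which is a common simplification when only magnitudes are modeled.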
To facilitate deployment in resource-constrained environments, dynamic
quantization was applied using the Open Neural Network Exchange (ONNX)
format. This approach reduced the model size and improved inference speed
while maintaining accuracy.
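The size reduction from 8-bit quantization comes down to simple arithmetic, sketched below in NumPy. This is not the ONNX toolchain and not the thesis code: it shows symmetric per-tensor int8 quantization of a random weight matrix, and the function names, the matrix shape, and the choice of a symmetric scheme are assumptions made for illustration.

```python
import numpy as np


def quantize_int8(weights):
    """Symmetric per-tensor quantization of float32 weights to int8."""
    scale = float(np.max(np.abs(weights))) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any scale reproduces it exactly
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate float32 weights."""
    return q.astype(np.float32) * scale


# Random weights standing in for one layer of a trained model.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(256, 128)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

size_ratio = w.nbytes / q.nbytes              # float32 vs int8 storage
max_error = float(np.max(np.abs(w - w_hat)))  # roughly bounded by scale / 2
```

Storing one byte per weight instead of four gives the fourfold reduction in `size_ratio`, and the rounding error per weight stays within about half a quantization step. How much this affects separation quality depends on the model and has to be measured.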
This research advances music source separation by providing an
easy-to-follow pipeline for vocal isolation. The methodologies and results presented
offer valuable insights for real-time audio processing applications, paving the
way for broader adoption in various industries.
(EN)