Prediction of the retention time of natural product metabolites using transfer learning strategies

This item is provided by the institution:
University of West Attica   

Repository:
Institutional Repository Polynoe   

See the original item page
on the repository's website to access all digital files of the item*




Κατσάρα, Βασιλική

Athanasiadis, Emmanouil
School of Engineering
Matsoukas, Minos-Timotheos
Department of Biomedical Engineering
Kostopoulos, Spiros

Diploma thesis

2024-10-11

2024-10-22T06:07:39Z


Retention time (RT) prediction in chromatography can play an important role in numerous analytical applications, including drug discovery and environmental monitoring. This study aims to improve RT prediction accuracy with deep learning, focusing on transfer learning: models trained on synthetic compounds measured by High-Pressure Liquid Chromatography-Mass Spectrometry (HPLC-MS) are adapted to predict RTs of natural products under different chromatographic methods. We used the extensive METLIN Small Molecule Retention Time (SMRT) dataset, comprising over 80,000 synthetic compounds, to train a deep neural network (DNN). This model was then fine-tuned on smaller datasets of natural products from the RepoRT database using a two-stage transfer learning approach. In the first stage, the DNN's upper layers were frozen to retain knowledge about high-level features while training on the new data. In the second stage, all layers were unfrozen and trained further with a reduced learning rate, so that both general and dataset-specific patterns were captured. Hyperparameters were optimized with Optuna using 5-fold nested cross-validation to ensure robust performance estimates. The evaluation metrics were the Mean Absolute Error (MAE), the Median Absolute Error (MedAE) and the Mean Absolute Percentage Error (MAPE). Transfer learning was then compared with DNNs trained from scratch on the RepoRT data; it performed better according to MAE and MedAE, although not according to MAPE. After removing outliers, transfer learning performed better on all three metrics. Future work should refine this strategy to improve its performance, either by testing it further on the same datasets or by incorporating additional data.
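The three evaluation metrics named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the thesis code: the function names, the two hypothetical models and all retention-time values are made up for the example. It also shows one general way in which absolute-error metrics and MAPE can rank two models differently, since MAPE weights errors on short retention times more heavily. Whether this explains the result reported in the abstract is not stated in the source.

```python
from statistics import mean, median


def mae(y_true, y_pred):
    """Mean Absolute Error, in the same unit as the retention times."""
    return mean([abs(t - p) for t, p in zip(y_true, y_pred)])


def medae(y_true, y_pred):
    """Median Absolute Error, less sensitive to a few large errors."""
    return median([abs(t - p) for t, p in zip(y_true, y_pred)])


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent (assumes no true value is 0)."""
    return 100.0 * mean([abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)])


# Made-up retention times in seconds, for illustration only.
rt_true = [30.0, 600.0, 900.0, 1200.0]
model_a = [60.0, 610.0, 910.0, 1210.0]   # small errors, but a large one on the short RT
model_b = [32.0, 640.0, 940.0, 1240.0]   # larger errors, but accurate on the short RT

for name, pred in (("model_a", model_a), ("model_b", model_b)):
    print(name, mae(rt_true, pred), medae(rt_true, pred), round(mape(rt_true, pred), 2))
```

In this toy case the hypothetical model_a has the lower MAE and MedAE, while model_b has the lower MAPE, because the single 30-second error of model_a falls on a compound with a very short retention time.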


Metabolomics
Molecular descriptors
Transfer learning
Molecular fingerprints
Machine learning
Retention time

English

University of West Attica

SCHOOL OF ENGINEERING - Department of Biomedical Engineering - Diploma theses

Attribution - NonCommercial - ShareAlike 4.0 International
http://creativecommons.org/licenses/by-nc-nd/4.0/
Attribution-NonCommercial-NoDerivatives 4.0 International




*Institutions are responsible for keeping their URLs functional (digital file, item page in repository site)