Content |
Metabolites are the end products of cellular processes within an organism and directly reflect its physiological state. Identifying new metabolites and their biosynthetic genes in normal and pathological tissues, or in plant samples subjected to environmental stress, can aid disease diagnosis and plant trait improvement. This seminar covers two subtopics. The first introduces the application of comparative genome analysis to metabolome research, used here to identify neo-functionalized O-methyltransferase genes in the legume family. The second discusses the use of machine learning to build an informatics foundation for the meta-analysis of metabolomics data. In metabolomics, the retention times in mass spectral data from different measurement systems are not directly comparable, which limits the ability to discover biomarkers across datasets. This study aims to develop a methodology for integrating metabolomics datasets obtained from diverse measurement systems. Specifically, we use compound descriptors, retention times, and meta-information about the measurement systems as input features to build a conversion model that adjusts retention times according to the measurement system. Seven machine learning algorithms were chosen for training: two linear models, three decision-tree-based models, and two forms of SVM (linear and non-linear). Evaluation on validation datasets shows that the tree-based models perform best. |