Explore and assess how the granularity of a molecule's representation affects the accuracy of mutagenicity prediction with machine learning.
Nicolas K. Shinada, Naoki Koyama, Megumi Ikemori, Tomoki Nishioka, Seiji Hitaoka, Atsushi Hakura, Shoji Asakura, Yukiko Matsuoka and Sucheendra K. Palaniappan.
Prediction of molecular properties with machine learning traditionally relies on a description of the structure of the molecules. However, this description alone has severe limitations in capturing the nuances of structural diversity. We propose a combinatorial approach that relies on an aggregation of molecular features, including the structure, to improve machine-learning-based models. In this project, in collaboration with Eisai Co., Ltd., we focused on the prediction of mutagenicity outcomes to assist scientists in their quest to develop novel and safe drugs.
Using topological descriptions (ECFP), pre-defined substructures (MACCS) and learned representations (Mol2vec and GCN) to cover a wide array of molecular structure representations.
Leveraging calculated 1D and 2D molecular properties such as LogP, topological indices, atomic and ring counts, and atom-pair indices, using the Python package Mordred.
Using a curated list of genotoxic, non-genotoxic and electrophilicity-related structural alerts from the ToxAlerts web server.
Computing Density Functional Theory (DFT) properties such as the total electronic energy and the HOMO and LUMO levels, using the Python package Psi4.
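The aggregation of these feature families can be illustrated with a minimal sketch. It assumes each family has already been computed for a molecule and stored as a plain list of numbers; the block names, the placeholder values and the helper functions `aggregate` and `block_combinations` are hypothetical and are not taken from the study's source code. The sketch always keeps the structure block and enumerates every combination of the remaining blocks, which is one possible reading of the combinatorial approach described above.

```python
from itertools import combinations

# Hypothetical per-molecule feature blocks. The values are placeholders,
# not outputs of ECFP/MACCS, Mordred, ToxAlerts or Psi4.
feature_blocks = {
    "structure": [1, 0, 1, 1],   # e.g. fingerprint bits
    "properties": [2.31, 4.0],   # e.g. LogP, ring count
    "alerts": [0, 1],            # e.g. structural-alert flags
    "dft": [-0.21, 0.05],        # e.g. HOMO and LUMO levels
}


def aggregate(blocks, names):
    """Concatenate the selected blocks, in the given order, into one flat vector."""
    vector = []
    for name in names:
        vector.extend(blocks[name])
    return vector


def block_combinations(blocks, required="structure"):
    """Yield every tuple of block names that contains the required block."""
    optional = [name for name in blocks if name != required]
    for size in range(len(optional) + 1):
        for subset in combinations(optional, size):
            yield (required,) + subset


# One candidate feature vector per combination of feature families.
combos = list(block_combinations(feature_blocks))
vectors = {names: aggregate(feature_blocks, names) for names in combos}
```

Each entry of `vectors` could then be used to train and evaluate a separate model, so that the contribution of each feature family can be compared against the structure-only baseline.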
The source code and data sets used to generate the results presented in this study are available on Bitbucket.
(1) Nicolas K. Shinada, Naoki Koyama, Megumi Ikemori, Tomoki Nishioka, Seiji Hitaoka, Atsushi Hakura, Shoji Asakura, Yukiko Matsuoka, Sucheendra K. Palaniappan. Optimizing machine learning models for mutagenicity prediction through better feature selection. DOI: 10.1093/mutage/geac010
(2) Nicolas K. Shinada, Naoki Koyama, Megumi Ikemori, Tomoki Nishioka, Seiji Hitaoka, Atsushi Hakura, Shoji Asakura, Yukiko Matsuoka, Sucheendra K. Palaniappan. SaferWorldbyDesign Webinar: in silico Mutagenicity Prediction: a Journey. SaferWorldbyDesign Webinar 2022
(3) Naoki Koyama, Megumi Ikemori, Tomoki Nishioka, Seiji Hitaoka, Atsushi Hakura, Chihiro Nakazawa, Suman K. Chakravarti, Roustem D. Saiakhov, Nicolas K. Shinada, Sucheendra K. Palaniappan, Yukiko Matsuoka, Shoji Asakura. Development of AI mutagenicity prediction system incorporating the knowledge of expert review. Annual Meeting of the Japanese Society of Toxicology, 48.1, S19-4, 2021 (JSOT 2021)
(4) 小山 直己, 羽倉 昌志, 倉上 真樹, 西岡 大貴, 比多岡 清司, Nicolas K. Shinada, Sucheendra K. Palaniappan, 松岡 由希子, Suman K. Chakravarti, Roustem D. Saiakhov, 朝倉 省二. Mutagenicity screening evaluation using in silico (AI, (Q)SAR) approaches: industry use cases. 50th Anniversary Meeting of the Japanese Environmental Mutagen and Genome Society (JEMS 2021)
(5) Nicolas K. Shinada. Advancements in text mining and deep learning to improve toxicity prediction. OpenTox 2020
Part of this study was conducted as a result of a similar project for mutagenicity prediction that was funded by Eisai Co., Ltd. We acknowledge the funding support for that project, which eventually led to some of the ideas presented in this paper.
Please feel free to email us in case you have enquiries about the study or the source code.
SBX Corporation and The Systems Biology Institute