Optimizing machine learning models for mutagenicity prediction through better feature selection

The Motivation

Machine-learning prediction of molecular properties has traditionally relied on a description of molecular structure alone. However, such a description has severe limitations in capturing the nuances of structural diversity. We propose a combinatorial approach that aggregates several kinds of molecular features, including structure, to improve machine-learning models. In this project, carried out in collaboration with Eisai Co., Ltd., we focused on predicting mutagenicity outcomes to assist scientists in developing novel and safe drugs.

Molecular Features

Molecular Structure

We use topological descriptions (ECFP), predefined substructures (MACCS), and learned representations (Mol2vec and GCN) to cover a wide range of molecular structure representations.
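As a rough illustration of the two fixed fingerprints named above, the sketch below computes a Morgan (ECFP-style) bit vector and the MACCS keys with RDKit. The choice of RDKit, the radius of 2, the 2048-bit size, and the helper name `structure_fingerprints` are all assumptions made for this example and are not stated in the study description. The learned representations (Mol2vec and GCN) are not covered here.

```python
from rdkit import Chem
from rdkit.Chem import MACCSkeys, rdFingerprintGenerator


def structure_fingerprints(smiles, radius=2, n_bits=2048):
    """Return (ecfp_bits, maccs_bits) as lists of 0/1 values.

    radius and n_bits are illustrative defaults, not the study's settings.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")

    # Morgan fingerprint with radius 2 is commonly used as an ECFP4 analogue.
    morgan = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=n_bits)
    ecfp_bits = list(morgan.GetFingerprint(mol))

    # MACCS keys: 166 predefined substructure keys; RDKit returns 167 bits
    # because bit 0 is unused.
    maccs_bits = list(MACCSkeys.GenMACCSKeys(mol))
    return ecfp_bits, maccs_bits


if __name__ == "__main__":
    ecfp, maccs = structure_fingerprints("c1ccccc1[N+](=O)[O-]")  # nitrobenzene
    print(len(ecfp), len(maccs))
```

Both vectors have a fixed length regardless of molecule size, which makes them straightforward to place side by side with other feature blocks.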

Calculated Properties

We leverage calculated 1D and 2D molecular properties, such as LogP, topological indices, atom and ring counts, and atom-pair indices, computed with the Python package Mordred.
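To give a feel for this kind of property, the sketch below computes a handful of comparable values. Note that it uses RDKit's built-in descriptors as a lightweight stand-in rather than Mordred itself, which computes a far larger descriptor set. The helper name `calculated_properties` and the particular four descriptors are assumptions chosen for illustration only.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, rdMolDescriptors


def calculated_properties(smiles):
    """Return a small dict of 1D/2D properties for one molecule.

    Stand-in for a Mordred calculation: same flavour of descriptor,
    much smaller set.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")

    return {
        "logp": Descriptors.MolLogP(mol),                  # Crippen LogP estimate
        "balaban_j": Descriptors.BalabanJ(mol),            # a topological index
        "heavy_atom_count": mol.GetNumHeavyAtoms(),        # atom count
        "ring_count": rdMolDescriptors.CalcNumRings(mol),  # ring count
    }


if __name__ == "__main__":
    for name, value in calculated_properties("c1ccc2ccccc2c1").items():  # naphthalene
        print(name, value)
```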

Mutagenicity Alerts

We use a curated list of genotoxic, non-genotoxic, and electrophilicity-related structural alerts from the ToxAlerts web server.
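Structural alerts are typically encoded as substructure patterns and turned into binary features, one per alert. The sketch below shows that idea with RDKit and SMARTS matching. The three patterns are simplified, hand-written examples of well-known alert classes; they are not taken from the ToxAlerts list used in the study, and the names `ILLUSTRATIVE_ALERTS` and `alert_flags` are hypothetical.

```python
from rdkit import Chem

# Simplified, illustrative patterns only (NOT the curated ToxAlerts list).
ILLUSTRATIVE_ALERTS = {
    "aromatic_nitro": "c[N+](=O)[O-]",
    "epoxide": "C1OC1",
    "primary_aromatic_amine": "c[NX3;H2]",
}


def alert_flags(smiles, alerts=ILLUSTRATIVE_ALERTS):
    """Return {alert_name: 0 or 1} indicating which alerts match the molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")

    flags = {}
    for name, smarts in alerts.items():
        pattern = Chem.MolFromSmarts(smarts)
        if pattern is None:
            raise ValueError(f"Invalid SMARTS for alert {name!r}: {smarts!r}")
        flags[name] = int(mol.HasSubstructMatch(pattern))
    return flags


if __name__ == "__main__":
    print(alert_flags("c1ccccc1[N+](=O)[O-]"))  # nitrobenzene
    print(alert_flags("Nc1ccccc1"))             # aniline
```

Each molecule yields one flag per alert, so the alert block has the same width for every compound.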

DFT Descriptors

We compute density functional theory (DFT) properties, such as the total electronic energy and the HOMO and LUMO levels, using the Python package Psi4.
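The four feature families described above are meant to be aggregated into a single input for a model. One minimal way to picture that step is plain concatenation with block-prefixed column names, as sketched below. This is an assumption about the mechanics, not the study's actual pipeline; the function name `aggregate_features` is hypothetical and every numeric value in the example is a made-up placeholder, not a computed result.

```python
def aggregate_features(blocks):
    """Flatten named feature blocks into (column_names, values).

    blocks maps a block name (e.g. "dft") to a dict of feature name -> number.
    Column names are prefixed with the block name so their origin stays visible.
    """
    names, values = [], []
    for block_name, features in blocks.items():
        for feature_name, value in features.items():
            names.append(f"{block_name}:{feature_name}")
            values.append(float(value))
    return names, values


if __name__ == "__main__":
    # Placeholder values for a single molecule (illustrative only).
    example_blocks = {
        "structure": {"ecfp_bit_0": 0, "ecfp_bit_1": 1},
        "properties": {"logp": 1.5, "ring_count": 1},
        "alerts": {"aromatic_nitro": 1},
        "dft": {"total_energy": -100.0, "homo": -0.25, "lumo": -0.05},
    }
    names, values = aggregate_features(example_blocks)
    for name, value in zip(names, values):
        print(name, value)
```

Keeping the block prefix in each column name makes it easy to select or drop a whole feature family later, which is useful when comparing feature combinations.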

Related Publications

(1) Nicolas K. Shinada, Naoki Koyama, Megumi Ikemori, Tomoki Nishioka, Seiji Hitaoka, Atsushi Hakura, Shoji Asakura, Yukiko Matsuoka, Sucheendra K. Palaniappan. Optimizing machine learning models for mutagenicity prediction through better feature selection. DOI: 10.1093/mutage/geac010

(2) Nicolas K. Shinada, Naoki Koyama, Megumi Ikemori, Tomoki Nishioka, Seiji Hitaoka, Atsushi Hakura, Shoji Asakura, Yukiko Matsuoka, Sucheendra K. Palaniappan. SaferWorldbyDesign Webinar: in silico Mutagenicity Prediction: a Journey. SaferWorldbyDesign Webinar 2022

(3) Naoki Koyama, Megumi Ikemori, Tomoki Nishioka, Seiji Hitaoka, Atsushi Hakura, Chihiro Nakazawa, Suman K. Chakravarti, Roustem D. Saiakhov, Nicolas K. Shinada, Sucheendra K. Palaniappan, Yukiko Matsuoka, Shoji Asakura. Development of AI mutagenicity prediction system incorporating the knowledge of expert review. Annual Meeting of the Japanese Society of Toxicology, 48.1, S19-4, 2021 (JSOT 2021)

(4) 小山直己, 羽倉昌志, 倉上真樹, 西岡大貴, 比多岡清司, Nicolas K. Shinada, Sucheendra K. Palaniappan, 松岡由希子, Suman K. Chakravarti, Roustem D. Saiakhov, 朝倉省二. Mutagenicity screening assessment using in silico methods (AI, (Q)SAR): an industry use case. 50th Anniversary Meeting of the Japanese Environmental Mutagen and Genome Society (JEMS 2021)

(5) Nicolas K. Shinada. Advancements in text mining and deep learning to improve toxicity prediction. OpenTox 2020

Contact us

Part of this study grew out of a similar mutagenicity prediction project that was funded by Eisai Co., Ltd. We acknowledge this funding support, which eventually led to some of the ideas presented in this paper.

Please feel free to email us if you have any enquiries about the study or the source code.

SBX Corporation and The Systems Biology Institute