Prediction of the classification, labelling and packaging regulation H-statements with confidence using conformal prediction with N-grams and molecular fingerprints

Effective chemical hazard labelling systems are essential for safeguarding human health and the environment as a result of widespread chemical use, and machine-learning models can be used to predict hazard labels efficiently and reduce the use of animal tests. This investigation shows the utility of...

Full description

Saved in:
Bibliographic Details
Main Authors: Ulf Norinder, Ziye Zheng, Ian Cotgreave
Format: Article
Language:English
Published: Elsevier 2025-01-01
Series:Current Research in Toxicology
Subjects:
Online Access:http://www.sciencedirect.com/science/article/pii/S2666027X25000283
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Effective chemical hazard labelling systems are essential for safeguarding human health and the environment as a result of widespread chemical use, and machine-learning models can be used to predict hazard labels efficiently and reduce the use of animal tests. This investigation shows the utility of N-grams and other fingerprint featurization procedures for predicting classification, labelling and packaging (CLP). Regulation H-statements, particularly in an ensemble (consensus) setting. Consensus modelling by class or Conformal Prediction median p-values seems to be particularly advantageous in order to obtain both high conformal prediction validity and efficiency as well as good balanced accuracy, sensitivity and specificity. Utilization of the N-grams allows handling of all symbols in SMILES strings including those related to metals and salts that may be important for the compounds to exhibit their experimental determined toxicities. The models developed in this study are efficient tools to access hazard classification H-statements of chemicals, which can be useful for chemical hazard assessment, read-across as well as risk management.
ISSN:2666-027X