Prediction of the classification, labelling and packaging regulation H-statements with confidence using conformal prediction with N-grams and molecular fingerprints

Effective chemical hazard labelling systems are essential for safeguarding human health and the environment as a result of widespread chemical use, and machine-learning models can be used to predict hazard labels efficiently and reduce the use of animal tests. This investigation shows the utility of...

Bibliographic Details
Main Authors: Ulf Norinder, Ziye Zheng, Ian Cotgreave
Format: Article
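Language: English
Published: Elsevier 2025-01-01
Series: Current Research in Toxicology
Subjects: CLP Regulation; Conformal prediction; Consensus modeling; H-statements; Molecular fingerprints; N-grams
Online Access: http://www.sciencedirect.com/science/article/pii/S2666027X25000283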
author Ulf Norinder
Ziye Zheng
Ian Cotgreave
collection DOAJ
description Effective chemical hazard labelling systems are essential for safeguarding human health and the environment given widespread chemical use, and machine-learning models can predict hazard labels efficiently and reduce the use of animal tests. This investigation shows the utility of N-grams and other fingerprint featurization procedures for predicting classification, labelling and packaging (CLP) Regulation H-statements, particularly in an ensemble (consensus) setting. Consensus modelling by class or by median conformal prediction p-values appears particularly advantageous for obtaining both high conformal prediction validity and efficiency, together with good balanced accuracy, sensitivity and specificity. Using N-grams allows all symbols in SMILES strings to be handled, including those for metals and salts, which may be important for the compounds to exhibit their experimentally determined toxicities. The models developed in this study are efficient tools for assessing hazard classification H-statements of chemicals, which can be useful for chemical hazard assessment, read-across and risk management.
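The two ideas named in the abstract, N-gram featurization of SMILES strings and consensus by median conformal p-values, can be illustrated with a minimal sketch. This is not the authors' implementation: the character-level tokenization, the maximum N-gram length, the class names, the significance level, and all function names (`smiles_ngrams`, `median_p_values`, `prediction_set`) are assumptions made here for illustration only.

```python
from collections import Counter
from statistics import median


def smiles_ngrams(smiles: str, n_max: int = 3) -> Counter:
    """Count every character N-gram of length 1..n_max in a SMILES string.

    Assumption: plain character-level N-grams. Because no symbol is
    filtered out, metal and salt notation such as "[Na+]" or "." is kept.
    """
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(smiles) - n + 1):
            counts[smiles[i:i + n]] += 1
    return counts


def median_p_values(p_values_per_model: list[dict[str, float]]) -> dict[str, float]:
    """Combine per-class conformal p-values from several models by the median."""
    classes = p_values_per_model[0].keys()
    return {c: median(m[c] for m in p_values_per_model) for c in classes}


def prediction_set(p_values: dict[str, float], significance: float = 0.2) -> set[str]:
    """Keep every class whose p-value exceeds the chosen significance level."""
    return {c for c, p in p_values.items() if p > significance}


# Hypothetical usage: sodium chloride written as a salt in SMILES.
features = smiles_ngrams("[Na+].[Cl-]")

# Hypothetical p-values from three models (e.g. one per featurization).
models = [
    {"hazard": 0.6, "no_hazard": 0.10},
    {"hazard": 0.4, "no_hazard": 0.30},
    {"hazard": 0.5, "no_hazard": 0.05},
]
consensus = median_p_values(models)
labels = prediction_set(consensus, significance=0.2)
```

In this sketch each model would be trained on a different featurization (N-gram counts or a molecular fingerprint), and the median is taken per class so that a single outlying model does not dominate the consensus prediction set.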
format Article
id doaj-art-2acca43485b048bf84b0b5a417a77744
institution OA Journals
issn 2666-027X
language English
publishDate 2025-01-01
publisher Elsevier
record_format Article
series Current Research in Toxicology
spelling doaj-art-2acca43485b048bf84b0b5a417a77744 (2025-08-20T02:32:34Z), eng
Elsevier, Current Research in Toxicology, ISSN 2666-027X, 2025-01-01, volume 8, article 100242
DOI: 10.1016/j.crtox.2025.100242
Ulf Norinder (0): Department of Computer and Systems Sciences, Stockholm University, P.O. Box 1073, SE-164 25 Kista, Sweden; MTM Research Centre, School of Science and Technology, Örebro University, 701 82 Örebro, Sweden; Corresponding author at: MTM Research Centre, School of Science and Technology, Örebro University, 701 82 Örebro, Sweden.
Ziye Zheng (1): Cytiva, Björkgatan 30, 75 323 Uppsala, Sweden; Chemical and Pharmaceutical Safety, Research Institute of Sweden (RISE), Forskargatan 18, 15 136 Södertälje, Sweden; IVL Swedish Environmental Research Institute, 10 031 Stockholm, Sweden
Ian Cotgreave (2): Chemical and Pharmaceutical Safety, Research Institute of Sweden (RISE), Forskargatan 18, 15 136 Södertälje, Sweden
title Prediction of the classification, labelling and packaging regulation H-statements with confidence using conformal prediction with N-grams and molecular fingerprints
topic CLP Regulation
Conformal prediction
Consensus modeling
H-statements
Molecular fingerprints
N-grams
url http://www.sciencedirect.com/science/article/pii/S2666027X25000283