Prediction of the classification, labelling and packaging regulation H-statements with confidence using conformal prediction with N-grams and molecular fingerprints
Effective chemical hazard labelling systems are essential for safeguarding human health and the environment as a result of widespread chemical use, and machine-learning models can be used to predict hazard labels efficiently and reduce the use of animal tests. This investigation shows the utility of...
| Main Authors: | Ulf Norinder, Ziye Zheng, Ian Cotgreave |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-01-01 |
| Series: | Current Research in Toxicology |
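| Subjects: | CLP Regulation; Conformal prediction; Consensus modeling; H-statements; Molecular fingerprints; N-grams |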
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2666027X25000283 |
| _version_ | 1850130884130242560 |
|---|---|
| author | Ulf Norinder, Ziye Zheng, Ian Cotgreave |
| collection | DOAJ |
| description | Effective chemical hazard labelling systems are essential for safeguarding human health and the environment as a result of widespread chemical use, and machine-learning models can be used to predict hazard labels efficiently and reduce the use of animal tests. This investigation shows the utility of N-grams and other fingerprint featurization procedures for predicting classification, labelling and packaging (CLP) Regulation H-statements, particularly in an ensemble (consensus) setting. Consensus modelling by class or by Conformal Prediction median p-values seems to be particularly advantageous for obtaining both high conformal prediction validity and efficiency as well as good balanced accuracy, sensitivity and specificity. Utilization of the N-grams allows handling of all symbols in SMILES strings, including those related to metals and salts that may be important for the compounds to exhibit their experimentally determined toxicities. The models developed in this study are efficient tools to access hazard classification H-statements of chemicals, which can be useful for chemical hazard assessment, read-across as well as risk management. |
| format | Article |
| id | doaj-art-2acca43485b048bf84b0b5a417a77744 |
| institution | OA Journals |
| issn | 2666-027X |
| language | English |
| publishDate | 2025-01-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Current Research in Toxicology |
| spelling | doaj-art-2acca43485b048bf84b0b5a417a77744; 2025-08-20T02:32:34Z; eng; Elsevier; Current Research in Toxicology; 2666-027X; 2025-01-01; 8; 100242; 10.1016/j.crtox.2025.100242; Prediction of the classification, labelling and packaging regulation H-statements with confidence using conformal prediction with N-grams and molecular fingerprints; Ulf Norinder (0), Ziye Zheng (1), Ian Cotgreave (2); (0) Department of Computer and Systems Sciences, Stockholm University, P.O. Box 1073, SE-164 25 Kista, Sweden; MTM Research Centre, School of Science and Technology, Örebro University, 701 82 Örebro, Sweden; Corresponding author at: MTM Research Centre, School of Science and Technology, Örebro University, 701 82 Örebro, Sweden. (1) Cytiva, Björkgatan 30, 75 323 Uppsala, Sweden; Chemical and Pharmaceutical Safety, Research Institute of Sweden (RISE), Forskargatan 18, 15 136 Södertälje, Sweden; IVL Swedish Environmental Research Institute, 10 031 Stockholm, Sweden. (2) Chemical and Pharmaceutical Safety, Research Institute of Sweden (RISE), Forskargatan 18, 15 136 Södertälje, Sweden. http://www.sciencedirect.com/science/article/pii/S2666027X25000283; CLP Regulation; Conformal prediction; Consensus modeling; H-statements; Molecular fingerprints; N-grams |
| title | Prediction of the classification, labelling and packaging regulation H-statements with confidence using conformal prediction with N-grams and molecular fingerprints |
| topic | CLP Regulation; Conformal prediction; Consensus modeling; H-statements; Molecular fingerprints; N-grams |
| url | http://www.sciencedirect.com/science/article/pii/S2666027X25000283 |
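
The description above mentions two techniques: N-gram featurization of SMILES strings and consensus by median conformal prediction p-values. The following is a minimal illustrative sketch of both ideas, not the authors' implementation. It assumes character-level N-grams (the article may tokenize SMILES differently), and the function names, the example p-values, and the significance level of 0.2 are all hypothetical.

```python
from collections import Counter
from statistics import median


def smiles_ngrams(smiles: str, n_max: int = 3) -> Counter:
    """Count every character N-gram of length 1..n_max in a SMILES string.

    Assumption: N-grams are taken over raw characters, so symbols such as
    '[', '+', '.' (metals, charges, salts) are kept as features.
    """
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(smiles) - n + 1):
            counts[smiles[i:i + n]] += 1
    return counts


def median_p_value_consensus(p_values_per_model, significance=0.2):
    """Combine per-class p-values from several conformal predictors.

    Takes the median p-value per class across models, then keeps every
    class whose median p-value exceeds the significance level.
    """
    classes = p_values_per_model[0].keys()
    median_p = {c: median(m[c] for m in p_values_per_model) for c in classes}
    labels = {c for c, p in median_p.items() if p > significance}
    return median_p, labels


# Hypothetical p-values from three models for one compound.
models = [
    {"hazard": 0.05, "no_hazard": 0.60},
    {"hazard": 0.30, "no_hazard": 0.40},
    {"hazard": 0.10, "no_hazard": 0.55},
]
features = smiles_ngrams("[Na+].[Cl-]", n_max=2)
median_p, prediction_set = median_p_value_consensus(models, significance=0.2)
```

In this sketch the prediction set may contain one label, both labels, or none, which is the usual way a conformal predictor expresses confidence at a chosen significance level.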