Scaling down annotation needs: The capacity of self-supervised learning on diatom classification


Bibliographic Details
Main Authors: Mingkun Tan, Daniel Langenkämper, Michael Kloster, Tim W. Nattkemper
Format: Article
Language: English
Published: Elsevier 2025-04-01
Series: iScience
Online Access: http://www.sciencedirect.com/science/article/pii/S2589004225004973
Description
Summary: In the field of life sciences, diatoms are essential biomarkers for assessing environmental health. Recent advancements in deep learning have transformed the traditionally laborious process of diatom classification through light microscopy. However, commonly used supervised learning methodologies require annotated data, demanding the expertise of seasoned professionals. This study introduces self-supervised learning to tackle the challenge of scarce annotations in diatom classification. First, our results reveal that self-supervised pre-trained models considerably improve how effectively the available annotated data are used, with the benefit growing as the dataset size decreases. Second, fine-tuning our models with a very small labeled dataset (e.g., 50 samples per class) yields macro-average accuracy comparable to fully supervised levels, thereby reducing the reliance on taxonomic experts by approximately 96.0%. Moreover, extending the pre-training phase to 1600 epochs further reduced the dependency on annotations, achieving comparable accuracy with merely 30 samples per class.
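The low-annotation workflow the abstract describes, pre-training an encoder without labels and then fitting a classifier on only a few dozen labeled samples per class, can be sketched with a toy linear probe. Everything below is an illustrative assumption, not the authors' pipeline: the frozen random-projection "backbone" stands in for a self-supervised pre-trained network, and the simulated Gaussian data stands in for diatom micrograph features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a self-supervised backbone: a fixed (frozen) random
# projection from input space to feature space. In the study's setting this
# would be a network pre-trained on unlabeled diatom images.
n_classes, dim_in, dim_feat = 4, 32, 16
W_backbone = rng.normal(size=(dim_in, dim_feat))

def features(x):
    """Frozen encoder: project and squash (weights are never updated)."""
    return np.tanh(x @ W_backbone)

# Simulate the low-annotation regime from the abstract: 50 labeled
# samples per class, drawn around class-specific means.
samples_per_class = 50
means = rng.normal(scale=2.0, size=(n_classes, dim_in))
X = np.vstack([means[c] + rng.normal(size=(samples_per_class, dim_in))
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), samples_per_class)

# "Fine-tuning" here is the simplest variant: a linear probe, i.e.
# softmax regression on the frozen features via gradient descent.
F = features(X)
W = np.zeros((dim_feat, n_classes))
for _ in range(300):
    logits = F @ W
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    grad = F.T @ (p - np.eye(n_classes)[y]) / len(y)
    W -= 0.1 * grad                               # cross-entropy descent step

acc = (np.argmax(features(X) @ W, axis=1) == y).mean()
print(f"training accuracy with {samples_per_class} labels/class: {acc:.2f}")
```

The point of the sketch is the division of labor: representation quality comes from the (here simulated) label-free stage, so the labeled stage only has to fit a small linear head, which is why a handful of annotated samples per class can suffice.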
ISSN: 2589-0042