In-context learning enables multimodal large language models to classify cancer pathology images

Abstract Medical image classification requires labeled, task-specific datasets which are used to train deep learning networks de novo, or to fine-tune foundation models. However, this process is computationally and technically demanding. In language processing, in-context learning provides an altern...

Full description

Saved in:
Bibliographic Details
Main Authors: Dyke Ferber, Georg Wölflein, Isabella C. Wiest, Marta Ligero, Srividhya Sainath, Narmin Ghaffari Laleh, Omar S. M. El Nahhas, Gustav Müller-Franzes, Dirk Jäger, Daniel Truhn, Jakob Nikolas Kather
Format: Article
Language:English
Published: Nature Portfolio 2024-11-01
Series:Nature Communications
Online Access:https://doi.org/10.1038/s41467-024-51465-9
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1849221064130822144
author Dyke Ferber
Georg Wölflein
Isabella C. Wiest
Marta Ligero
Srividhya Sainath
Narmin Ghaffari Laleh
Omar S. M. El Nahhas
Gustav Müller-Franzes
Dirk Jäger
Daniel Truhn
Jakob Nikolas Kather
author_facet Dyke Ferber
Georg Wölflein
Isabella C. Wiest
Marta Ligero
Srividhya Sainath
Narmin Ghaffari Laleh
Omar S. M. El Nahhas
Gustav Müller-Franzes
Dirk Jäger
Daniel Truhn
Jakob Nikolas Kather
author_sort Dyke Ferber
collection DOAJ
description Abstract Medical image classification requires labeled, task-specific datasets which are used to train deep learning networks de novo, or to fine-tune foundation models. However, this process is computationally and technically demanding. In language processing, in-context learning provides an alternative, where models learn from within prompts, bypassing the need for parameter updates. Yet, in-context learning remains underexplored in medical image analysis. Here, we systematically evaluate the model Generative Pretrained Transformer 4 with Vision capabilities (GPT-4V) on cancer image processing with in-context learning on three cancer histopathology tasks of high importance: Classification of tissue subtypes in colorectal cancer, colon polyp subtyping and breast tumor detection in lymph node sections. Our results show that in-context learning is sufficient to match or even outperform specialized neural networks trained for particular tasks, while only requiring a minimal number of samples. In summary, this study demonstrates that large vision language models trained on non-domain specific data can be applied out-of-the box to solve medical image-processing tasks in histopathology. This democratizes access of generalist AI models to medical experts without technical background especially for areas where annotated data is scarce.
format Article
id doaj-art-3fa7eb85942f4c83b6c9d066cad3cbe7
institution Kabale University
issn 2041-1723
language English
publishDate 2024-11-01
publisher Nature Portfolio
record_format Article
series Nature Communications
spelling doaj-art-3fa7eb85942f4c83b6c9d066cad3cbe72024-11-24T12:34:47ZengNature PortfolioNature Communications2041-17232024-11-0115111210.1038/s41467-024-51465-9In-context learning enables multimodal large language models to classify cancer pathology imagesDyke Ferber0Georg Wölflein1Isabella C. Wiest2Marta Ligero3Srividhya Sainath4Narmin Ghaffari Laleh5Omar S. M. El Nahhas6Gustav Müller-Franzes7Dirk Jäger8Daniel Truhn9Jakob Nikolas Kather10National Center for Tumor Diseases (NCT), Heidelberg University HospitalSchool of Computer Science, University of St AndrewsElse Kroener Fresenius Center for Digital Health, Technical University DresdenElse Kroener Fresenius Center for Digital Health, Technical University DresdenElse Kroener Fresenius Center for Digital Health, Technical University DresdenElse Kroener Fresenius Center for Digital Health, Technical University DresdenElse Kroener Fresenius Center for Digital Health, Technical University DresdenDepartment of Diagnostic and Interventional Radiology, University Hospital AachenNational Center for Tumor Diseases (NCT), Heidelberg University HospitalDepartment of Diagnostic and Interventional Radiology, University Hospital AachenNational Center for Tumor Diseases (NCT), Heidelberg University HospitalAbstract Medical image classification requires labeled, task-specific datasets which are used to train deep learning networks de novo, or to fine-tune foundation models. However, this process is computationally and technically demanding. In language processing, in-context learning provides an alternative, where models learn from within prompts, bypassing the need for parameter updates. Yet, in-context learning remains underexplored in medical image analysis. Here, we systematically evaluate the model Generative Pretrained Transformer 4 with Vision capabilities (GPT-4V) on cancer image processing with in-context learning on three cancer histopathology tasks of high importance: Classification of tissue subtypes in colorectal cancer, colon polyp subtyping and breast tumor detection in lymph node sections. Our results show that in-context learning is sufficient to match or even outperform specialized neural networks trained for particular tasks, while only requiring a minimal number of samples. In summary, this study demonstrates that large vision language models trained on non-domain specific data can be applied out-of-the box to solve medical image-processing tasks in histopathology. This democratizes access of generalist AI models to medical experts without technical background especially for areas where annotated data is scarce.https://doi.org/10.1038/s41467-024-51465-9
spellingShingle Dyke Ferber
Georg Wölflein
Isabella C. Wiest
Marta Ligero
Srividhya Sainath
Narmin Ghaffari Laleh
Omar S. M. El Nahhas
Gustav Müller-Franzes
Dirk Jäger
Daniel Truhn
Jakob Nikolas Kather
In-context learning enables multimodal large language models to classify cancer pathology images
Nature Communications
title In-context learning enables multimodal large language models to classify cancer pathology images
title_full In-context learning enables multimodal large language models to classify cancer pathology images
title_fullStr In-context learning enables multimodal large language models to classify cancer pathology images
title_full_unstemmed In-context learning enables multimodal large language models to classify cancer pathology images
title_short In-context learning enables multimodal large language models to classify cancer pathology images
title_sort in context learning enables multimodal large language models to classify cancer pathology images
url https://doi.org/10.1038/s41467-024-51465-9
work_keys_str_mv AT dykeferber incontextlearningenablesmultimodallargelanguagemodelstoclassifycancerpathologyimages
AT georgwolflein incontextlearningenablesmultimodallargelanguagemodelstoclassifycancerpathologyimages
AT isabellacwiest incontextlearningenablesmultimodallargelanguagemodelstoclassifycancerpathologyimages
AT martaligero incontextlearningenablesmultimodallargelanguagemodelstoclassifycancerpathologyimages
AT srividhyasainath incontextlearningenablesmultimodallargelanguagemodelstoclassifycancerpathologyimages
AT narminghaffarilaleh incontextlearningenablesmultimodallargelanguagemodelstoclassifycancerpathologyimages
AT omarsmelnahhas incontextlearningenablesmultimodallargelanguagemodelstoclassifycancerpathologyimages
AT gustavmullerfranzes incontextlearningenablesmultimodallargelanguagemodelstoclassifycancerpathologyimages
AT dirkjager incontextlearningenablesmultimodallargelanguagemodelstoclassifycancerpathologyimages
AT danieltruhn incontextlearningenablesmultimodallargelanguagemodelstoclassifycancerpathologyimages
AT jakobnikolaskather incontextlearningenablesmultimodallargelanguagemodelstoclassifycancerpathologyimages