scCompass: An Integrated Multi‐Species scRNA‐seq Database for AI‐Ready

Abstract Emerging single‐cell sequencing technology has generated large amounts of data, allowing analysis of cellular dynamics and gene regulation at the single‐cell resolution. Advances in artificial intelligence enhance life sciences research by delivering critical insights and optimizing data an...

Bibliographic Details
Main Authors: Pengfei Wang, Wenhao Liu, Jiajia Wang, Yana Liu, Pengjiang Li, Ping Xu, Wentao Cui, Ran Zhang, Qingqing Long, Zhilong Hu, Chen Fang, Jingxi Dong, Chunyang Zhang, Yan Chen, Chengrui Wang, Guole Liu, Hanyu Xie, Yiyang Zhang, Meng Xiao, Shubai Chen, Haiping Jiang, The X‐Compass Consortium, Yiqiang Chen, Ge Yang, Shihua Zhang, Zhen Meng, Xuezhi Wang, Guihai Feng, Xin Li, Yuanchun Zhou
Format: Article
Language: English
Published: Wiley 2025-07-01
Series: Advanced Science
Subjects: AI‐ready, multi‐species, scRNA‐seq database, single‐cell
Online Access: https://doi.org/10.1002/advs.202500870
author Pengfei Wang
Wenhao Liu
Jiajia Wang
Yana Liu
Pengjiang Li
Ping Xu
Wentao Cui
Ran Zhang
Qingqing Long
Zhilong Hu
Chen Fang
Jingxi Dong
Chunyang Zhang
Yan Chen
Chengrui Wang
Guole Liu
Hanyu Xie
Yiyang Zhang
Meng Xiao
Shubai Chen
Haiping Jiang
The X‐Compass Consortium
Yiqiang Chen
Ge Yang
Shihua Zhang
Zhen Meng
Xuezhi Wang
Guihai Feng
Xin Li
Yuanchun Zhou
collection DOAJ
description Abstract Emerging single‐cell sequencing technology has generated large amounts of data, allowing analysis of cellular dynamics and gene regulation at single‐cell resolution. Advances in artificial intelligence enhance life sciences research by delivering critical insights and optimizing data analysis processes. However, inconsistent data processing quality and standards remain a major challenge. Here scCompass is proposed, a comprehensive resource designed to build a large‐scale, multi‐species, and model‐friendly single‐cell data collection. By applying standardized data pre‐processing, scCompass integrates and curates transcriptomic data from nearly 105 million single cells across 13 species. Using this extensive dataset, stable expression genes (SEGs) and organ‐specific expression genes (OSGs) are identified in humans and mice. Scalable datasets are provided that can be easily adapted for AI model training, together with pretrained checkpoints of state‐of‐the‐art single‐cell foundation models. In summary, scCompass is a highly efficient and scalable AI‐ready database which, combined with user‐friendly data sharing, visualization, and online analysis, greatly simplifies data access and exploitation for researchers in single‐cell biology (http://www.bdbe.cn/kun).
format Article
id doaj-art-debd60655d1349ce80aa99488ea8716f
institution Kabale University
issn 2198-3844
language English
publishDate 2025-07-01
publisher Wiley
record_format Article
series Advanced Science
spelling doaj-art-debd60655d1349ce80aa99488ea8716f
2025-08-20T03:28:58Z
eng
Wiley
Advanced Science
2198-3844
2025-07-01
12
25
n/a
n/a
10.1002/advs.202500870
scCompass: An Integrated Multi‐Species scRNA‐seq Database for AI‐Ready
Pengfei Wang (0): Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
Wenhao Liu (1): State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
Jiajia Wang (2): Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
Yana Liu (3): State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
Pengjiang Li (4): Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
Ping Xu (5): Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
Wentao Cui (6): Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
Ran Zhang (7): Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
Qingqing Long (8): Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
Zhilong Hu (9): Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
Chen Fang (10): State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
Jingxi Dong (11): State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
Chunyang Zhang (12): Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
Yan Chen (13): Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
Chengrui Wang (14): Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
Guole Liu (15): State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Hanyu Xie (16): Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
Yiyang Zhang (17): CEMS, NCMIS, HCMS, MDIS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
Meng Xiao (18): Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
Shubai Chen (19): Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
Haiping Jiang (20): State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
The X‐Compass Consortium
Yiqiang Chen (21): Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
Ge Yang (22): State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Shihua Zhang (23): CEMS, NCMIS, HCMS, MDIS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
Zhen Meng (24): Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
Xuezhi Wang (25): Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
Guihai Feng (26): State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
Xin Li (27): State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
Yuanchun Zhou (28): Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
https://doi.org/10.1002/advs.202500870
AI‐ready
multi‐species
scRNA‐seq database
single‐cell
title scCompass: An Integrated Multi‐Species scRNA‐seq Database for AI‐Ready
topic AI‐ready
multi‐species
scRNA‐seq database
single‐cell
url https://doi.org/10.1002/advs.202500870