scCompass: An Integrated Multi‐Species scRNA‐seq Database for AI‐Ready
Abstract Emerging single‐cell sequencing technology has generated large amounts of data, allowing analysis of cellular dynamics and gene regulation at the single‐cell resolution. Advances in artificial intelligence enhance life sciences research by delivering critical insights and optimizing data an...
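| Main Authors: | Pengfei Wang, Wenhao Liu, Jiajia Wang, Yana Liu, Pengjiang Li, Ping Xu, Wentao Cui, Ran Zhang, Qingqing Long, Zhilong Hu, Chen Fang, Jingxi Dong, Chunyang Zhang, Yan Chen, Chengrui Wang, Guole Liu, Hanyu Xie, Yiyang Zhang, Meng Xiao, Shubai Chen, Haiping Jiang, The X‐Compass Consortium, Yiqiang Chen, Ge Yang, Shihua Zhang, Zhen Meng, Xuezhi Wang, Guihai Feng, Xin Li, Yuanchun Zhou |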
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Wiley, 2025-07-01 |
| Series: | Advanced Science |
| Subjects: | AI‐ready; multi‐species; scRNA‐seq database; single‐cell |
| Online Access: | https://doi.org/10.1002/advs.202500870 |
| _version_ | 1849427597070434304 |
|---|---|
| author | Pengfei Wang; Wenhao Liu; Jiajia Wang; Yana Liu; Pengjiang Li; Ping Xu; Wentao Cui; Ran Zhang; Qingqing Long; Zhilong Hu; Chen Fang; Jingxi Dong; Chunyang Zhang; Yan Chen; Chengrui Wang; Guole Liu; Hanyu Xie; Yiyang Zhang; Meng Xiao; Shubai Chen; Haiping Jiang; The X‐Compass Consortium; Yiqiang Chen; Ge Yang; Shihua Zhang; Zhen Meng; Xuezhi Wang; Guihai Feng; Xin Li; Yuanchun Zhou |
| author_sort | Pengfei Wang |
| collection | DOAJ |
| description | Abstract Emerging single‐cell sequencing technology has generated large amounts of data, allowing analysis of cellular dynamics and gene regulation at single‐cell resolution. Advances in artificial intelligence enhance life sciences research by delivering critical insights and optimizing data analysis processes. However, inconsistent data processing quality and standards remain a major challenge. Here, scCompass is proposed: a comprehensive resource designed to build a large‐scale, multi‐species, and model‐friendly single‐cell data collection. By applying standardized data pre‐processing, scCompass integrates and curates transcriptomic data from nearly 105 million single cells across 13 species. Using this extensive dataset, stable expression genes (SEGs) and organ‐specific expression genes (OSGs) are identified in humans and mice. Scalable datasets are provided that can be easily adapted for AI model training, along with pretrained checkpoints of state‐of‐the‐art single‐cell foundation models. In summary, scCompass is a highly efficient and scalable AI‐ready database that, combined with user‐friendly data sharing, visualization, and online analysis, greatly simplifies data access and exploitation for researchers in single‐cell biology (http://www.bdbe.cn/kun). |
| format | Article |
| id | doaj-art-debd60655d1349ce80aa99488ea8716f |
| institution | Kabale University |
| issn | 2198-3844 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Wiley |
| record_format | Article |
| series | Advanced Science |
| spelling | doaj-art-debd60655d1349ce80aa99488ea8716f; 2025-08-20T03:28:58Z; eng; Wiley; Advanced Science; 2198-3844; 2025-07-01; 12; 25; n/a; n/a; 10.1002/advs.202500870; scCompass: An Integrated Multi‐Species scRNA‐seq Database for AI‐Ready; Authors: Pengfei Wang (0), Wenhao Liu (1), Jiajia Wang (2), Yana Liu (3), Pengjiang Li (4), Ping Xu (5), Wentao Cui (6), Ran Zhang (7), Qingqing Long (8), Zhilong Hu (9), Chen Fang (10), Jingxi Dong (11), Chunyang Zhang (12), Yan Chen (13), Chengrui Wang (14), Guole Liu (15), Hanyu Xie (16), Yiyang Zhang (17), Meng Xiao (18), Shubai Chen (19), Haiping Jiang (20), The X‐Compass Consortium, Yiqiang Chen (21), Ge Yang (22), Shihua Zhang (23), Zhen Meng (24), Xuezhi Wang (25), Guihai Feng (26), Xin Li (27), Yuanchun Zhou (28); Affiliations: (0) Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China; (1) State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; (2) Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China; (3) State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; (4) Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China; (5) Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China; (6) Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China; (7) Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China; (8) Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China; (9) Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China; (10) State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; (11) State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; (12) Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China; (13) Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China; (14) Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China; (15) State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; (16) Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China; (17) CEMS, NCMIS, HCMS, MDIS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; (18) Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China; (19) Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; (20) State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; (21) Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; (22) State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; (23) CEMS, NCMIS, HCMS, MDIS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; (24) Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China; (25) Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China; (26) State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; (27) State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; (28) Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China; Abstract: Emerging single‐cell sequencing technology has generated large amounts of data, allowing analysis of cellular dynamics and gene regulation at single‐cell resolution. Advances in artificial intelligence enhance life sciences research by delivering critical insights and optimizing data analysis processes. However, inconsistent data processing quality and standards remain a major challenge. Here, scCompass is proposed: a comprehensive resource designed to build a large‐scale, multi‐species, and model‐friendly single‐cell data collection. By applying standardized data pre‐processing, scCompass integrates and curates transcriptomic data from nearly 105 million single cells across 13 species. Using this extensive dataset, stable expression genes (SEGs) and organ‐specific expression genes (OSGs) are identified in humans and mice. Scalable datasets are provided that can be easily adapted for AI model training, along with pretrained checkpoints of state‐of‐the‐art single‐cell foundation models. In summary, scCompass is a highly efficient and scalable AI‐ready database that, combined with user‐friendly data sharing, visualization, and online analysis, greatly simplifies data access and exploitation for researchers in single‐cell biology (http://www.bdbe.cn/kun).; https://doi.org/10.1002/advs.202500870; AI‐ready; multi‐species; scRNA‐seq database; single‐cell |
| title | scCompass: An Integrated Multi‐Species scRNA‐seq Database for AI‐Ready |
| topic | AI‐ready; multi‐species; scRNA‐seq database; single‐cell |
| url | https://doi.org/10.1002/advs.202500870 |