CMAB: A Multi-Attribute Building Dataset of China
Abstract Rapidly acquiring three-dimensional (3D) building data, including geometric attributes like rooftop, height and orientations, as well as indicative attributes like function, quality, and age, is essential for accurate urban analysis, simulations, and policy updates. Current building dataset...
| Main Authors: | Yecheng Zhang, Huimin Zhao, Ying Long |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-03-01 |
| Series: | Scientific Data |
| Online Access: | https://doi.org/10.1038/s41597-025-04730-5 |
| _version_ | 1850039813517869056 |
|---|---|
| author | Yecheng Zhang Huimin Zhao Ying Long |
| author_facet | Yecheng Zhang Huimin Zhao Ying Long |
| author_sort | Yecheng Zhang |
| collection | DOAJ |
| description | Abstract Rapidly acquiring three-dimensional (3D) building data, including geometric attributes like rooftop, height and orientations, as well as indicative attributes like function, quality, and age, is essential for accurate urban analysis, simulations, and policy updates. Current building datasets suffer from incomplete coverage of building multi-attributes. This paper presents the first national-scale Multi-Attribute Building dataset (CMAB) with artificial intelligence, covering 3,667 spatial cities, 31 million buildings, and 23.6 billion m² of rooftops with an F1-Score of 89.93% in OCRNet-based extraction, totaling 363 billion m³ of building stock. We trained bootstrap aggregated XGBoost models with city administrative classifications, incorporating morphology, location, and function features. Using multi-source data, including billions of remote sensing images and 60 million street view images (SVIs), we generated rooftop, height, structure, function, style, age, and quality attributes for each building with machine learning and large multimodal models. Accuracy was validated through model benchmarks, existing similar products, and manual SVI validation, mostly above 80%. Our dataset and results are crucial for global SDGs and urban planning. |
| format | Article |
| id | doaj-art-8f50e744d8ef4d37b6b38b8f9dda06f5 |
| institution | DOAJ |
| issn | 2052-4463 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Data |
| spelling | doaj-art-8f50e744d8ef4d37b6b38b8f9dda06f5; 2025-08-20T02:56:14Z; eng; Nature Portfolio; Scientific Data; 2052-4463; 2025-03-01; 12; 1; 1; 18; 10.1038/s41597-025-04730-5; CMAB: A Multi-Attribute Building Dataset of China; Yecheng Zhang (0); Huimin Zhao (1); Ying Long (2); School of Architecture, Tsinghua University; School of Architecture, Tsinghua University; School of Architecture, Tsinghua University; Abstract Rapidly acquiring three-dimensional (3D) building data, including geometric attributes like rooftop, height and orientations, as well as indicative attributes like function, quality, and age, is essential for accurate urban analysis, simulations, and policy updates. Current building datasets suffer from incomplete coverage of building multi-attributes. This paper presents the first national-scale Multi-Attribute Building dataset (CMAB) with artificial intelligence, covering 3,667 spatial cities, 31 million buildings, and 23.6 billion m² of rooftops with an F1-Score of 89.93% in OCRNet-based extraction, totaling 363 billion m³ of building stock. We trained bootstrap aggregated XGBoost models with city administrative classifications, incorporating morphology, location, and function features. Using multi-source data, including billions of remote sensing images and 60 million street view images (SVIs), we generated rooftop, height, structure, function, style, age, and quality attributes for each building with machine learning and large multimodal models. Accuracy was validated through model benchmarks, existing similar products, and manual SVI validation, mostly above 80%. Our dataset and results are crucial for global SDGs and urban planning.; https://doi.org/10.1038/s41597-025-04730-5 |
| spellingShingle | Yecheng Zhang Huimin Zhao Ying Long CMAB: A Multi-Attribute Building Dataset of China Scientific Data |
| title | CMAB: A Multi-Attribute Building Dataset of China |
| title_full | CMAB: A Multi-Attribute Building Dataset of China |
| title_fullStr | CMAB: A Multi-Attribute Building Dataset of China |
| title_full_unstemmed | CMAB: A Multi-Attribute Building Dataset of China |
| title_short | CMAB: A Multi-Attribute Building Dataset of China |
| title_sort | cmab a multi attribute building dataset of china |
| url | https://doi.org/10.1038/s41597-025-04730-5 |
| work_keys_str_mv | AT yechengzhang cmabamultiattributebuildingdatasetofchina AT huiminzhao cmabamultiattributebuildingdatasetofchina AT yinglong cmabamultiattributebuildingdatasetofchina |