CMAB: A Multi-Attribute Building Dataset of China

Bibliographic Details
Main Authors: Yecheng Zhang, Huimin Zhao, Ying Long
Format: Article
Language: English
Published: Nature Portfolio 2025-03-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-025-04730-5
collection DOAJ
description Abstract: Rapidly acquiring three-dimensional (3D) building data, including geometric attributes such as rooftop, height, and orientation, as well as indicative attributes such as function, quality, and age, is essential for accurate urban analysis, simulations, and policy updates. Current building datasets cover these multiple attributes only incompletely. This paper presents CMAB, the first national-scale Multi-Attribute Building dataset produced with artificial intelligence, covering 3,667 spatial cities, 31 million buildings, and 23.6 billion m² of rooftops, extracted with OCRNet at an F1-Score of 89.93%, and totaling 363 billion m³ of building stock. We trained bootstrap aggregated XGBoost models with city administrative classifications, incorporating morphology, location, and function features. Using multi-source data, including billions of remote sensing images and 60 million street view images (SVIs), we generated rooftop, height, structure, function, style, age, and quality attributes for each building with machine learning and large multimodal models. Accuracy was validated against model benchmarks, existing similar products, and manual SVI checks, and was mostly above 80%. Our dataset and results are crucial for global SDGs and urban planning.
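The F1-Score quoted in the description is the harmonic mean of precision and recall. As a rough illustration only (this is not the authors' evaluation code; the pixel-wise comparison, the list-of-rows mask layout, and the function name `f1_score` are assumptions made for this sketch), an F1 value for a predicted rooftop mask against a reference mask could be computed like this:

```python
def f1_score(pred, truth):
    """Pixel-wise F1 for two equally sized binary masks.

    Both masks are lists of rows holding 0 (background) or 1 (rooftop).
    This layout is a hypothetical choice for illustration.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # rooftop predicted and present
            elif p and not t:
                fp += 1      # rooftop predicted but absent
            elif t and not p:
                fn += 1      # rooftop present but missed
    if tp == 0:
        return 0.0           # avoids division by zero when nothing matches
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy masks: 2 true positives, 1 false positive, 1 false negative.
predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
score = f1_score(predicted, reference)
```

With these toy masks precision and recall are both 2/3, so the F1 value is also 2/3. A score near 0.90, as reported for the dataset, means predicted and reference rooftops agree far more closely than in this example.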
format Article
id doaj-art-8f50e744d8ef4d37b6b38b8f9dda06f5
institution DOAJ
issn 2052-4463
language English
publishDate 2025-03-01
publisher Nature Portfolio
record_format Article
series Scientific Data
spelling doaj-art-8f50e744d8ef4d37b6b38b8f9dda06f5 2025-08-20T02:56:14Z eng Nature Portfolio Scientific Data 2052-4463 2025-03-01 12 1 1 18 10.1038/s41597-025-04730-5 CMAB: A Multi-Attribute Building Dataset of China
Yecheng Zhang (School of Architecture, Tsinghua University)
Huimin Zhao (School of Architecture, Tsinghua University)
Ying Long (School of Architecture, Tsinghua University)