Refactoring BZIP2 on the new‐generation Sunway supercomputer

Abstract High‐performance computing is progressively assuming a fundamental role in advancing scientific research and engineering domains. However, the ever‐expanding scales of scientific simulations pose challenges for efficient data I/O and storage. The data compression technology has garnered sig...

Bibliographic Details
Main Authors: Xiaohui Liu, Zekun Yin, Haodong Tian, Wubing Wan, Mengyuan Hua, Wenlai Zhao, Zhenchun Huang, Ping Gao, Fangjin Zhu, Hua Wang, Xiaohui Duan
Format: Article
Language: English
Published: Wiley 2025-01-01
Series: Engineering Reports
Subjects: BZIP2; lossless compression; parallel computing; the new‐generation Sunway
Online Access: https://doi.org/10.1002/eng2.12806
author Xiaohui Liu
Zekun Yin
Haodong Tian
Wubing Wan
Mengyuan Hua
Wenlai Zhao
Zhenchun Huang
Ping Gao
Fangjin Zhu
Hua Wang
Xiaohui Duan
collection DOAJ
description Abstract High‐performance computing is progressively assuming a fundamental role in advancing scientific research and engineering domains. However, the ever‐expanding scales of scientific simulations pose challenges for efficient data I/O and storage. Data compression technology has garnered significant attention as a solution to reduce data transmission and storage costs while enhancing performance. In particular, the BZIP2 lossless compression algorithm has been widely used due to its exceptional compression ratio, moderate compression speed, high reliability, and open‐source nature. This paper focuses on the design and realization of a parallelized BZIP2 algorithm tailored for deployment on the New‐Generation Sunway supercomputing platform. By leveraging the unique cache patterns of the New‐Generation Sunway processor, we propose highly tuned multi‐threading and multi‐node implementations of BZIP2 applications for different scenarios. Moreover, we also propose efficient BZIP2 libraries based on the management processing element and computing processing element, which support the commonly used high‐level (de)compression interfaces. The test results indicate that our multi‐threading implementation achieves a maximum speedup of 23.09× (8.57×) in decompression (compression) compared to the sequential implementation. Furthermore, the multi‐node implementation achieves 50.81% (26.35%) parallel efficiency and a peak performance of 16.6 GB/s (52.8 GB/s) for compression (decompression) when scaling up to 2048 processes.
format Article
id doaj-art-de6e170294c7420d8b80601b1451f02d
institution Kabale University
issn 2577-8196
language English
publishDate 2025-01-01
publisher Wiley
record_format Article
series Engineering Reports
spelling doaj-art-de6e170294c7420d8b80601b1451f02d
2025-01-31T00:22:48Z
eng
Wiley
Engineering Reports
2577-8196
2025-01-01
Volume 7, Issue 1, pages n/a
10.1002/eng2.12806
Refactoring BZIP2 on the new‐generation Sunway supercomputer
Xiaohui Liu (School of Software, Shandong University, Jinan, China)
Zekun Yin (School of Software, Shandong University, Jinan, China)
Haodong Tian (School of Software, Shandong University, Jinan, China)
Wubing Wan (National Supercomputing Center in Wuxi, Wuxi, China)
Mengyuan Hua (School of Software, Shandong University, Jinan, China)
Wenlai Zhao (National Supercomputing Center in Wuxi, Wuxi, China)
Zhenchun Huang (Department of Computer Science and Technology, Tsinghua University, Beijing, China)
Ping Gao (National Supercomputing Center in Wuxi, Wuxi, China)
Fangjin Zhu (School of Software, Shandong University, Jinan, China)
Hua Wang (School of Software, Shandong University, Jinan, China)
Xiaohui Duan (School of Software, Shandong University, Jinan, China)
https://doi.org/10.1002/eng2.12806
BZIP2
lossless compression
parallel computing
the new‐generation Sunway
title Refactoring BZIP2 on the new‐generation Sunway supercomputer
topic BZIP2
lossless compression
parallel computing
the new‐generation Sunway
url https://doi.org/10.1002/eng2.12806