Accurate Sum and Dot Product with New Instruction for High-Precision Computing on ARMv8 Processor

The accumulation of rounding errors can lead to unreliable results. Therefore, accurate and efficient algorithms are required. A processor from the ARMv8 architecture has introduced new instructions for high-precision computation. We have redesigned and implemented accurate summation and the accurat...

Bibliographic Details
Main Authors: Kaisen Xie, Qingfeng Lu, Hao Jiang, Hongxia Wang
Format: Article
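Language: English
Published: MDPI AG 2025-01-01
Series: Mathematics
Subjects: compensated precision; accurate summation; accurate dot product; error analysis; error-free transformation
Online Access: https://www.mdpi.com/2227-7390/13/2/270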
author Kaisen Xie
Qingfeng Lu
Hao Jiang
Hongxia Wang
collection DOAJ
description The accumulation of rounding errors can lead to unreliable results. Therefore, accurate and efficient algorithms are required. A processor from the ARMv8 architecture has introduced new instructions for high-precision computation. We have redesigned and implemented accurate summation and the accurate dot product. Compared with the classic compensated precision algorithms, the number of floating-point operations has been reduced from $7n-5$ and $10n-5$ to $4n-2$ and $7n-2$, respectively. It has been proven that the error bounds of our accurate summation and dot product algorithms are $\gamma_{n-1}\gamma_{n}\,\mathrm{cond}+u$ and $\gamma_{n}\gamma_{n+1}\,\mathrm{cond}+u$, where 'cond' denotes the condition number, $\gamma_{n}=n\cdot u/(1-n\cdot u)$, and $u$ denotes the relative rounding error unit. Our accurate summation and dot product achieved a 1.69× speedup and a 1.14× speedup, respectively, on a simulation platform. Numerical experiments also illustrate that, under round-towards-zero mode, our algorithms are as accurate as the classic compensated precision algorithms.
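As a point of reference for the "classic compensated precision algorithms" mentioned in the description, the following is a minimal Python sketch of compensated summation built on Knuth's TwoSum error-free transformation. It is not the authors' ARMv8 implementation and does not use the new instruction; the function names (`two_sum`, `sum2`, `naive_sum`, `gamma`) and the default u = 2^-53 (IEEE 754 double precision with round-to-nearest) are assumptions made for illustration.

```python
def two_sum(a, b):
    """Knuth's TwoSum: return (s, e) with s = fl(a + b) and s + e == a + b.

    Exactness holds for round-to-nearest arithmetic without overflow.
    Costs six floating-point operations.
    """
    s = a + b
    z = s - a
    e = (a - (s - z)) + (b - z)
    return s, e


def naive_sum(xs):
    """Plain recursive summation, written as a loop so that no
    compensated built-in is used by accident."""
    s = 0.0
    for x in xs:
        s = s + x
    return s


def sum2(xs):
    """Classic compensated summation: accumulate the rounding error of
    every addition separately and add it back once at the end."""
    if not xs:
        return 0.0
    s = xs[0]
    c = 0.0  # running sum of the per-step rounding errors
    for x in xs[1:]:
        s, e = two_sum(s, x)
        c = c + e
    return s + c


def gamma(n, u=2.0 ** -53):
    """gamma_n = n*u / (1 - n*u); u = 2**-53 is an assumed default."""
    return n * u / (1.0 - n * u)


if __name__ == "__main__":
    xs = [1.0, 1e100, 1.0, -1e100]  # exact sum is 2
    print(naive_sum(xs))  # 0.0
    print(sum2(xs))       # 2.0
    print(gamma(1000))
```

For the ill-conditioned input above, `naive_sum` loses both unit terms and returns 0.0, while `sum2` recovers the exact result 2.0. Each `two_sum` call costs six floating-point operations, which accounts for most of the per-element cost of the classic algorithm.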
format Article
id doaj-art-bc40195db12741f183316ed0dd072240
institution Kabale University
issn 2227-7390
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Mathematics
spelling doaj-art-bc40195db12741f183316ed0dd072240 2025-01-24T13:39:58Z eng MDPI AG, Mathematics, ISSN 2227-7390, 2025-01-01, vol. 13, no. 2, art. 270, doi:10.3390/math13020270
Accurate Sum and Dot Product with New Instruction for High-Precision Computing on ARMv8 Processor
Kaisen Xie (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)
Qingfeng Lu (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)
Hao Jiang (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)
Hongxia Wang (College of Science, National University of Defense Technology, Changsha 410073, China)
title Accurate Sum and Dot Product with New Instruction for High-Precision Computing on ARMv8 Processor
topic compensated precision
accurate summation
accurate dot product
error analysis
error-free transformation
url https://www.mdpi.com/2227-7390/13/2/270