ResDecode: Accelerating Large Language Models Inference via Residual Decoding Heads

Large language models (LLMs) have immense potential to enhance the capabilities of Cyber-Physical-Social Intelligence (CPSI) systems, enabling them to better engage with complex cyber, physical, and social environments. However, the high inference latency of LLMs, which is inherited from the autoreg...

Bibliographic Details
Main Authors:	Ziqian Zeng, Jiahong Yu, Qianshi Pang, Zihao Wang, Huiping Zhuang, Fan Yu, Hongen Shao, Xiaofeng Zou
Format:	Article
Language:	English
Published:	Tsinghua University Press 2025-06-01
Series:	Big Data Mining and Analytics
Subjects:	speculative decoding; efficient inference; large language models (LLMs)
Online Access:	https://www.sciopen.com/article/10.26599/BDMA.2024.9020074

Similar Items

Accelerating the inference of string generation-based chemical reaction models for industrial applications
by: Mikhail Andronov, et al.
Published: (2025-03-01)

Recursive Bayesian Decoding in State Observation Models: Theory and Application in Quantum-Based Inference
by: Branislav Rudić, et al.
Published: (2025-06-01)

BALI—A Benchmark for Accelerated Language Model Inference
by: Lena Jurkschat, et al.
Published: (2025-01-01)

SSCANL decoder based joint iterative detection and decoding algorithm
by: Chongyang LIU, et al.
Published: (2022-10-01)

Improving Windowed Decoding of SC LDPC Codes by Effective Decoding Termination, Message Reuse, and Amplification
by: Inayat Ali, et al.
Published: (2018-01-01)

Adaptive channel decoding method for polar codes
by: YE Maolin, et al.
Published: (2022-09-01)

Improved segmented CRC assisted puncturing Polar decoding
by: Yanhong NI, et al.
Published: (2019-03-01)

Nonbinary polar coding with low decoding latency and complexity
by: Peiyao Chen, et al.
Published: (2023-05-01)

Research of fast decoding for longer constraint length convolutional codes
by: HUANG Xiao-ling, et al.
Published: (2010-01-01)

Novel low-delay scheme for parallel Turbo decoding
by: REN De-feng, et al.
Published: (2011-01-01)

Encoding and Decoding of Secret Messages Using Matrices
by: James, Niwagaba
Published: (2022)

Encoding and Decoding of Secret Messages Using Matrices.
by: Niwagaba, James
Published: (2023)

Decoding of periodontal screening and recording index
by: Dler Ali Kursheed, et al.
Published: (2021-07-01)

Low Complexity Unquantized Forward Stack Decoding Algorithm for Spinal Codes in Measurement While Drilling Communication
by: Xiaoyang Yu, et al.
Published: (2025-01-01)

Next-Gen Decoding: Non-Binary LDPC Algorithms for Emerging Power Line and Visible Light Communications
by: Waheed Ullah, et al.
Published: (2025-01-01)

Novel decoding of convolutional codes for OCDMA system
by: ZHOU Hai-xian, et al.
Published: (2009-01-01)

Research on multi-bit decoding algorithms for polar codes
by: Zhouqing SHEN, et al.
Published: (2018-11-01)

Method of MVB and WTB Frames Decoding Based on LabVIEW
by: WU Yun, et al.
Published: (2013-01-01)

Coder and decoder of fractal signals of comb-type structure
by: R. L. Politanskyi, et al.
Published: (2014-08-01)

Co-decode algorithm of network coding with hardware logic
by: Hui LI, et al.
Published: (2012-07-01)

Maximum Likelihood Decoder for Variable Length Codes
by: Syed Misbahuddin, et al.
Published: (2012-12-01)

Quasi-Optimal Path Convergence-Aided Automorphism Ensemble Decoding of Reed–Muller Codes
by: Kairui Tian, et al.
Published: (2025-04-01)

Decoding of the Spatial Distribution of Ionizing Radiation Sources in Systems with Coded Apertures
by: A.A. Nikuliak
Published: (2012-11-01)

Gender and Accuracy in Decoding Affect Cues: A Meta-Analysis
by: Judith A. Hall, et al.
Published: (2025-03-01)

Optimization design and simplified decoding for short frame CCPM
by: Xiao-jie DAI, et al.
Published: (2013-05-01)

Study on the SOVA decoding algorithm for Turbo codes based on modified path-metric
by: LIU Xing-cheng, et al.
Published: (2008-01-01)

Algorithms for Majority Decoding of Group Codes
by: V. M. Deundyak, et al.
Published: (2015-08-01)

GC-Like LDPC Code Construction and its NN-Aided Decoder Implementation
by: Yu-Lun Hsu, et al.
Published: (2024-01-01)

Improved Polar selective decoding and forwarding cooperation
by: Qichao SUN, et al.
Published: (2020-01-01)

Auditory EEG Decoding Challenge for ICASSP 2024
by: Lies Bollens, et al.
Published: (2025-01-01)

Scalable FPGA Implementation of a Reliability-Based Direct Turbo Decoder for Short Block Codes
by: Senthil Murugan, et al.
Published: (2025-01-01)

Electrophysiological decoding captures the temporal trajectory of face categorization in infants
by: Roman Kessler, et al.
Published: (2025-10-01)

Improved sphere decoding algorithm based on ±1 quadratic programming
by: LI Zi, et al.
Published: (2007-01-01)

New ternary decoders using hybrid memristor-MOS logic
by: Ramesh Kumar, et al.
Published: (2025-06-01)

A speech recognition method with enhanced transformer decoder
by: Hengbo Hu, et al.
Published: (2025-02-01)