ResDecode: Accelerating Large Language Models Inference via Residual Decoding Heads
Large Language Models (LLMs) have immense potential to enhance the capabilities of Cyber-Physical-Social Intelligence (CPSI) systems, enabling them to better engage with complex cyber, physical, and social environments. However, the high inference latency of LLMs, which is inherited from the autoreg...
| Main Authors: | Ziqian Zeng, Jiahong Yu, Qianshi Pang, Zihao Wang, Huiping Zhuang, Fan Yu, Hongen Shao, Xiaofeng Zou |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Tsinghua University Press, 2025-06-01 |
| Series: | Big Data Mining and Analytics |
| Subjects: | |
| Online Access: | https://www.sciopen.com/article/10.26599/BDMA.2024.9020074 |
Similar Items
- Accelerating the inference of string generation-based chemical reaction models for industrial applications. By Mikhail Andronov, et al. Published 2025-03-01.
- Recursive Bayesian Decoding in State Observation Models: Theory and Application in Quantum-Based Inference. By Branislav Rudić, et al. Published 2025-06-01.
- BALI—A Benchmark for Accelerated Language Model Inference. By Lena Jurkschat, et al. Published 2025-01-01.
- SSCANL decoder based joint iterative detection and decoding algorithm. By Chongyang LIU, et al. Published 2022-10-01.