Addressing Activation Outliers in LLMs: A Systematic Review of Post-Training Quantization Techniques

Bibliographic Details
Main Authors: Patrik Czakó, Gábor Kertész, Sándor Szénási
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10994764/
Summary: Large Language Models (LLMs) have transformed natural language processing, yet their deployment remains challenging due to substantial computational, memory, and energy demands. Post-training quantization has emerged as a key strategy for enabling efficient inference, particularly in resource-constrained settings. This systematic review focuses on weight-activation quantization, with particular emphasis on the emergent outlier phenomenon in LLM activations. Distinguishing itself from prior reviews, it evaluates recent techniques that mitigate activation outliers and improve quantization efficiency. Using the PRISMA methodology, we examine 52 recent studies to uncover key trends and assess the effectiveness of different approaches. By synthesizing insights from these works, this review presents a diverse set of techniques and their implications for activation quantization, laying the groundwork for future research and practical advancements in LLM deployment.
ISSN: 2169-3536
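
To make the summary's central problem concrete, the following is a minimal, self-contained Python sketch, not drawn from the reviewed studies: it reproduces how a few high-magnitude activation channels dominate the scale of naive per-tensor int8 (W8A8) quantization, then applies a SmoothQuant-style per-channel smoothing factor, one representative outlier-mitigation technique from this literature, to migrate that difficulty into the weights. The tensor shapes, outlier magnitude, and alpha value are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Toy activations X (tokens x channels) with a few high-magnitude outlier
# channels, as widely reported for LLMs, plus a well-behaved weight matrix W.
X = rng.normal(0.0, 1.0, size=(128, 64))
X[:, :4] *= 50.0                       # assumed emergent outlier channels
W = rng.normal(0.0, 0.05, size=(64, 64))

def quant_dequant(t):
    # Symmetric per-tensor int8 fake quantization: quantize, then dequantize.
    scale = max(np.abs(t).max(), 1e-8) / 127.0
    return np.clip(np.round(t / scale), -127, 127) * scale

ref = X @ W                            # full-precision reference output

# Naive W8A8: the single activation scale is set by the outlier channels,
# crushing the resolution available to the remaining 60 channels.
naive = quant_dequant(X) @ quant_dequant(W)

# SmoothQuant-style smoothing: scale input channel j by
# s_j = max|X_j|^alpha / max|W_j|^(1-alpha). Dividing X and multiplying W
# by s leaves X @ W unchanged in full precision, but flattens the
# activation range so int8 quantization loses far less information.
alpha = 0.5                            # illustrative migration strength
s = np.abs(X).max(axis=0) ** alpha / np.abs(W).max(axis=1) ** (1.0 - alpha)
smoothed = quant_dequant(X / s) @ quant_dequant(W * s[:, None])

for name, out in (("naive W8A8", naive), ("smoothed W8A8", smoothed)):
    rel_err = np.linalg.norm(out - ref) / np.linalg.norm(ref)
    print(f"{name}: relative output error = {rel_err:.4f}")

Running the sketch shows the smoothed variant with a markedly lower relative error, since the outlier magnitude is shared between activations and weights rather than borne by the activation scale alone; the surveyed techniques differ mainly in how this redistribution or isolation of outliers is performed.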