CQS-Attention: Scaling Up the Standard Attention Computation for Infinitely Long Sequences
Transformer models suffer from unaffordably high memory consumption when the sequence is long and standard self-attention is used. We developed a sequence parallelism scheme called CQS-Attention that can break the limit on sequence length. A long sequence is divided into multiple overlapping sub...
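The abstract is cut off here, so the paper's exact partitioning and recombination scheme is not shown. As a rough, hypothetical illustration of the general idea of attending over overlapping subsequences instead of the full sequence at once, here is a minimal NumPy sketch; the function names, chunk and overlap sizes, and the local-context recombination are all assumptions for illustration, not the method from the paper.

```python
import numpy as np

def standard_attention(q, k, v):
    # Plain softmax attention; score-matrix memory grows with len(q) * len(k).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def overlapping_chunk_attention(q, k, v, chunk_len=1024, overlap=128):
    # Hypothetical sketch: queries are processed in overlapping subsequences,
    # so no single worker materializes the full (n x n) score matrix.
    # chunk_len and overlap are arbitrary placeholders, and each chunk only
    # attends to its local overlapping context -- not the paper's scheme.
    n = q.shape[0]
    out = np.zeros_like(v)
    for start in range(0, n, chunk_len):
        end = min(start + chunk_len, n)
        ctx_lo = max(0, start - overlap)   # overlap with the previous chunk
        ctx_hi = min(n, end + overlap)     # overlap with the next chunk
        out[start:end] = standard_attention(
            q[start:end], k[ctx_lo:ctx_hi], v[ctx_lo:ctx_hi]
        )
    return out

# Tiny usage example with random data.
rng = np.random.default_rng(0)
n, d = 4096, 64
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
print(overlapping_chunk_attention(q, k, v).shape)  # (4096, 64)
```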
Saved in:
| Main Authors: | |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10900388/ |