CQS-Attention: Scaling Up the Standard Attention Computation for Infinitely Long Sequences

Transformer models suffer from prohibitively high memory consumption when sequences are long and standard self-attention is used. We developed a sequence parallelism scheme called CQS-Attention that breaks the limit on sequence length. A long sequence is divided into multiple overlapping subsequences.
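The abstract describes the core mechanism: partitioning a long sequence into overlapping subsequences so that each worker only ever materializes an attention score matrix of bounded size. As a rough illustration, below is a minimal NumPy sketch of that overlapping-chunk idea. It is not the paper's CQS-Attention algorithm (the exact overlap width and the scheme for recombining subsequence outputs into exact standard attention are defined in the paper); the chunk and overlap parameters and all function names here are assumptions made for this example.

    import numpy as np

    def softmax(x, axis=-1):
        # Numerically stable softmax.
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def attention(q, k, v):
        # Standard scaled dot-product attention on one subsequence.
        d = q.shape[-1]
        scores = q @ k.T / np.sqrt(d)  # (m, w) score matrix
        return softmax(scores) @ v

    def chunked_attention(q, k, v, chunk=256, overlap=64):
        # Illustrative overlapping-chunk attention: each chunk of queries
        # attends only to keys/values in its window plus an `overlap`-wide
        # halo on either side, so the largest score matrix any worker holds
        # is chunk x (chunk + 2*overlap), independent of total length.
        # (This is a local-window approximation of full attention, not the
        # exact recombination scheme from the CQS-Attention paper.)
        n = q.shape[0]
        out = np.empty_like(v)
        for start in range(0, n, chunk):
            end = min(start + chunk, n)
            lo = max(0, start - overlap)   # left halo
            hi = min(n, end + overlap)     # right halo
            out[start:end] = attention(q[start:end], k[lo:hi], v[lo:hi])
        return out

    # Example: a 4096-token sequence with 64-dimensional heads.
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4096, 64))
    y = chunked_attention(x, x, x)
    print(y.shape)  # (4096, 64)

With these illustrative parameters, an interior chunk computes a 256 x 384 score matrix rather than the 4096 x 4096 matrix full attention would require, which is the memory-scaling point the abstract makes.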

Bibliographic Details
Main Authors: Yiming Bian, Arun K. Somani
Format: Article
Language: English
Published: IEEE 2025-01-01
Series:IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10900388/