عجفت الغور

Katharopoulos: Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

papers, transformers

  • Katharopoulos, Angelos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. “Transformers Are RNNs: Fast Autoregressive Transformers with Linear Attention.” arXiv:2006.16236 [cs, stat], August 31, 2020. http://arxiv.org/abs/2006.16236.
  • Expresses self-attention as a linear dot-product of kernel feature maps, reducing the cost in sequence length from quadratic to linear and letting autoregressive (causal) transformers be computed as RNNs (see the sketch after this list)
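
A minimal NumPy sketch of the idea, assuming the φ(x) = elu(x) + 1 feature map used in the paper; the function names, toy dimensions, and single-head shapes are illustrative choices, not the authors' implementation.

```python
# Linear (kernelized) attention sketch: softmax(QK^T)V is approximated by
# phi(Q) (phi(K)^T V) / (phi(Q) sum_i phi(k_i)), which avoids the N x N matrix.
import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1, strictly positive so the normalizer never vanishes
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Non-causal case. Q, K: (N, d_k), V: (N, d_v); O(N * d_k * d_v) instead of O(N^2)."""
    phi_Q = elu_feature_map(Q)            # (N, d_k)
    phi_K = elu_feature_map(K)            # (N, d_k)
    KV = phi_K.T @ V                      # (d_k, d_v): summed once over the whole sequence
    Z = phi_K.sum(axis=0)                 # (d_k,): normalizer terms
    return (phi_Q @ KV) / (phi_Q @ Z)[:, None]

def causal_linear_attention(Q, K, V):
    """Causal case written as a recurrence: the 'transformers are RNNs' view.
    S and z are a fixed-size hidden state updated once per token."""
    phi_Q = elu_feature_map(Q)
    phi_K = elu_feature_map(K)
    S = np.zeros((Q.shape[1], V.shape[1]))  # running sum of outer(phi(k_i), v_i)
    z = np.zeros(Q.shape[1])                # running sum of phi(k_i)
    out = np.empty_like(V)
    for i in range(Q.shape[0]):
        S += np.outer(phi_K[i], V[i])
        z += phi_K[i]
        out[i] = (phi_Q[i] @ S) / (phi_Q[i] @ z)
    return out

# toy usage (illustrative dimensions)
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 6, 4))
print(linear_attention(Q, K, V).shape)         # (6, 4)
print(causal_linear_attention(Q, K, V).shape)  # (6, 4)
```

Because the hidden state (S, z) has constant size, autoregressive decoding costs O(1) memory and time per generated token, rather than growing with the context as in standard softmax attention.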