BERT
Tags: nlp
-
attention vs self-attention
-
attention is an operator that collapses a sequence of vectors into a single vector: a softmax-weighted sum, with weights that come from scoring a query against each vector (sketched in the first code block below)
-
self-attention is the case where the queries, keys, and values you attend over are all produced by the same network parameters from the same input sequence, so every position attends over every other position of that sequence (second code block below)
-
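A minimal NumPy sketch of plain attention as a collapse operator; the function name `attention` and the toy shapes are just for illustration, not BERT's actual implementation. One query scores every vector in the sequence, and the softmax-weighted sum returns a single vector.

```python
import numpy as np

def attention(query, keys, values):
    """Collapse a sequence of `values` into one vector.

    Weights come from a scaled dot product of `query` with each key,
    normalised with a softmax over the sequence.
    """
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)      # (seq_len,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # softmax over positions
    return weights @ values                 # single (d,) vector

# toy example: 5 vectors of dim 4 collapsed into one vector of dim 4
rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 4))
q = rng.normal(size=4)
print(attention(q, keys=seq, values=seq).shape)   # (4,)
```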
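And a matching sketch of self-attention, again with made-up projection matrices `w_q`, `w_k`, `w_v` rather than anything from BERT itself: queries, keys, and values all come from the one input sequence, so the output keeps one attended vector per position instead of collapsing the whole sequence to a single vector.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Every position of `x` attends over every position of the same `x`.

    Q, K, V are all projections of the same input sequence, so the
    result is one attended vector per position: shape (seq_len, d).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v        # each (seq_len, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (seq_len, seq_len)
    return softmax(scores, axis=-1) @ v        # (seq_len, d)

# toy example: 5 tokens of dim 4, random projections
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
w_q, w_k, w_v = [rng.normal(size=(4, 4)) for _ in range(3)]
print(self_attention(x, w_q, w_k, w_v).shape)  # (5, 4)
```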