bert
Tags: nlp
- attention vs self-attention (minimal sketch after this list)
    - attention is an operator that collapses a sequence of vectors into a single vector: a query is scored against each vector, the scores are softmaxed into weights, and the output is the weighted sum
    - self-attention is attention where the vectors you're attending over were produced by a network with the same params from the same input sequence as the queries, i.e. the sequence attends over itself
 
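a minimal numpy sketch of both, assuming scaled dot-product scoring (function names, dimensions, and the projection matrices are illustrative, not from any particular library):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(query, keys, values):
    """Collapse a sequence of `values` into one vector: score each key against
    the query (scaled dot product), softmax the scores into weights, return
    the weighted sum of the values."""
    d = keys.shape[-1]
    scores = keys @ query / np.sqrt(d)   # (seq_len,)
    weights = softmax(scores)            # (seq_len,), sums to 1
    return weights @ values              # single vector of dim values.shape[-1]

rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 8))            # a sequence of 5 vectors, dim 8 (made up)

# attention: an external query collapses the sequence into one vector
q = rng.normal(size=8)
pooled = attention(q, keys=seq, values=seq)          # shape (8,)

# self-attention: queries, keys, and values all come from the same sequence,
# projected by the same params (Wq, Wk, Wv) at every position
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
self_attended = np.stack(
    [attention(x @ Wq, seq @ Wk, seq @ Wv) for x in seq]
)                                                    # shape (5, 8): one output per position
```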