Caliskan et al: Semantics derived automatically from language corpora contain human-like biases

Summary

The researchers showed that human-like biases are inherent in the statistical co-occurrence of words in text, which follows from the distributional hypothesis in linguistics (a word's meaning is reflected in the contexts it appears in).
They developed a method similar to the Implicit Association Test (IAT) called the Word Embedding Association Test (WEAT), which uses the cosine similarity between word vectors to measure how strongly two sets of target words are associated with two sets of attribute words.
Largely a study showing that the stereotypes and biases shown by the IAT are also a part of word embeddings, meaning that bias is not necessarily escapable here by tuning the dataset.
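To make the WEAT idea concrete, here is a minimal sketch of the effect size: the difference between the mean associations of the two target sets, divided by the standard deviation of the associations over both sets. The function names, the toy 2-d vectors, and the word lists are all hypothetical and made up for illustration (they are not real GloVe vectors), and the use of the sample standard deviation is an assumption.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B, emb):
    """s(w, A, B): mean similarity of w to attribute set A minus that to set B."""
    to_a = mean(cosine(emb[w], emb[a]) for a in A)
    to_b = mean(cosine(emb[w], emb[b]) for b in B)
    return to_a - to_b


def weat_effect_size(X, Y, A, B, emb):
    """Effect size: difference of mean associations of target sets X and Y,
    normalized by the standard deviation over all target words.
    Assumption: sample standard deviation (statistics.stdev)."""
    s_x = [association(x, A, B, emb) for x in X]
    s_y = [association(y, A, B, emb) for y in Y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Toy embedding table (hypothetical values, NOT real GloVe vectors).
emb = {
    "rose": [1.0, 0.1], "tulip": [0.9, 0.2],        # targets X: flowers
    "wasp": [0.1, 1.0], "moth": [0.2, 0.9],         # targets Y: insects
    "pleasant": [1.0, 0.0], "lovely": [0.95, 0.05],  # attributes A
    "nasty": [0.0, 1.0], "awful": [0.05, 0.95],      # attributes B
}
flowers, insects = ["rose", "tulip"], ["wasp", "moth"]
pleasant, unpleasant = ["pleasant", "lovely"], ["nasty", "awful"]

effect = weat_effect_size(flowers, insects, pleasant, unpleasant, emb)
print(f"WEAT effect size: {effect:.3f}")
```

A positive value here means the first target set (flowers) sits closer to the first attribute set (pleasant) than the second target set does. With real embeddings, `emb` would be loaded from a pretrained vector file, and a permutation test over the target words would supply a significance estimate.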
Note that the study covers only word embeddings, and only GloVe.
- It is likely, if not certain, that the same biases are present in fastText and word2vec
Paper was from 2017, so no mention of sentence embeddings
Relates specifically to the work of Bolukbasi et al., who proposed a method to debias word embeddings (T. Bolukbasi, K.-W. Chang, J. Y. Zou, V. Saligrama, A. T. Kalai, Adv. Neural Inf. Process. Syst. 2016, 4349–4357 (2016).)