عجفت الغور

TruthfulQA

datasets, scaling law seminar

Adversarial question-answering dataset that tests whether models give truthful answers, using questions written so that a common misconception invites a false answer

  • an interesting finding of the paper is that larger models tended to generate more false answers
    • possibly because larger models are better at imitating the falsehoods in their training text, or because they make more human-like (but false) generalizations
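
A rough sketch of how the "larger models give more false answers" comparison could be tallied: given answers that have already been judged true or false, compute the fraction judged truthful for each model. The model names, the `judged` records, and the `truthful_rate` helper are all made up for illustration; they are not the paper's data or evaluation code.

```python
from collections import defaultdict

# Hypothetical judged answers: (model_name, answer_was_truthful).
# These labels are invented toy data, NOT results from the paper.
judged = [
    ("small", True), ("small", True), ("small", False),
    ("large", True), ("large", False), ("large", False),
]


def truthful_rate(records):
    """Return the fraction of answers judged truthful, per model."""
    counts = defaultdict(lambda: [0, 0])  # model -> [truthful, total]
    for model, is_truthful in records:
        counts[model][0] += int(is_truthful)
        counts[model][1] += 1
    return {model: truthful / total for model, (truthful, total) in counts.items()}


rates = truthful_rate(judged)
for model, rate in rates.items():
    print(f"{model}: {rate:.2f} truthful")
```

In this toy data the "large" model scores lower than the "small" one, which is the shape of the trend noted above; real numbers would have to come from judging actual model outputs.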