عجفت الغور

clustering

ml

NVDM

  • variational autoencoder with BOW inputs
  • Word order is ignored, only word counts matter
  • L1-normalized counts give a word probability distribution (see the sketch below)
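A minimal sketch of that preprocessing; the documents and vocabulary are made up:

```python
# Turn raw documents into the L1-normalized bag-of-words vectors
# NVDM consumes. Word order is discarded, only counts survive.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs).toarray().astype(float)

# L1 normalization: each row sums to 1, i.e. a word probability distribution
bow = counts / counts.sum(axis=1, keepdims=True)
print(bow.sum(axis=1))  # [1. 1.]
```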

Transformer

  • pretrained LM, fine-tuned on some downstream task
  • compute loss with MSE or a contrastive objective
    • contrastive learning pairs an anchor with a positive (similar) example and a negative (dissimilar) one; see the sketch below
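A sketch of a triplet-style contrastive loss using cosine distance; the margin value and embedding dimension are arbitrary choices, not from the original notes:

```python
# Pull the anchor toward the positive embedding and push it away
# from the negative one, up to a margin.
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.5):
    # cosine distance = 1 - cosine similarity
    pos_dist = 1 - F.cosine_similarity(anchor, positive)
    neg_dist = 1 - F.cosine_similarity(anchor, negative)
    return F.relu(pos_dist - neg_dist + margin).mean()

anchor = torch.randn(8, 384)    # batch of 8 embeddings, dim 384
positive = torch.randn(8, 384)  # similar to the anchors
negative = torch.randn(8, 384)  # dissimilar to the anchors
print(triplet_loss(anchor, positive, negative))
```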

RoBERTa

  • masked language model (example below)
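A quick way to see the MLM objective in action, assuming the Hugging Face transformers library (RoBERTa's mask token is `<mask>`):

```python
# Fill-mask with roberta-base: the model predicts the masked token
# from its bidirectional context.
from transformers import pipeline

fill = pipeline("fill-mask", model="roberta-base")
for pred in fill("The capital of France is <mask>."):
    print(pred["token_str"], pred["score"])
```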

MPNet

  • Combines the two approaches: permuted language modeling and masked language modeling
  • the sequence is permuted and the last tokens of the permuted order are masked (toy illustration below)
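A toy illustration of that setup, not the real pretraining code:

```python
# Permute the token positions, then mask the last tokens of the
# permuted order so the model must predict them from the rest.
import random

tokens = ["the", "cat", "sat", "on", "the", "mat"]
positions = list(range(len(tokens)))
random.shuffle(positions)

n_masked = 2  # mask the last tokens of the permuted sequence
visible = positions[:-n_masked]
masked = positions[-n_masked:]

print("predict", [tokens[i] for i in masked],
      "given", [tokens[i] for i in visible])
```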

MiniLM

  • distillation (schematic step below)
  • a teacher model teaches a smaller student model
  • all-mpnet-base-v2 is the teacher
  • all-MiniLM-L6-v2 is about 5 times faster
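A schematic distillation step using the simpler embedding-MSE variant; the real MiniLM recipe distills self-attention distributions, and both networks here are made-up stand-ins:

```python
# The student learns to reproduce a frozen teacher's embeddings via MSE.
import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Linear(768, 768), nn.Tanh()).eval()  # stand-in teacher
student = nn.Sequential(nn.Linear(768, 384), nn.Tanh(), nn.Linear(384, 768))

opt = torch.optim.Adam(student.parameters(), lr=1e-4)
x = torch.randn(32, 768)  # a batch of input features

with torch.no_grad():
    target = teacher(x)  # teacher embeddings, no gradients
opt.zero_grad()
loss = nn.functional.mse_loss(student(x), target)
loss.backward()
opt.step()
```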

Benchmark Dataset

Multi-News

  • summarization dataset with custom human-written summaries (loading example below)
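Loading it, assuming the Hugging Face datasets library (each example has a "document" field with the source articles and a human-written "summary" field):

```python
from datasets import load_dataset

multi_news = load_dataset("multi_news", split="test")
example = multi_news[0]
print(example["summary"][:200])
```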

Metrics

Accuracy

  • Cosine similarity
  • each news story should be closer to its own summary than to any other summary
  • use AUC to measure how good the classifier is (sketch below)
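A sketch of that evaluation; the embeddings are random stand-ins:

```python
# Score every (story, summary) pair by cosine similarity, label
# matching pairs 1 and all others 0, then report ROC AUC.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
stories = rng.normal(size=(100, 384))    # stand-in story embeddings
summaries = rng.normal(size=(100, 384))  # stand-in summary embeddings

sims = cosine_similarity(stories, summaries)  # (100, 100) score matrix
labels = np.eye(len(stories))                 # 1 on matched pairs only
print(roc_auc_score(labels.ravel(), sims.ravel()))
```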

Speed

  • NVDM
    • is NVDM actually fast? Tested across batch sizes (timing sketch below)
    • NVDM is not that fast: it's pretty slow at small batch sizes and only catches up at much larger ones
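A sketch of the timing setup with a made-up stand-in encoder; swap in NVDM or a sentence-transformers model to reproduce the comparison:

```python
# Time the same workload at several batch sizes to see where
# per-batch overhead stops dominating.
import time
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(384, 384))

def encode(batch):
    # stand-in encoder: one dense layer with tanh
    return np.tanh(batch @ weights)

docs = rng.normal(size=(4096, 384))
for batch_size in (1, 8, 64, 512):
    start = time.perf_counter()
    for i in range(0, len(docs), batch_size):
        encode(docs[i : i + batch_size])
    print(f"batch_size={batch_size}: {time.perf_counter() - start:.3f}s")
```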