عجفت الغور

scaling laws survey paper (information bottleneck, minimum description length, etc.)

scaling law seminar

Angle: all three measure generalization in some way

  • can these be reconciled with one another?
  • mutual information (MI) has the most empirical backing, but it is hard to estimate in high dimensions and hasn’t been demonstrated on transformers
    • (also, IB compression may simply reflect geometric compression)
  • norm growth may have X
  • “multiple descent”
  • do these all critically validate each other?
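The high-dimensional problem with MI shows up even in a simple binning estimator of the kind used for information-plane plots. This is a minimal sketch under stated assumptions. The layer T = f(X) is deterministic, its activations lie in [-1, 1] (e.g. tanh), and every input sample is distinct. Under those assumptions I(X;T) reduces to the entropy of the binned activations. The function names, the bin count of 30, and the random stand-in activations are hypothetical. Once a layer has more than a few units, almost every sample lands in its own bin, and the estimate saturates at log2(N) no matter what the network has learned.

```python
import numpy as np


def binned_entropy(activations: np.ndarray, n_bins: int = 30) -> float:
    """Entropy (bits) of activations after uniform binning on [-1, 1].

    Each row is one sample; each distinct row of bin indices is one symbol.
    """
    edges = np.linspace(-1.0, 1.0, n_bins + 1)
    binned = np.digitize(activations, edges)
    # Count how many samples share each joint bin assignment.
    _, counts = np.unique(binned, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def mi_input_layer(activations: np.ndarray, n_bins: int = 30) -> float:
    """Binned estimate of I(X; T) for a deterministic layer T = f(X).

    Assumption: every input is distinct and T is deterministic, so
    H(T | X) = 0 and I(X; T) = H(T_binned).
    """
    return binned_entropy(activations, n_bins)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_samples = 1000
    for dim in (1, 2, 4, 16, 64):
        # Hypothetical stand-in for a tanh layer's activations.
        acts = np.tanh(rng.normal(size=(n_samples, dim)))
        print(f"dim={dim:3d}  I(X;T)≈{mi_input_layer(acts):.2f} bits"
              f"  (ceiling log2(N)={np.log2(n_samples):.2f})")
```

Because the estimate is capped at log2 of the sample count, "compression" measured this way in wide layers, and particularly in transformer-sized representations, may reflect the estimator more than the representation.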