scaling laws survey paper (information bottleneck, minimum description length, etc)
Angle: all three measure generalization in some way
- can we match these together?
- MI (mutual information) has the most support, but is hard to estimate in high dimensions and hasn’t been shown on transformers
- (also, IB may just reflect the existence of geometric compression)
- norm growth may have X
- “multiple descent”
- do these all critically validate each other?
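
Note on the MI high-dimension issue: a minimal sketch, assuming a binned plug-in estimator (one common choice, not necessarily what any particular paper uses). The names `entropy`, `binned_mi`, and `mi_by_dim` are made up for illustration, and the data is synthetic. Labels are drawn independently of the representation, so the true MI is 0, yet the estimate should drift toward H(Y) as dimension grows, because every sample ends up in its own bin.

```python
import numpy as np

def entropy(symbols):
    """Plug-in (empirical) entropy in nats over the rows of a 2-D integer array."""
    _, counts = np.unique(symbols, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def binned_mi(t, y, bins=10):
    """Plug-in estimate of I(T; Y) = H(T) + H(Y) - H(T, Y) after binning each coordinate of t."""
    # Interior bin edges shared by all coordinates (an assumption made for simplicity).
    edges = np.linspace(t.min(), t.max(), bins + 1)[1:-1]
    t_binned = np.digitize(t, edges)        # shape (n, d), integers in [0, bins - 1]
    y_col = y.reshape(-1, 1)
    joint = np.hstack([t_binned, y_col])    # each row is one (T, Y) symbol
    return entropy(t_binned) + entropy(y_col) - entropy(joint)

rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, size=n)              # labels independent of t, so true MI is 0

# Synthetic Gaussian "representations" of increasing dimension.
mi_by_dim = {d: binned_mi(rng.normal(size=(n, d)), y) for d in (1, 2, 4, 8, 16, 32)}
for d, mi in mi_by_dim.items():
    print(f"d={d:2d}  estimated MI = {mi:.3f} nats  (H(Y) ~ {np.log(2):.3f})")
```

If this behaves as expected, the low-dimensional estimate sits near 0 and the high-dimensional one saturates near log 2, which is the kind of artifact that makes MI curves on wide layers hard to trust without a better estimator.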