### عجفت الغور

# structured state space modeling

- sequential image classification
- images are flattened into pixel sequences of length 1024
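A minimal sketch of the flattening step, assuming a 32x32 image (which gives the length-1024 sequences mentioned above):

```python
import numpy as np

# Sequential image classification treats each image as a 1-D sequence of
# pixels: a 32x32 image becomes a sequence of length 1024, read step by step.
image = np.random.default_rng(0).integers(0, 256, size=(32, 32))
sequence = image.reshape(-1)
print(sequence.shape)  # (1024,)
```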

- keyword spotting
- 1s audio clips to be classified into keyword classes
- speech is difficult b/c of high sampling rate
- 1s clips at 16kHz → 16k samples
- MFCC coefficients
- filter banks reduce sequence length by ~100x
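A sketch of where the ~100x reduction comes from, assuming a standard MFCC-style front-end (25ms windows, 10ms hop at 16kHz — these parameter values are illustrative, not from the notes):

```python
import numpy as np

# A 1-second clip at 16 kHz: 16,000 raw samples.
sr = 16_000
clip = np.random.default_rng(0).standard_normal(sr)

# MFCC front-ends frame the signal (here: 25 ms windows, 10 ms hop) and emit
# one coefficient vector per frame, so the sequence the model sees shrinks
# from 16,000 steps to roughly 100 steps -- the ~100x reduction.
frame_len = int(0.025 * sr)   # 400 samples per window
hop = int(0.010 * sr)         # 160 samples between windows
n_frames = 1 + (len(clip) - frame_len) // hop

frames = np.stack([clip[i*hop : i*hop + frame_len] for i in range(n_frames)])
print(frames.shape)  # (98, 400): ~100 frames instead of 16,000 samples
```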

- irregularly sampled continuous data
- missing values and differing sampling frequencies

- long range arena
- the A matrices are constructed to be stable; open question whether stable matrices force the final state to zero
- do all matrices derive from the HiPPO matrix?
- close to unitary: are the eigenvalues controlled so that the state doesn't blow up?
- stability comes not just from the matrix itself, but also from the discretization
- by fiat of memorization it has to be stable: the matrix is derived to memorize, and to memorize it must be stable
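A sketch of the stability-via-discretization point, assuming the HiPPO-LegS construction and a bilinear (Tustin) discretization; the matrix entries and sign convention below follow the HiPPO paper's formulation (dx/dt = -Ax + Bu), but treat this as an illustration rather than the exact S4 recipe:

```python
import numpy as np

N = 16     # state dimension
dt = 0.01  # discretization step

# HiPPO-LegS-style lower-triangular A: sqrt((2n+1)(2k+1)) below the
# diagonal, n+1 on the diagonal, 0 above.
A = np.zeros((N, N))
for n in range(N):
    for k in range(N):
        if n > k:
            A[n, k] = np.sqrt(2*n + 1) * np.sqrt(2*k + 1)
        elif n == k:
            A[n, k] = n + 1
Ac = -A  # continuous-time dynamics: eigenvalues have negative real part

# Bilinear transform: Abar = (I - dt/2 Ac)^{-1} (I + dt/2 Ac).
# It maps the stable left half-plane into the unit disk, so the discrete
# recurrence x_k = Abar @ x_{k-1} cannot blow up.
I = np.eye(N)
Abar = np.linalg.solve(I - dt/2 * Ac, I + dt/2 * Ac)

# The eigenvalues end up inside the unit circle but close to 1:
# stable, yet slow to decay -- which is exactly the memorization bias.
eigs = np.abs(np.linalg.eigvals(Abar))
print(eigs.max() < 1.0)  # True
```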

- not every stable matrix will do well
- a randomly initialized A matrix won't work
- in practice, random initialization NaNs out
- control the eigenvalues to be close to 1
- becomes stable enough to train
- performs well when the initialization is closely tied to HiPPO
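A toy demonstration of the two failure/success modes above, under the assumption that "randomly initialized" means i.i.d. Gaussian entries (whose spectral radius is typically far above 1) and "controlled" means rescaling the eigenvalues to sit just inside the unit circle:

```python
import numpy as np

rng = np.random.default_rng(0)
N, steps = 64, 1000

# A random Gaussian recurrence matrix has spectral radius ~sqrt(N) >> 1,
# so iterating it overflows -- the "NaN out" failure mode.
A_rand = rng.standard_normal((N, N))
x = np.ones(N)
with np.errstate(over="ignore", invalid="ignore"):
    for _ in range(steps):
        x = A_rand @ x
print(np.isfinite(x).all())  # False: the state blew up

# Controlling the eigenvalues to lie just inside the unit circle keeps the
# recurrence stable enough to train while still decaying slowly (memory).
A_ctrl = A_rand / (1.01 * np.abs(np.linalg.eigvals(A_rand)).max())
y = np.ones(N)
for _ in range(steps):
    y = A_ctrl @ y
print(np.isfinite(y).all())  # True: the state stays bounded
```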

- why not regular benchmarks
- what are the limitations
- text may actually be more difficult
- text is very discrete, and it's unclear how well this would work

- narrativeQA?
- question/answering from long range tasks?

- theory of structured matrices
- algorithmic linear algebra / structured matrices

- theory of memorization with HiPPO
- other approximations
- HiPPO is a family of matrices that's not fully understood

- can HiPPO be learned via EM?

- high level intuitions
- why does it do better?
- not forgetting things? more complex things?
- what does learning more complex things mean?
- constructed to not forget
- long referential sequences, long context sequences
- tradeoffs? discrete and shorter data -> maybe transformers are better
- maybe CNNs are more efficient
- competitive with RNNs and CNNs across the board

- testing with synthetic datasets?

- these A/B/C/D matrices "bias" the model towards memorization
- A/B are initialized by the theory -> memorization
- C/D are more general deep learning params -> exploiting what it's memorized
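The A/B vs. C/D split above can be sketched as a single SSM recurrence; this is a hypothetical minimal version (diagonal stand-in for a discretized HiPPO A, single-input single-output), not the actual S4 implementation:

```python
import numpy as np

def ssm_scan(Abar, Bbar, C, D, u):
    """Run x_k = Abar x_{k-1} + Bbar u_k, y_k = C x_k + D u_k over a sequence."""
    x = np.zeros(Abar.shape[0])
    ys = []
    for u_k in u:
        x = Abar @ x + Bbar * u_k   # state update: A/B from theory -> memorize
        ys.append(C @ x + D * u_k)  # readout: C/D learned -> exploit the memory
    return np.array(ys)

N, L = 8, 32
Abar = np.diag(np.full(N, 0.95))   # stand-in for a discretized HiPPO matrix
Bbar = np.ones(N) / N              # fixed by the theory alongside A
C = np.random.default_rng(1).standard_normal(N)  # trainable readout
D = 1.0                                          # trainable skip connection
y = ssm_scan(Abar, Bbar, C, D, np.ones(L))
print(y.shape)  # (32,)
```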

- previous work: https://arxiv.org/abs/1611.01569