LeCun, Bengio, Hinton - Deep Learning
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. “Deep Learning.” Nature 521, no. 7553 (May 2015): 436–44. https://doi.org/10.1038/nature14539.
Deep learning uses multiple layers to learn representations of data with multiple levels of abstraction
Supervised Learning
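most common form of machine learning, deep or not
show the system inputs together with their annotations (labels) during training
define an objective function that measures the error between the output and the desired output
most practitioners minimize that objective with stochastic gradient descent (SGD)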
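A minimal sketch of that loop, assuming a one-parameter-pair linear model and a squared-error objective (both chosen here for illustration; the paper does not prescribe them). The function name `sgd_linear`, the learning rate, and the toy data are hypothetical.

```python
import random

def sgd_linear(samples, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w*x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    data = list(samples)  # copy so shuffling does not touch the caller's list
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a fresh random order
        for x, y in data:
            err = (w * x + b) - y  # prediction error on one example
            # Gradient of the per-example objective 0.5 * err**2
            w -= lr * err * x
            b -= lr * err
    return w, b

# Toy annotated data (an assumption): inputs x with labels y = 2x + 1, no noise.
train = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
w, b = sgd_linear(train)
```

Each update uses the gradient from a single example rather than the whole training set, which is what makes the descent "stochastic".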
known since the 1960s that linear classifiers can only carve their input space into very simple regions (half-spaces separated by a hyperplane)
with multiple non-linear layers (a depth of roughly 5 to 20), a system can implement extremely intricate functions of its inputs
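One way to see the difference is the XOR problem, used here as an illustration that is not taken from the notes. The sketch below searches a coarse grid of linear classifiers and finds none that labels all four XOR points correctly, while a two-layer network with a max(z, 0) non-linearity and hand-picked weights does. The grid search is a demonstration, not a proof, and all names and weights are assumptions.

```python
from itertools import product

# The four XOR points with their labels.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def linear_predict(w1, w2, b, x):
    # Linear classifier: which side of the hyperplane w.x + b = 0 is x on?
    return 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0

def relu(z):
    return max(z, 0.0)

def two_layer_predict(x):
    # Hand-picked weights (illustrative, not learned).
    h1 = relu(x[0] + x[1])
    h2 = relu(x[0] + x[1] - 1.0)
    return 1 if h1 - 2.0 * h2 > 0.5 else 0

# Try every (w1, w2, b) on a grid from -5 to 5 in steps of 0.5.
grid = [i / 2.0 for i in range(-10, 11)]
linear_solutions = [
    (w1, w2, b)
    for w1, w2, b in product(grid, repeat=3)
    if all(linear_predict(w1, w2, b, x) == y for x, y in XOR)
]
two_layer_correct = all(two_layer_predict(x) == y for x, y in XOR)
```

The empty `linear_solutions` list matches the half-space argument: a single hyperplane cannot put (0, 1) and (1, 0) on one side and (0, 0) and (1, 1) on the other.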
Backpropagation to train multilayer architectures
- the derivative of the objective with respect to the input of a module can be computed by working backwards from the gradient with respect to the output of that module (the chain rule)
- feedforward networks learn to map a fixed-size input to a fixed-size output; ReLU, f(z) = max(z, 0), is the most popular non-linearity
- in the late 1990s it was commonly thought that simple gradient descent would get trapped in poor local minima
- in practice, poor local minima are rarely a problem with large networks
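The backward pass described above can be sketched on a tiny scalar network: linear module, ReLU (f(z) = max(z, 0)), linear module, squared-error objective. This is an assumed toy setup, not the paper's code; the parameter values and function names are hypothetical. A finite-difference check is included to confirm the chain-rule gradients.

```python
def relu(z):
    return max(z, 0.0)

def forward(params, x, target):
    w1, b1, w2, b2 = params
    a = w1 * x + b1  # module 1: linear
    h = relu(a)      # module 2: ReLU non-linearity
    y = w2 * h + b2  # module 3: linear
    loss = 0.5 * (y - target) ** 2
    return loss, (a, h, y)

def backward(params, x, target):
    w1, b1, w2, b2 = params
    _, (a, h, y) = forward(params, x, target)
    dy = y - target                    # d loss / d y (output of module 3)
    dw2, db2 = dy * h, dy              # gradients for module 3's parameters
    dh = dy * w2                       # gradient wrt module 3's input
    da = dh * (1.0 if a > 0 else 0.0)  # pass through ReLU: 1 where a > 0, else 0
    dw1, db1 = da * x, da              # gradients for module 1's parameters
    return [dw1, db1, dw2, db2]

def numeric_grad(params, x, target, eps=1e-6):
    # Central finite differences, one parameter at a time.
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        diff = forward(up, x, target)[0] - forward(down, x, target)[0]
        grads.append(diff / (2 * eps))
    return grads

params = [0.8, 0.1, -1.5, 0.3]  # w1, b1, w2, b2 (arbitrary values)
x, target = 2.0, 1.0
analytic = backward(params, x, target)
numeric = numeric_grad(params, x, target)
```

Each line of `backward` takes the gradient at a module's output and turns it into the gradient at that module's input, which is the "working backwards" step in the notes.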