LeCun, Bengio, Hinton: Deep Learning
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. “Deep Learning.” Nature 521, no. 7553 (May 2015): 436–44. https://doi.org/10.1038/nature14539.
Summary
Deep learning uses multiple processing layers to learn representations of data with multiple levels of abstraction.
Supervised Learning

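the most common form of machine learning, deep or not

during training, show the system inputs together with their annotations (labels)

define an objective function that measures the error between the system's outputs and the desired outputs

most practitioners minimize it with stochastic gradient descent (SGD)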

known since the 1960s that linear classifiers can only carve their input space into very simple regions (half-spaces separated by a hyperplane)

with multiple nonlinear layers (a depth of, say, 5 to 20), a system can implement extremely intricate functions of its inputs
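
A minimal sketch of the objective-plus-SGD loop noted above, assuming a one-parameter-pair linear model and a squared-error objective. The function name `sgd_linear_fit`, the toy data, the learning rate, and the epoch count are illustrative assumptions, not taken from the paper.

```python
import random


def sgd_linear_fit(data, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w*x + b by stochastic gradient descent on squared error.

    All names and hyperparameters here are assumptions for illustration.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)              # visit examples in random order
        for x, y in data:
            err = (w * x + b) - y      # per-example objective is 0.5 * err**2
            w -= lr * err * x          # gradient of the objective w.r.t. w
            b -= lr * err              # gradient of the objective w.r.t. b
    return w, b


# Toy, noise-free data generated from y = 2x + 1 (an assumption for the demo).
toy_data = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
w, b = sgd_linear_fit(toy_data)
print(w, b)  # should land close to 2 and 1
```

Each update uses the gradient from a single example rather than the whole training set, which is what makes the procedure stochastic; the paper describes averaging over small batches of examples, which this sketch simplifies to batches of one.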
Backpropagation for multilayer architectures
 the derivative of the objective with respect to the input of a module can be computed by working backwards from the gradient with respect to the output of that module
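 feedforward networks map a fixed-size input to a fixed-size output; ReLU, f(z) = max(z, 0), was the most popular nonlinearity at the time of the paper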
 in the late 1990s it was commonly thought that simple gradient descent would get trapped in poor local minima
 in practice, poor local minima are rarely a problem with large networks
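
The backward pass noted above (gradient with respect to a module's input obtained from the gradient with respect to its output) can be sketched on a tiny network with one hidden ReLU unit. The scalar weights, the squared-error loss, and the helper names `forward_backward` and `numeric_grad` are assumptions made for illustration; the finite-difference check is only a sanity test, not part of backpropagation itself.

```python
def relu(z):
    return max(z, 0.0)


def forward_backward(x, w1, w2, target):
    """Scalar network x -> z = w1*x -> h = relu(z) -> y = w2*h (illustrative)."""
    # Forward pass, module by module.
    z = w1 * x
    h = relu(z)
    y = w2 * h
    loss = 0.5 * (y - target) ** 2

    # Backward pass: start from the gradient at the output and apply the
    # chain rule through each module in reverse order.
    dy = y - target                      # dL/dy
    dw2 = dy * h                         # dL/dw2
    dh = dy * w2                         # dL/dh (input of the last module)
    dz = dh * (1.0 if z > 0 else 0.0)    # through ReLU: pass gradient only if z > 0
    dw1 = dz * x                         # dL/dw1
    return loss, dw1, dw2


def numeric_grad(f, v, eps=1e-6):
    """Central finite difference, used only to check the analytic gradients."""
    return (f(v + eps) - f(v - eps)) / (2 * eps)


x, w1, w2, target = 1.5, 0.8, -0.5, 1.0
loss, dw1, dw2 = forward_backward(x, w1, w2, target)
num_dw1 = numeric_grad(lambda v: forward_backward(x, v, w2, target)[0], w1)
num_dw2 = numeric_grad(lambda v: forward_backward(x, w1, v, target)[0], w2)
print(dw1, num_dw1)
print(dw2, num_dw2)
```

The analytic and numerical gradients should agree to several decimal places, since the chosen weights keep the hidden unit away from the kink of the ReLU at z = 0.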