Understanding deep learning requires rethinking generalization
Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family, or to the regularization techniques used during training. Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization, and occurs even if we replace the true images by completely unstructured random noise. We corroborate these experimental findings with a theoretical construction showing that simple depth-two neural networks already have perfect finite-sample expressivity as soon as the number of parameters exceeds the number of data points, as it usually does in practice. We interpret our experimental findings by comparison with traditional models.
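The random-label phenomenon the abstract describes can be reproduced at toy scale. The sketch below is not the paper's experimental setup (which uses state-of-the-art convolutional networks on image benchmarks); it is a minimal illustration, under assumed sizes and hyperparameters, of the expressivity claim: an overparameterized two-layer ReLU network, trained by plain gradient descent, drives the training error on completely random labels toward zero.

```python
import numpy as np

# Illustrative sketch of the finite-sample expressivity claim:
# a two-layer ReLU network with far more parameters (~640) than
# data points (16) fits random +/-1 labels on random inputs.
# All sizes, the seed, and the learning rate are assumptions,
# not the paper's actual configuration.
rng = np.random.default_rng(0)
n, d, h = 16, 8, 64                       # samples, input dim, hidden width
X = rng.standard_normal((n, d))           # unstructured random "images"
y = rng.choice([-1.0, 1.0], size=(n, 1))  # completely random labels

W1 = rng.standard_normal((d, h)) / np.sqrt(d)
b1 = np.zeros(h)
W2 = rng.standard_normal((h, 1)) / np.sqrt(h)
b2 = np.zeros(1)

lr = 0.02
for step in range(10_000):
    Z = X @ W1 + b1                       # pre-activations
    H = np.maximum(Z, 0.0)                # ReLU
    pred = H @ W2 + b2
    err = pred - y
    loss = float(np.mean(err ** 2))

    # Backpropagation for the mean-squared-error loss.
    g_pred = 2.0 * err / n
    gW2 = H.T @ g_pred
    gb2 = g_pred.sum(axis=0)
    gH = g_pred @ W2.T
    gZ = gH * (Z > 0)                     # ReLU gate
    gW1 = X.T @ gZ
    gb1 = gZ.sum(axis=0)

    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

print(f"final training MSE on random labels: {loss:.2e}")
```

Because the labels carry no information about the inputs, the near-zero training error says nothing about generalization; that gap between fitting and generalizing is exactly the tension the abstract points at.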
Further reading
- Access Paper in arXiv.org