Article: Dropout: A Simple Way to Prevent Neural Networks from Overfitting

Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different “thinned” networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
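As a concrete illustration, here is a minimal NumPy sketch of the "inverted" formulation of dropout, which rescales the surviving units during training rather than shrinking the weights at test time; the two are equivalent in expectation. The function names and the drop probability are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_train(x, p_drop=0.5):
    """Inverted dropout for one training pass: zero each unit with
    probability p_drop and rescale the survivors so the expected
    activation matches test time."""
    mask = rng.random(x.shape) >= p_drop
    return x * mask / (1.0 - p_drop)

def dropout_test(x):
    """With inverted dropout, test time is the identity: the single
    unthinned network already approximates the ensemble average."""
    return x

activations = rng.standard_normal((4, 8))
print(dropout_train(activations).mean(), dropout_test(activations).mean())
```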

Article: Distilling the Knowledge in a Neural Network

A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions. Unfortunately, making predictions using a whole ensemble of models is cumbersome and may be too computationally expensive to allow deployment to a large number of users, especially if the individual models are large neural nets. Caruana and his collaborators have shown that it is possible to compress the knowledge in an ensemble into a single model which is much easier to deploy and we develop this approach further using a different compression technique. We achieve some surprising results on MNIST and we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model. We also introduce a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse. Unlike a mixture of experts, these specialist models can be trained rapidly and in parallel.
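A minimal sketch of the distillation objective described above: the student is trained to match the teacher's output distribution softened at a raised temperature T, optionally mixed with the usual hard-label loss. The temperature, mixing weight, and function names are illustrative assumptions; the T² factor compensates for the 1/T² scaling of the soft-target gradients.

```python
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Weighted sum of (a) cross-entropy against the teacher's softened
    distribution and (b) ordinary cross-entropy against the hard labels.
    T and alpha here are illustrative choices, not values from the paper."""
    soft_teacher = softmax(teacher_logits, T)
    log_soft_student = np.log(softmax(student_logits, T) + 1e-12)
    soft_loss = -(soft_teacher * log_soft_student).sum(axis=-1).mean()
    log_student = np.log(softmax(student_logits) + 1e-12)
    hard_loss = -log_student[np.arange(len(labels)), labels].mean()
    return alpha * (T ** 2) * soft_loss + (1 - alpha) * hard_loss

# Toy usage: a confident teacher and a student still learning.
teacher = np.array([[10.0, 2.0, 1.0]])
student = np.array([[6.0, 3.0, 2.0]])
print(distillation_loss(student, teacher, np.array([0])))
```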

Journal: Deep Learning

Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
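To make the backpropagation step concrete, here is a minimal two-layer network trained on XOR: the chain rule propagates the error backward, indicating how the parameters of each layer should change given the representation computed from the layer below. The architecture, data, and hyperparameters are toy assumptions, not from the review.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: XOR, a classic problem no single linear layer can solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.standard_normal((2, 8)); b1 = np.zeros(8)
W2 = rng.standard_normal((8, 1)); b2 = np.zeros(1)
lr = 0.5
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(5000):
    # Forward pass: each layer's representation comes from the previous layer.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: chain rule on the squared error, layer by layer.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())  # should approach [0, 1, 1, 0]
```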

Thesis: Recursive Deep Learning for Natural Language Processing and Computer Vision


Richard Socher, Ph.D. Thesis, Stanford University, 2014

As the amount of unstructured text data that humanity produces overall and on the Internet grows, so does the need to intelligently process it and extract different types of knowledge from it. My research goal in this thesis is to develop learning models that can automatically induce representations of human language, in particular its structure and meaning, in order to solve multiple higher-level language tasks.
There has been great progress in delivering natural language processing technologies such as information extraction, sentiment analysis, and grammatical analysis. However, these solutions are often based on different machine learning models. My goal is to develop general and scalable algorithms that can jointly solve such tasks and learn the necessary intermediate representations of the linguistic units involved. Furthermore, most standard approaches make strong simplifying assumptions about language and require well-designed feature representations. The models in this thesis address both shortcomings. They provide effective and general representations for sentences without assuming word-order independence, and they achieve state-of-the-art performance with no, or few, manually designed features.
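A minimal sketch of the recursive composition at the heart of these models: a single shared weight matrix merges two child vectors into a parent vector, applied bottom-up over a binary parse tree so that a whole sentence receives one fixed-size representation. The dimensions, initialization, and example parse are illustrative assumptions; the actual models add task-specific output layers and are trained with backpropagation through structure.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4                                 # toy embedding dimension
W = rng.standard_normal((D, 2 * D)) * 0.1
b = np.zeros(D)

def compose(left, right):
    """Recursive-net composition: parent = tanh(W [left; right] + b).
    The same W is reused at every node, so any tree shape is handled."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

def encode(tree, embeddings):
    """tree is either a word (str) or a (left, right) pair of subtrees."""
    if isinstance(tree, str):
        return embeddings[tree]
    left, right = tree
    return compose(encode(left, embeddings), encode(right, embeddings))

vocab = {w: rng.standard_normal(D) * 0.1 for w in ["the", "movie", "was", "great"]}
# A hypothetical binary parse of "the movie was great".
sentence = (("the", "movie"), ("was", "great"))
print(encode(sentence, vocab))        # one fixed-size vector for the sentence
```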


Thesis: Optimizing Neural Networks That Generate Images


Tijmen Tieleman, Ph.D. Thesis, University of Toronto, 2014

Image recognition, also known as computer vision, is one of the most prominent applications of neural networks. The image recognition methods presented in this thesis are based on the reverse process: generating images. For the computer systems we have today, generating images is easier than recognizing them. This work leverages the ability to generate images for the purpose of recognizing other images.
One part of this thesis introduces a thorough implementation of this “analysis by synthesis” idea in a sophisticated autoencoder. Half of the image generation system (namely the structure of the system) is hard-coded; the other half (the content inside that structure) is learned. At the same time as this image generation system is being learned, an accompanying image recognition system is learning to extract descriptions from images. Learning together, these two components develop an excellent understanding of the provided data.
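The thesis's generator has a hard-coded structure with learned content, which a toy example cannot reproduce; still, the joint "analysis by synthesis" training loop can be sketched with a generic autoencoder, where the recognition network learns to extract a description precisely because the generation network must redraw the image from it. All names, sizes, and the synthetic data below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_batch(n=64, size=8):
    """Toy images: 8x8 patches containing a 2x2 bright square."""
    imgs = np.zeros((n, size * size))
    for i in range(n):
        r, c = rng.integers(0, size - 2, size=2)
        img = np.zeros((size, size)); img[r:r + 2, c:c + 2] = 1.0
        imgs[i] = img.ravel()
    return imgs

D, H = 64, 16
W_enc = rng.standard_normal((D, H)) * 0.05   # recognition: image -> description
W_dec = rng.standard_normal((H, D)) * 0.05   # generation: description -> image

lr = 0.05
for step in range(2000):
    x = make_batch()
    code = np.tanh(x @ W_enc)         # recognition network extracts a description
    recon = code @ W_dec              # generation network redraws the image
    err = recon - x                   # analysis-by-synthesis signal: reconstruction error
    d_code = (err @ W_dec.T) * (1 - code ** 2)
    W_dec -= lr * code.T @ err / len(x)
    W_enc -= lr * x.T @ d_code / len(x)

print(float((err ** 2).mean()))       # reconstruction error falls during training
```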
The second part of the thesis is an algorithm for training undirected generative models by making use of a powerful interaction between training and a Markov chain whose task is to produce samples from the model. This algorithm is shown to work well on image data, but it is equally applicable to undirected generative models of other types of data.
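The abstract does not name the algorithm, but the description closely resembles persistent contrastive divergence, where the sampling Markov chain is never reset between parameter updates and so co-evolves with the model. Below is a minimal, illustrative sketch of that idea for a small binary restricted Boltzmann machine; all sizes, data, and hyperparameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample(p):
    return (rng.random(p.shape) < p).astype(float)

# Tiny binary RBM: 16 visible units, 8 hidden units.
nv, nh = 16, 8
W = rng.standard_normal((nv, nh)) * 0.01
a = np.zeros(nv)        # visible biases
b = np.zeros(nh)        # hidden biases

data = sample(np.full((100, nv), 0.2))    # toy training set
chain_v = sample(np.full((20, nv), 0.5))  # persistent fantasy particles

lr = 0.05
for step in range(500):
    batch = data[rng.choice(len(data), 20, replace=False)]
    # Positive phase: hidden probabilities given the data.
    ph_data = sigmoid(batch @ W + b)
    # Negative phase: advance the persistent chain one Gibbs step;
    # the chain is never reset, so it tracks the model as it learns.
    ph_chain = sample(sigmoid(chain_v @ W + b))
    chain_v = sample(sigmoid(ph_chain @ W.T + a))
    ph_model = sigmoid(chain_v @ W + b)
    # Approximate log-likelihood gradient: data stats minus model stats.
    W += lr * (batch.T @ ph_data - chain_v.T @ ph_model) / len(batch)
    a += lr * (batch - chain_v).mean(axis=0)
    b += lr * (ph_data - ph_model).mean(axis=0)
```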
