
Thesis: Improving Neural Networks with Dropout


Nitish Srivastava
Master’s Thesis
2013
University of Toronto

Deep neural nets with a huge number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from a neural network during training. This prevents the units from co-adapting too much. Dropping units creates thinned networks during training. The number of possible thinned networks is exponential in the number of units in the network. At test time all possible thinned networks are combined using an approximate model averaging procedure. Dropout training followed by this approximate model combination significantly reduces overfitting and gives major improvements over other regularization methods. In this work, we describe models that improve the performance of neural networks using dropout, often obtaining state-of-the-art results on benchmark datasets.
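The train/test asymmetry described above can be illustrated with a minimal sketch (function names and the weight-scaling form are illustrative assumptions, not code from the thesis): units are zeroed at random during training, and at test time all units are kept but scaled by the retention probability, which approximates averaging over the exponentially many thinned networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_train(h, p_drop, rng):
    """Training pass: independently zero each unit with probability p_drop,
    producing one randomly 'thinned' network per forward pass."""
    mask = rng.random(h.shape) >= p_drop
    return h * mask

def dropout_test(h, p_drop):
    """Test pass: keep every unit but scale activations by the retention
    probability (the approximate model-averaging step)."""
    return h * (1.0 - p_drop)
```

With `p_drop = 0.5`, the expected activation under training-time masking matches the deterministic test-time output, which is what makes the weight-scaling approximation reasonable.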


Thesis: Optimizing Neural Networks That Generate Images


Tijmen Tieleman
Ph.D. Thesis
2014
University of Toronto

Image recognition, also known as computer vision, is one of the most prominent applications of neural networks. The image recognition methods presented in this thesis are based on the reverse process: generating images. For the computer systems we have today, generating images is easier than recognizing them. This work leverages the ability to generate images in order to recognize other images.
One part of this thesis introduces a thorough implementation of this “analysis by synthesis” idea in a sophisticated autoencoder. Half of the image generation system (namely the structure of the system) is hard-coded; the other half (the content inside that structure) is learned. At the same time as this image generation system is being learned, an accompanying image recognition system is learning to extract descriptions from images. Learning together, these two components develop an excellent understanding of the provided data.
The second part of the thesis is an algorithm for training undirected generative models, by making use of a powerful interaction between training and a Markov Chain whose task is to produce samples from the model. This algorithm is shown to work well on image data, but is equally applicable to undirected generative models of other types of data.
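The interaction described above, where a persistent Markov chain produces model samples while training proceeds, can be sketched for a binary restricted Boltzmann machine. This is a hedged illustration in the style of persistent-chain training, not the thesis's actual algorithm; all names are hypothetical and bias terms are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def persistent_chain_step(W, v_data, v_chain, lr, rng):
    """One weight update for a binary RBM using a persistent sampling chain.

    The chain state v_chain is carried across updates rather than being
    restarted at the data, so training and sampling interact continuously.
    """
    # Positive phase: hidden probabilities given the training data.
    h_data = sigmoid(v_data @ W)
    # Negative phase: advance the persistent chain one Gibbs step.
    h_chain = (rng.random(h_data.shape) < sigmoid(v_chain @ W)).astype(float)
    v_chain = (rng.random(v_data.shape) < sigmoid(h_chain @ W.T)).astype(float)
    h_model = sigmoid(v_chain @ W)
    # Gradient estimate: data correlations minus model correlations.
    W = W + lr * (v_data.T @ h_data - v_chain.T @ h_model) / len(v_data)
    return W, v_chain
```

Because the chain is never reset, its samples track the changing model, which is the kind of training/sampling interplay the abstract refers to.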


Thesis: Exploring Deep Learning Methods for Discovering Features in Speech Signals


Navdeep Jaitly
Ph.D. Thesis
2014
University of Toronto

This thesis makes three main contributions to the area of speech recognition with Deep Neural Network – Hidden Markov Models (DNN-HMMs).
Firstly, we explore the effectiveness of features learnt from speech databases using Deep Learning for speech recognition. This contrasts with prior works that have largely confined themselves to using traditional features such as Mel Cepstral Coefficients and Mel log filter banks for speech recognition. We start by showing that features learnt on raw signals using Gaussian-ReLU Restricted Boltzmann Machines can achieve accuracy close to that achieved with the best traditional features. These features are, however, learnt using a generative model that ignores domain knowledge. We develop methods to discover features that are endowed with meaningful semantics that are relevant to the domain using capsules. To this end, we extend previous work on transforming autoencoders and propose a new autoencoder with a domain-specific decoder to learn capsules from speech databases. We show that capsule instantiation parameters can be combined with Mel log filter banks to produce improvements in phone recognition on TIMIT. On WSJ the word error rate does not improve, even though we get strong gains in classification accuracy. We speculate this may be because of the mismatched objectives of word error rate over an utterance and frame error rate on the sub-phonetic class for a frame.
Secondly, we develop a method for data augmentation in speech datasets. Such methods result in strong gains in object recognition, but have largely been ignored in speech recognition. Our data augmentation encourages the learning of invariance to vocal tract length of speakers. The method is shown to improve the phone error rate on TIMIT and the word error rate on a 14 hour subset of WSJ.
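One common way to encourage invariance to vocal tract length is a piecewise-linear warp of the frequency axis before computing filter-bank features. The sketch below is a hypothetical illustration of that idea (parameter names, the cutoff frequency, and the exact warp form are assumptions, not taken from the thesis): frequencies below a cutoff are scaled by a random factor `alpha`, and the remainder is compressed or stretched so the maximum frequency maps to itself.

```python
import numpy as np

def vtlp_warp(freqs_hz, alpha, f_max=8000.0, f_cut=4800.0):
    """Piecewise-linear vocal-tract-length warp of a set of frequencies.

    Frequencies up to f_cut are scaled by alpha; the band from f_cut to
    f_max is mapped linearly so that f_max stays fixed. Drawing alpha
    from a small range around 1.0 per utterance yields augmented copies
    of the training data.
    """
    freqs = np.asarray(freqs_hz, dtype=float)
    return np.where(
        freqs <= f_cut,
        alpha * freqs,
        alpha * f_cut
        + (f_max - alpha * f_cut) * (freqs - f_cut) / (f_max - f_cut),
    )
```

Applying this warp to the center frequencies of the Mel filter banks, with a fresh `alpha` per training utterance, is one plausible way to realize the augmentation the abstract describes.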
Lastly, we develop a method for learning and using a longer range model of targets, conditioned on the input. This method predicts the labels for multiple frames together and uses a geometric average of these predictions during decoding. It produces state of the art results on phone recognition with TIMIT and also produces significant gains on WSJ.
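The geometric averaging step at decode time can be shown in a few lines. This is a hedged sketch under assumed conventions (each entry of `pred_list` is a label distribution for the same frame predicted from a different context window; the thesis's actual decoding procedure may differ): average the log-probabilities and renormalize.

```python
import numpy as np

def combine_predictions(pred_list):
    """Geometric mean of several label distributions for one frame.

    Averaging in log space and renormalizing gives the geometric average
    used to fuse overlapping multi-frame predictions during decoding.
    """
    log_p = np.mean([np.log(p) for p in pred_list], axis=0)
    p = np.exp(log_p)
    return p / p.sum()  # renormalize to a proper distribution
```

The geometric mean down-weights labels that any single context window considers unlikely, which tends to make the fused prediction more conservative than an arithmetic average.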


Thesis: Deep Learning Approaches to Problems in Speech Recognition, Computational Chemistry, and Natural Language Text Processing


George E. Dahl
Ph.D. Thesis
2015
University of Toronto

The deep learning approach to machine learning emphasizes high-capacity, scalable models that learn distributed representations of their input. This dissertation demonstrates the efficacy and generality of this approach in a series of diverse case studies in speech recognition, computational chemistry, and natural language processing. Throughout these studies, I extend and modify the neural network models as needed to be more effective for each task.
In the area of speech recognition, I develop a more accurate acoustic model using a deep neural network. This model, which uses rectified linear units and dropout, improves word error rates on a 50 hour broadcast news task. A similar neural network results in a model for molecular activity prediction substantially more effective than production systems used in the pharmaceutical industry. Even though training assays in drug discovery are not typically very large, it is still possible to train very large models by leveraging data from multiple assays in the same model and by using effective regularization schemes. In the area of natural language processing, I first describe a new restricted Boltzmann machine training algorithm suitable for text data. Then, I introduce a new neural network generative model of parsed sentences capable of generating reasonable samples and demonstrate a performance advantage for deeper variants of the model.
