George E. Dahl
University of Toronto
The deep learning approach to machine learning emphasizes high-capacity, scalable models that learn distributed representations of their input. This dissertation demonstrates the efficacy and generality of this approach in a series of diverse case studies in speech recognition, computational chemistry, and natural language processing. Throughout these studies, I extend and modify the neural network models as needed to be more effective for each task.
In the area of speech recognition, I develop a more accurate acoustic model using a deep neural network. This model, which uses rectified linear units and dropout, improves word error rates on a 50-hour broadcast news task. A similar neural network results in a model for molecular activity prediction substantially more effective than production systems used in the pharmaceutical industry. Even though training assays in drug discovery are not typically very large, it is still possible to train very large models by leveraging data from multiple assays in the same model and by using effective regularization schemes. In the area of natural language processing, I first describe a new restricted Boltzmann machine training algorithm suitable for text data. Then, I introduce a new neural network generative model of parsed sentences capable of generating reasonable samples and demonstrate a performance advantage for deeper variants of the model.