The goal of this thesis is to present a few small steps along the road to solving general artificial intelligence. This is a thesis by articles containing four articles. Each of these articles presents a new method for performing perceptual inference using machine learning and deep architectures. Each of these papers demonstrates the utility of the proposed method in the context of a computer vision task. The methods are more generally applicable and in some cases have been applied to other kinds of tasks, but this thesis does not explore such applications.
|University of Toronto|
Deep neural nets with a huge number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from a neural network during training. This prevents the units from co-adapting too much. Dropping units creates thinned networks during training. The number of possible thinned networks is exponential in the number of units in the network. At test time all possible thinned networks are combined using an approximate model averaging procedure. Dropout training followed by this approximate model combination significantly reduces overfitting and gives major improvements over other regularization methods. In this work, we describe models that improve the performance of neural networks using dropout, often obtaining state-of-the-art results on benchmark datasets.
As the amount of unstructured text data that humanity produces overall and on the Internet grows, so does the need to intelligently process it and extract different types of knowledge from it. My research goal in this thesis is to develop learning models that can automatically induce representations of human language, in particular its structure and meaning in order to solve multiple higher level language tasks.
There has been great progress in delivering technologies in natural language processing such as extracting information, sentiment analysis or grammatical analysis. However, solutions are often based on different machine learning models. My goal is the development of general and scalable algorithms that can jointly solve such tasks and learn the necessary intermediate representations of the linguistic units involved. Furthermore, most standard approaches make strong simplifying language assumptions and require well designed feature representations. The models in this thesis address these two shortcomings. They provide effective and general representations for sentences without assuming word order independence. Furthermore, they provide state of the art performance with no, or few manually designed features.
|University of Toronto|
Image recognition, also known as computer vision, is one of the most prominent applications of neural networks. The image recognition methods presented in this thesis are based on the reverse process: generating images. Generating images is easier than recognizing them, for the computer systems that we have today. This work leverages the ability to generate images, for the purpose of recognizing other images.
One part of this thesis introduces a thorough implementation of this “analysis by synthesis” idea in a sophisticated autoencoder. Half of the image generation system (namely the structure of the system) is hard-coded; the other half (the content inside that structure) is learned. At the same time as this image generation system is being learned, an accompanying image recognition system is learning to extract descriptions from images. Learning together, these two components develop an excellent understanding of the provided data.
The second part of the thesis is an algorithm for training undirected generative models, by making use of a powerful interaction between training and a Markov Chain whose task is to produce samples from the model. This algorithm is shown to work well on image data, but is equally applicable to undirected generative models of other types of data.