Thesis: Deep Learning of Representations and its Application to Computer Vision

The goal of this thesis is to present a few small steps along the road toward solving general artificial intelligence. It is a thesis by articles, comprising four articles. Each article presents a new method for performing perceptual inference using machine learning and deep architectures, and demonstrates the utility of the proposed method on a computer vision task. The methods are more generally applicable, and in some cases have been applied to other kinds of tasks, but this thesis does not explore such applications.

Journal Article: Deep Learning

Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
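The abstract above describes backpropagation as the mechanism that indicates how a machine should change its internal parameters, layer by layer. A minimal illustrative sketch of that idea (generic NumPy code, not code from the paper) for a network with one hidden layer:

```python
import numpy as np

# Toy task: predict whether the sum of three inputs is positive.
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 3)).astype(np.float32)
y = (X.sum(axis=1, keepdims=True) > 0).astype(np.float32)

# Two layers of parameters; each layer computes its representation
# from the representation in the previous layer.
W1 = (rng.standard_normal((3, 8)) * 0.5).astype(np.float32)
W2 = (rng.standard_normal((8, 1)) * 0.5).astype(np.float32)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(500):
    # Forward pass: build representations layer by layer.
    h = np.tanh(X @ W1)
    p = sigmoid(h @ W2)
    # Backward pass: the chain rule indicates how each parameter
    # should change to reduce the cross-entropy loss.
    grad_logits = (p - y) / len(X)               # dLoss/d(output logits)
    grad_W2 = h.T @ grad_logits
    grad_h = grad_logits @ W2.T * (1 - h ** 2)   # back through tanh
    grad_W1 = X.T @ grad_h
    W1 -= 0.5 * grad_W1                          # gradient-descent update
    W2 -= 0.5 * grad_W2

acc = float(((p > 0.5) == y).mean())
print(acc)
```

The learning rate (0.5), layer sizes, and toy task here are arbitrary choices for illustration; the structure (forward pass, chain-rule backward pass, parameter update) is the part the abstract refers to.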

Questions and Answers on NVIDIA DIGITS and Deep Learning

Below are the answers given to participants' written questions during NVIDIA's online course on DIGITS (12 August 2015).

Q: I own a Titan X. I read somewhere that its single-precision performance (FP32) is 7 TFLOPS and double-precision performance (FP64) is only 1.3 TFLOPS. Do the frameworks discussed here all use single-precision by default? If not, how can they be configured for best performance?
A: By default, all the frameworks use single precision floating point.
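As an illustration of why the single-precision default matters (generic NumPy, not framework internals): FP32 values take 4 bytes instead of FP64's 8, halving memory use and memory traffic, which is one reason training frameworks keep weights in single precision by default.

```python
import numpy as np

# Illustrative only: deep-learning frameworks default to 32-bit floats
# (FP32). NumPy itself defaults to 64-bit, so FP32 is requested here
# explicitly.
weights = np.random.randn(1000, 1000).astype(np.float32)
inputs = np.random.randn(1000, 1000).astype(np.float32)

out = weights @ inputs  # the matmul stays in FP32
print(out.dtype)        # float32

# FP32 halves memory relative to FP64 (4 vs. 8 bytes per value):
print(weights.nbytes, weights.astype(np.float64).nbytes)
```

On a card like the Titan X, this is also why the FP32 throughput figure (7 TFLOPS) is the relevant one for training, not the FP64 figure.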

Q: How is the number of GPUs set in DIGITS?
A: The number of GPUs to use is set on the train model page.

Q: Will the model we make in DIGITS work only on NVIDIA’s fork of Caffe, or will it work with vanilla Caffe too?
A: It will work in the main branch of Caffe. NVIDIA’s fork uses the same formats and layer types.

Q: Can DIGITS work on a cluster? I have two GPUs on different machines. If I create a cluster out of them, can DIGITS utilize the two GPUs?
A: Yes, DIGITS can utilize two GPUs. Recall that DIGITS is built on top of third-party frameworks, so provided those frameworks can use two GPUs, DIGITS can as well.

Q: Is it possible to train on voice data using NVIDIA DIGITS?
A: Currently DIGITS is designed for training on images, but we would like to add support for speech/voice in the future.
