Articles Archive - Derin Öğrenme

EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding

5 May 2016, Ferhat Kurt

The performance of automatic speech recognition (ASR) has improved tremendously due to the application of deep neural networks (DNNs). Despite this progress, building a new ASR system remains a challenging task, requiring various resources, multiple training stages and significant expertise. This paper presents our Eesen framework which drastically simplifies the existing pipeline to build state-of-the-art ASR systems. Acoustic modeling in Eesen involves learning a single recurrent neural network (RNN) predicting context-independent targets (phonemes or characters). To remove the need for pre-generated frame labels, we adopt the connectionist temporal classification (CTC) objective function to infer the alignments between speech and label sequences. A distinctive feature of Eesen is a generalized decoding approach based on weighted finite-state transducers (WFSTs), which enables the efficient incorporation of lexicons and language models into CTC decoding. Experiments show that compared with the standard hybrid DNN systems, Eesen achieves comparable word error rates (WERs), while at the same time speeding up decoding significantly.
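The CTC objective mentioned in the abstract relies on a simple collapsing rule: consecutive repeated labels are merged, then blank symbols are removed. The sketch below illustrates that rule with greedy (best-path) decoding in plain Python. It is not Eesen's WFST-based decoder, which also incorporates a lexicon and a language model. The function names and the choice of index 0 for the blank label are assumptions made for this illustration.

```python
from itertools import groupby

BLANK = 0  # assumption: label index 0 is the CTC blank symbol


def ctc_collapse(path, blank=BLANK):
    """Merge consecutive repeated labels, then drop blanks."""
    return [label for label, _ in groupby(path) if label != blank]


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Pick the highest-scoring label in each frame, then collapse the path.

    frame_scores: one list of per-label scores for each time frame.
    """
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    return ctc_collapse(best_path, blank)
```

A blank between two identical labels keeps them distinct, so the frame-level path `[0, 1, 1, 0, 1, 2, 2, 0]` collapses to `[1, 1, 2]`.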

1507.08240v3 EESEN End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding.pdf

Show and Tell: A Neural Image Caption Generator

25 April 2016, Ferhat Kurt

1411.4555v2 Show and Tell: A Neural Image Caption Generator.pdf

https://gist.github.com/jcoreyes

http://cs.stanford.edu/people/karpathy/deepimagesent/

https://gist.github.com/jcoreyes/7e76e90664f935c6f65d

https://github.com/karpathy/neuraltalk

130 Deep Learning Projects

24 March 2016, Ferhat Kurt

Of the 200 projects prepared for Stanford University's “CS231n: Convolutional Neural Networks for Visual Recognition” course, 130 have been made publicly available. Here's hoping our own universities follow suit. The project reports are available at: http://cs231n.stanford.edu/reports2016.html

Google Releases TensorFlow, Its Open-Source Library for Machine Intelligence

5 December 2015, Ferhat Kurt

Google recently open-sourced TensorFlow, the library it uses to build its deep learning algorithms and which it describes as machine intelligence software. For details on the implementation, see the paper below.

Google TensorFlow whitepaper2015.pdf

Paper: VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition

19 November 2015, Ferhat Kurt

Robust object recognition is a crucial skill for robots operating autonomously in real world environments. Range sensors such as LiDAR and RGBD cameras are increasingly found in modern robotic systems, providing a rich source of 3D information that can aid in this task. However, many current systems do not fully utilize this information and have trouble efficiently dealing with large amounts of point cloud data. In this paper, we propose VoxNet, an architecture to tackle this problem by integrating a volumetric Occupancy Grid representation with a supervised 3D Convolutional Neural Network (3D CNN). We evaluate our approach on publicly available benchmarks using LiDAR, RGBD, and CAD data. VoxNet achieves accuracy beyond the state of the art while labeling hundreds of instances per second.
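To make the volumetric occupancy grid idea concrete, here is a minimal sketch of how a point cloud could be turned into a binary voxel grid before being passed to a 3D CNN. It is a simplified illustration, not necessarily the occupancy model used in the paper. The function name `voxelize`, the default grid size, the voxel size, and the origin are all assumptions chosen for the example.

```python
import math


def voxelize(points, grid_size=32, voxel_size=0.1, origin=(0.0, 0.0, 0.0)):
    """Build a binary occupancy grid from (x, y, z) points.

    Returns a grid_size x grid_size x grid_size nested list in which a cell
    is 1 if at least one point falls inside it and 0 otherwise. Points that
    fall outside the grid are ignored.
    """
    grid = [
        [[0] * grid_size for _ in range(grid_size)]
        for _ in range(grid_size)
    ]
    for x, y, z in points:
        # Map each coordinate to an integer voxel index relative to the origin.
        i = math.floor((x - origin[0]) / voxel_size)
        j = math.floor((y - origin[1]) / voxel_size)
        k = math.floor((z - origin[2]) / voxel_size)
        if 0 <= i < grid_size and 0 <= j < grid_size and 0 <= k < grid_size:
            grid[i][j][k] = 1
    return grid
```

A fixed-size grid like this gives every object the same input shape regardless of how many points the sensor returned, which is what lets a convolutional network consume it directly.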