Skip to main content

Makale: Deep Visual-Semantic Alignments for Generating Image Descriptions

We present a model that generates free-form natural language descriptions of full images and their regions. For generating sentences about a given image region we describe a Multimodal Recurrent Neural Network architecture. For inferring the latent alignments between segments of sentences and regions of images we describe a model based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding. This work was also featured in a New York Times article.

Ferhat Kurt

NVIDIA Deep Learning Institute Sertifikalı eğitmen.

Bir Cevap Yazın

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d blogcu bunu beğendi: