Article: Learning Semantic Relationships for Better Action Retrieval in Images

Human actions capture a wide variety of interactions between people and objects. As a result, the set of possible actions is extremely large, and it is difficult to obtain sufficient training examples for all of them. However, we can compensate for this sparsity in supervision by leveraging the rich semantic relationships between different actions. A single action is often composed of other, smaller actions and is mutually exclusive of certain others. We need a method that can reason about such relationships and extrapolate unobserved actions from known ones. Hence, we propose a novel neural network framework that jointly extracts the relationships between actions and uses them to train better action retrieval models. Our model incorporates linguistic, visual, and logical-consistency cues to effectively identify these relationships. We train and test
our model on a large-scale image dataset of human actions. We show a significant improvement in mean average precision (mAP) compared to different baseline methods, including the HEX-graph approach of Deng et al.
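
To make the composition and exclusion relationships mentioned above concrete, here is a minimal illustrative sketch (not the paper's implementation): it represents pairwise action relations and checks whether a set of predicted action labels is logically consistent with them. The relation sets, example actions, and the `consistent` helper are all hypothetical.

```python
# Hypothetical relation sets (illustration only, not from the paper):
# IMPLIES encodes composition: performing A entails performing B.
# EXCLUDES encodes mutual exclusion: A and B cannot co-occur.
IMPLIES = {
    ("riding a bike", "sitting"),
}
EXCLUDES = {
    ("riding a bike", "swimming"),
}

def consistent(labels: set[str]) -> bool:
    """Return True if a set of predicted action labels respects the relations."""
    for a, b in IMPLIES:
        if a in labels and b not in labels:
            return False  # composition violated: A present without its part B
    for a, b in EXCLUDES:
        if a in labels and b in labels:
            return False  # exclusion violated: exclusive actions co-occur
    return True

print(consistent({"riding a bike", "sitting"}))             # True
print(consistent({"riding a bike", "sitting", "swimming"})) # False
```

A consistency check of this kind could, in principle, prune contradictory label sets at retrieval time; the paper's actual framework learns such relationships jointly with the retrieval model rather than taking them as fixed inputs.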