Humans refer to objects in their environment all the time, especially in dialogue with other people. In everyday speech, one uses referring expressions to indicate a particular person or object to a co-observer, e.g., “the man in the red hat” or “the book on the table”. In this project, we explore generating and comprehending natural language referring expressions for objects in images.


Use the Refer API to load the four datasets: RefCLEF, RefCOCO, RefCOCO+, and RefCOCOg.
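As a concrete illustration, the minimal sketch below loads RefCOCO through a `REFER` loader class. The class name, constructor arguments (`data_root`, `dataset`, `splitBy`), and the `getRefIds`/`loadRefs` accessors are assumptions about the toolkit's interface; adjust them to the released code if they differ.

```python
# Minimal sketch: loading RefCOCO with the Refer API (interface assumed).
from refer import REFER

data_root = './data'  # folder holding the downloaded annotations and images
refer = REFER(data_root, dataset='refcoco', splitBy='unc')

# Collect all referred objects in the training split.
ref_ids = refer.getRefIds(split='train')
print('%d training referred objects loaded' % len(ref_ids))

# Inspect one referred object and print its referring expressions.
ref = refer.loadRefs(ref_ids[0])[0]
for sentence in ref['sentences']:
    print(sentence['sent'])
```

The other datasets can be loaded the same way by changing `dataset` (and `splitBy`, e.g., 'google' or 'umd' for RefCOCOg), under the same assumptions about the interface.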
Based on these datasets, two tasks are studied: referring expression comprehension (localizing the object described by a given expression) and referring expression generation (producing an unambiguous expression for a given object). Our publications on these tasks are listed below:
MAttNet: Modular Attention Network for Referring Expression Comprehension
CVPR 2018
Licheng Yu, Zhe Lin, Xiaohui Shen, Jimei Yang, Xin Lu, Mohit Bansal, Tamara L. Berg
[Paper] [Demo] [Code]
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
CVPR 2017
Licheng Yu, Hao Tan, Mohit Bansal, Tamara L. Berg
[Paper] [Code] [Project Page] [Talk] (Spotlight presentation, 8% acceptance rate)
Modeling Context in Referring Expressions
ECCV 2016
Licheng Yu, Patrick Poirson, Shan Yang, Alexander C. Berg, Tamara L. Berg
[Paper] [Dataset] [Talk] (Spotlight presentation, 4.7% acceptance rate)
ReferItGame: Referring to Objects in Photographs of Natural Scenes
EMNLP 2014
Sahar Kazemzadeh, Vicente Ordonez, Mark Matten, Tamara L. Berg
[Paper] [Dataset] [Project Page] (Oral)