Revision as of 7 June 2018, 15:31

Discovering Connotations as Labels for Weakly Supervised Image-Sentence Data

Aditya Mogadala, Bhargav Kanuparthi, Achim Rettinger, York Sure-Vetter

Published: April 2018

Book title: WWW'18: Proceedings of The Web Conference 2018, Lyon, France, April 2018
Pages: 379-386
Publisher: ACM

Refereed publication

BibTeX

Abstract
Growth of multimodal content on the web and social media has generated abundant weakly aligned image-sentence pairs. However, it is hard to interpret them directly due to their intrinsic “intension”. In this paper, we aim to annotate such image-sentence pairs with connotations as labels to capture the intrinsic “intension”. We achieve this with a connotation multimodal embedding model (CMEM) using a novel loss function. Its unique characteristics over previous models include: (i) the exploitation of multimodal data as opposed to only visual information, (ii) robustness to outlier labels in a multi-label scenario, and (iii) effectiveness with large-scale weakly supervised data. With extensive quantitative evaluation, we demonstrate the effectiveness of CMEM for the detection of multiple labels over other state-of-the-art approaches. We also show that, in addition to annotating image-sentence pairs with connotation labels, a byproduct of our model inherently supports cross-modal retrieval, i.e., retrieving sentences for an image query.

ISBN: 978-1-4503-5640-4
Download: Media:Ctp147-mogadalaA.pdf
Further information: https://dl.acm.org/citation.cfm?id=3184558.3186352
DOI Link: 10.1145/3184558.3186352

Research group

Web Science

Research area

Information Retrieval, Machine Learning, Artificial Intelligence, WWW Systems

@@ Line 16: / Line 16: @@
 }}
 {{Inproceedings
-|Referiert=False
+|Referiert=True
+|BibTex-ID=www2018
 |Title=Discovering Connotations as Labels for Weakly Supervised Image-Sentence Data
 |Year=2018
 |Month=April
-|Booktitle=The Web Conference (Cognitive Computing Track)
+|Booktitle=WWW'18: Proceedings of The Web Conference 2018, Lyon, France, April 2018
+|Pages=379-386
 |Publisher=ACM
 }}
@@ Line 26: / Line 28: @@
 |Abstract=Growth of multimodal content on the web and social media has
 generated abundant weakly aligned image-sentence pairs. However, it is hard to interpret them directly due to intrinsic “intension”. In this paper, we aim to annotate such image-sentence pairs with connotations as labels to capture the intrinsic “intension”. We achieve it with a connotation multimodal embedding model (CMEM) using a novel loss function. It’s unique characteristics over previous models include: (i) the exploitation of multimodal data as opposed to only visual information, (ii) robustness to outlier labels in a multi-label scenario and (iii) works effectively with large-scale weakly supervised data. With extensive quantitative evaluation, we exhibit the effectiveness of CMEM for detection of multiple labels over other state-of-the-art approaches. Also, we show that in addition to annotation of image-sentence pairs with connotation labels, byproduct of our model inherently supports cross-modal retrieval i.e. image query - sentence retrieval.
+|ISBN=978-1-4503-5640-4
 |Download=Ctp147-mogadalaA.pdf,
+|Link=https://dl.acm.org/citation.cfm?id=3184558.3186352
+|DOI Name=10.1145/3184558.3186352
 |Forschungsgruppe=Web Science
 }}

Inproceedings3598: Difference between revisions
