
Inproceedings3598: Difference between revisions

 
(3 intermediate revisions by 2 users not shown)
Line 16:	Line 16:
 }}
 {{Inproceedings
−|Referiert=False
+|Referiert=True
+|BibTex-ID=www2018
 |Title=Discovering Connotations as Labels for Weakly Supervised Image-Sentence Data
 |Year=2018
 |Month=April
−|Booktitle=The Web Conference (Cognitive Computing Track)
+|Booktitle=WWW'18: Proceedings of The Web Conference 2018, Lyon, France, April 2018
+|Pages=379-386
 |Publisher=ACM
 }}
 {{Publikation Details
−|Abstract=We address the task of labeling image-sentence pairs at large scale with varied concepts representing connotations. That is, for any given query image-sentence, we aim to annotate them with the connotations that capture intrinsic intension. To achieve it, we propose a Connotation multimodal embedding model (CMEM) with a novel loss function. Its unique characteristics over previous models include (i) it can leverage multimodal data as opposed to only visual information, (ii) it is robust to outlier labels in a multi-label scenario and (iii) it works well with large-scale weakly supervised data. With extensive quantitative evaluation, we exhibit the effectiveness of CMEM for detection of multiple labels over other state-of-the-art approaches. Also, we show that in addition to annotation of images with connotation labels, a byproduct of our model inherently supports cross-modal retrieval.
+|Abstract=Growth of multimodal content on the web and social media has generated abundant weakly aligned image-sentence pairs. However, it is hard to interpret them directly due to intrinsic “intension”. In this paper, we aim to annotate such image-sentence pairs with connotations as labels to capture the intrinsic “intension”. We achieve it with a connotation multimodal embedding model (CMEM) using a novel loss function. Its unique characteristics over previous models include: (i) the exploitation of multimodal data as opposed to only visual information, (ii) robustness to outlier labels in a multi-label scenario and (iii) effective handling of large-scale weakly supervised data. With extensive quantitative evaluation, we exhibit the effectiveness of CMEM for detection of multiple labels over other state-of-the-art approaches. Also, we show that in addition to annotation of image-sentence pairs with connotation labels, a byproduct of our model inherently supports cross-modal retrieval, i.e., retrieving sentences given an image query.
+|ISBN=978-1-4503-5640-4
+|Link=https://dl.acm.org/citation.cfm?id=3184558.3186352
+|DOI Name=10.1145/3184558.3186352
 |Download=Ctp147-mogadalaA.pdf,
 |Forschungsgruppe=Web Science
 }}

Line 49:	Line 41:
 {{Forschungsgebiet Auswahl
 |Forschungsgebiet=WWW Systeme
+}}
+{{Forschungsgebiet Auswahl
+|Forschungsgebiet=Künstliche Intelligenz
 }}

Current revision as of 28 August 2018, 13:45


Discovering Connotations as Labels for Weakly Supervised Image-Sentence Data





Published: April 2018

Book title: WWW'18: Proceedings of The Web Conference 2018, Lyon, France, April 2018
Pages: 379-386
Publisher: ACM

Refereed publication


Abstract
Growth of multimodal content on the web and social media has generated abundant weakly aligned image-sentence pairs. However, it is hard to interpret them directly due to intrinsic “intension”. In this paper, we aim to annotate such image-sentence pairs with connotations as labels to capture the intrinsic “intension”. We achieve it with a connotation multimodal embedding model (CMEM) using a novel loss function. Its unique characteristics over previous models include: (i) the exploitation of multimodal data as opposed to only visual information, (ii) robustness to outlier labels in a multi-label scenario and (iii) effective handling of large-scale weakly supervised data. With extensive quantitative evaluation, we exhibit the effectiveness of CMEM for detection of multiple labels over other state-of-the-art approaches. Also, we show that in addition to annotation of image-sentence pairs with connotation labels, a byproduct of our model inherently supports cross-modal retrieval, i.e., retrieving sentences given an image query.
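The abstract describes CMEM only at a high level and does not reproduce the model or its novel loss function. Purely as an illustrative sketch (the class names, feature dimensions, sum-based fusion, and margin-ranking loss below are assumptions made for illustration, not the authors' published formulation), a multimodal embedding that scores connotation labels for an image-sentence pair could look roughly like this in PyTorch:

# Illustrative sketch only: a generic two-tower multimodal embedding with a
# multi-label margin-ranking loss. CMEM's actual architecture and its novel
# loss are defined in the paper; every name, dimension, and design choice
# here is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultimodalEmbedding(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=300, num_labels=1000, joint_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, joint_dim)          # image-feature tower
        self.txt_proj = nn.Linear(txt_dim, joint_dim)          # sentence-feature tower
        self.label_emb = nn.Embedding(num_labels, joint_dim)   # connotation-label embeddings

    def forward(self, img_feat, txt_feat):
        # Fuse both modalities (here simply by summing the projections),
        # L2-normalise, then score every candidate label by cosine similarity.
        joint = F.normalize(self.img_proj(img_feat) + self.txt_proj(txt_feat), dim=-1)
        labels = F.normalize(self.label_emb.weight, dim=-1)
        return joint @ labels.t()                               # (batch, num_labels)

def multilabel_ranking_loss(scores, targets, margin=0.2):
    # Hinge on (negative score - positive score + margin), averaged over all
    # positive/negative label pairs; invalid pairs are masked out.
    pos = scores.masked_fill(targets == 0, float("-inf")).unsqueeze(2)  # (B, L, 1)
    neg = scores.masked_fill(targets == 1, float("-inf")).unsqueeze(1)  # (B, 1, L)
    pairwise = F.relu(margin + neg - pos)                               # (B, L, L)
    pairwise = torch.where(torch.isfinite(pairwise), pairwise, torch.zeros_like(pairwise))
    return pairwise.mean()

# Toy usage with random features: a batch of 4 image-sentence pairs, each
# weakly labelled with a subset of 1000 candidate connotations.
model = MultimodalEmbedding()
scores = model(torch.randn(4, 2048), torch.randn(4, 300))
loss = multilabel_ranking_loss(scores, torch.randint(0, 2, (4, 1000)))

Averaging the pairwise hinge rather than summing it is one common way to keep a few noisy (outlier) labels from dominating the gradient in a weakly supervised, multi-label setting; the paper's own loss should be consulted for the actual formulation.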

ISBN: 978-1-4503-5640-4
Further information at: https://dl.acm.org/citation.cfm?id=3184558.3186352
DOI Link: 10.1145/3184558.3186352



Research group

Web Science


Research area

Information Retrieval, Machine Learning, Artificial Intelligence, WWW Systems