Joint Image and Text Representation for Aesthetics Analysis
Ye Zhou
Fudan University, China
Xin Lu
Adobe Systems Inc., USA
Junping Zhang
Fudan University, China
James Z. Wang
The Pennsylvania State University, USA
Abstract:
Image aesthetics assessment is essential to multimedia applications such as image retrieval and personalized image search and recommendation. Relying primarily on visual information and manually supplied ratings, previous studies in this area have not adequately utilized higher-level semantic information. We incorporate additional textual phrases from user comments to jointly represent image aesthetics using a multimodal Deep Boltzmann Machine. Given an image, without requiring any associated user comments, the proposed algorithm automatically infers the joint representation and predicts the aesthetics category of the image. We construct the AVA-Comments dataset to systematically evaluate the performance of the proposed algorithm. Experimental results indicate that the proposed joint representation improves the performance of aesthetics assessment on the benchmark AVA dataset compared with using visual features alone.
Citation:
Ye Zhou, Xin Lu, Junping Zhang, and James Z. Wang, "Joint Image and
Text Representation for Aesthetics Analysis," Proceedings of the ACM
Multimedia Conference, pp. 262-266, Amsterdam, The Netherlands, ACM,
October 2016.
Copyright 2016 ACM. Personal use of this material is
permitted. However, permission to reprint/republish this material for
advertising or promotional purposes or for creating new collective
works for resale or redistribution to servers or lists, or to reuse
any copyrighted component of this work in other works, must be
obtained from ACM.
Last Modified:
July 28, 2016