ICME 2003
Tutorial A1: Multimedia Semantics and Machine Learning
Instructors
Milind R. Naphade, IBM T.J. Watson Research Center
Handouts
IBM Part [Contact IBM] · UCSB Part [pdf]
Time & Location
Sunday, 6 July 2003, 9:00 - 12:30, Location: Dover A
Abstract
Statistical Learning is a well-established scientific discipline. The theories underlying
classical Statistical Learning are based on the assumption that the number of
training instances N is significantly larger than the dimensionality of the
data space D. In many emerging data-analysis applications such as multimedia
semantic content analysis we face the D > N high dimensionality challenge.
Even in cases where D < N, if the number of training instances is not
sufficiently large, conducting robust statistical inference is difficult. For
instance, an image/video search engine needs to learn users' query concepts
at a semantic level with a very small number of training instances (provided
by users via relevance feedback). At the same time, multimedia content
description typically involves numerous multi-modal features such as color,
texture, shape, motion, audio energy, and so on. The first half of this
tutorial introduces statistical methods that can conduct effective inference
in very high-dimensional spaces under the challenge of scarcity of training
data.
The second half of this tutorial focuses on the applications of the statistical
methods in multimedia content analysis. Effective semantic access is
essential for efficient utilization of the huge multimodal repositories that
are being generated rapidly. Standards such as MPEG-7 have catalyzed a shift
in research from syntax to semantics and from content to metadata. We show
how the problem of semantic analysis can be approached within the framework
of statistical modeling, and how statistical methods can assist the analysis
and extraction of semantics, where the semantics manifests itself in terms of
contents, context and structure.
Presenter Information
Prof. Edward Chang received his Ph.D. in
Electrical Engineering at Stanford University in 1999. He is an Associate
Professor of Electrical and Computer Engineering at the University of
California, Santa Barbara. His research interests include statistical
learning and multimedia databases. He is a recipient of the IBM Faculty
Partnership Award from 2000 to 2002, and the NSF Career Award in 2002. Prof.
Chang is a co-founder and the CTO of VIMA Technologies.
Milind Ramesh Naphade received his B.E. degree
in Instrumentation and Control Engineering from the University of Pune, India
in July 1995, ranking first among the university students in his discipline.
He received his M.S. and Ph.D. degrees in Electrical Engineering from the
University of Illinois at Urbana-Champaign in 1998 and 2001 respectively. He
was a Computational Sciences and Engineering Fellow and a member of the Image
Formation and Processing group at the Beckman Institute for Advanced Science
and Technology. In 2001 he joined the Pervasive Media Management Group at the
IBM T. J. Watson Research Center in Hawthorne, NY, as a research staff
member. He worked with the Kodak Research Laboratories of the Eastman
Kodak Company in the summer of 1997 and with the Microcomputer Research
Laboratories at Intel Corporation in the summer of 1998. With more than 40
publications, he is among the earliest proponents of statistical modeling of
semantics of multimedia content and context. He is one of the lead architects
of the IBM TREC Video System for Concept Detection that registered the
highest Mean Average Precision in the NIST TREC Video 2002 Benchmark and the
highest known item recall in the TREC Video 2001 Benchmark. His research
interests include audio-visual signal processing and analysis for the purpose
of multimedia understanding, content-based indexing, retrieval and mining. He
is interested in applying advanced probabilistic pattern recognition and
machine learning techniques to model semantics in multimedia data. He is a
member of IEEE.
John R. Smith is Manager of the Pervasive
Media Management Group at IBM T. J. Watson Research Center, where he leads a
research team exploring techniques for multimedia content management. He is
currently Chair of the MPEG Multimedia Description Schemes (MDS) group and
serves as co-Project Editor for MPEG-7 Multimedia Description Schemes. Dr.
Smith received his M.Phil. and Ph.D. degrees in Electrical Engineering from
Columbia University in 1994 and 1997, respectively. His research interests
include multimedia databases, multimedia content analysis, compression,
indexing, and retrieval. He is an Adjunct Professor at Columbia University
and a member of IEEE.