ICME 2003

Tutorial A1: Multimedia Semantics and Machine Learning


Milind R. Naphade; IBM T.J. Watson Research Center

Ed Chang; University of California, Santa Barbara

John R. Smith; IBM T.J. Watson Research Center



        IBM Part [Contact IBM]

        UCSB Part [pdf]

Time & Location

Sunday, 6 July 2003, 9:00 - 12:30, Location: Dover A


Statistical learning is a well-established scientific discipline. The theories underlying classical statistical learning assume that the number of training instances N is significantly larger than the dimensionality D of the data space. In many emerging data-analysis applications, such as multimedia semantic content analysis, we face the high-dimensionality challenge D > N. Even when D < N, robust statistical inference is difficult if the number of training instances is not sufficiently large. For instance, an image/video search engine must learn users' query concepts at a semantic level from a very small number of training instances (provided by users via relevance feedback). At the same time, multimedia content description typically involves numerous multi-modal features such as color, texture, shape, motion, and audio energy. The first half of this tutorial introduces statistical methods that can conduct effective inference in very high-dimensional spaces when training data are scarce.
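As a rough illustration of why D > N is a problem for classical methods (this sketch is not taken from the tutorial materials): the sample covariance of N mean-centered instances has rank at most N - 1, so when N < D it is singular and cannot be inverted, which breaks estimators that rely on a full-rank covariance. The helper names `matrix_rank` and `centered_data_rank`, the Gaussian toy data, and the sizes used are all assumptions chosen for illustration. The rank of the centered data matrix is computed because it equals the rank of the sample covariance.

```python
import random


def matrix_rank(rows, tol=1e-9):
    """Rank of a matrix (list of row lists) via Gaussian elimination."""
    m = [list(r) for r in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Partial pivoting: use the largest remaining entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def centered_data_rank(n_samples, n_features, seed=0):
    """Rank of a mean-centered random data matrix (hypothetical toy data).

    This equals the rank of the sample covariance matrix of the data.
    """
    rng = random.Random(seed)
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for _ in range(n_samples)]
    means = [sum(row[j] for row in data) / n_samples
             for j in range(n_features)]
    centered = [[row[j] - means[j] for j in range(n_features)]
                for row in data]
    return matrix_rank(centered)


if __name__ == "__main__":
    d = 20
    for n in (5, 200):
        # With n < d the rank is bounded by n - 1, far below d.
        print(f"N={n}, D={d}: covariance rank = {centered_data_rank(n, d)}")
```

With N = 5 and D = 20 the rank cannot exceed 4, so the 20 x 20 covariance is far from invertible; with N = 200 the same features can support a full-rank estimate.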

The second half of this tutorial focuses on applications of these statistical methods to multimedia content analysis. Effective semantic access is essential for efficient use of the huge multimodal repositories that are being generated rapidly. Standards like MPEG-7 have helped steer research from syntax to semantics and from content to metadata. We show how the problem of semantic analysis can be framed as one of statistical modeling, and how statistical methods can assist the analysis and extraction of semantics, where the semantics manifest themselves in terms of content, context, and structure.

Presenter Information

Prof. Edward Chang received his Ph.D. in Electrical Engineering from Stanford University in 1999. He is an Associate Professor of Electrical and Computer Engineering at the University of California, Santa Barbara. His research interests include statistical learning and multimedia databases. He received the IBM Faculty Partnership Award from 2000 to 2002 and the NSF CAREER Award in 2002. Prof. Chang is a co-founder and the CTO of VIMA Technologies.

Milind Ramesh Naphade received his B.E. degree in Instrumentation and Control Engineering from the University of Pune, India, in July 1995, ranking first among the university students in his discipline. He received his M.S. and Ph.D. degrees in Electrical Engineering from the University of Illinois at Urbana-Champaign in 1998 and 2001, respectively. He was a Computational Sciences and Engineering Fellow and a member of the Image Formation and Processing group at the Beckman Institute for Advanced Science and Technology. In 2001 he joined the Pervasive Media Management Group at the IBM T. J. Watson Research Center in Hawthorne, NY, as a research staff member. He worked with the Kodak Research Laboratories of the Eastman Kodak Company in the summer of 1997 and with the Microcomputer Research Laboratories at Intel Corporation in the summer of 1998. With more than 40 publications, he is among the earliest proponents of statistical modeling of the semantics of multimedia content and context. He is one of the lead architects of the IBM TREC Video System for Concept Detection, which registered the highest Mean Average Precision in the NIST TREC Video 2002 Benchmark and the highest known-item recall in the TREC Video 2001 Benchmark. His research interests include audio-visual signal processing and analysis for multimedia understanding, content-based indexing, retrieval, and mining. He is interested in applying advanced probabilistic pattern recognition and machine learning techniques to model semantics in multimedia data. He is a member of IEEE.

John R. Smith is Manager of the Pervasive Media Management Group at the IBM T. J. Watson Research Center, where he leads a research team exploring techniques for multimedia content management. He is currently Chair of the MPEG Multimedia Description Schemes (MDS) group and serves as co-Project Editor for MPEG-7 Multimedia Description Schemes. Dr. Smith received his M.Phil. and Ph.D. degrees in Electrical Engineering from Columbia University in 1994 and 1997, respectively. His research interests include multimedia databases, multimedia content analysis, compression, indexing, and retrieval. He is an Adjunct Professor at Columbia University and a member of IEEE.