Classifying Objectionable Websites Based on Image Content
James Ze Wang, Jia Li, Gio Wiederhold, Oscar Firschein
Stanford University, Stanford, CA 94305
Abstract:
This paper describes IBCOW (Image-based Classification of
Objectionable Websites), a system capable of classifying a website as
objectionable or benign based on image content. The system uses
WIPE (Wavelet Image Pornography Elimination) and
statistics to provide robust classification of on-line objectionable
World Wide Web sites. Semantically-meaningful feature vector matching
is carried out so that comparisons between a given on-line image and
images marked as "objectionable" and "benign" in a training set can be
performed efficiently and effectively in the WIPE module. If more
than a certain number of images sampled from a site are found to be
objectionable, then the site is considered to be objectionable. The
statistical analysis for determining the size of the image sample and
the threshold number of objectionable images is given in this paper.
The system is practical for real-world applications, classifying a Web
site in less than 2 minutes, including the time to
compute the feature vector for the images downloaded from the site, on
a Pentium Pro PC. Besides its exceptional speed, it has demonstrated
higher than 97% sensitivity and 97% specificity in classifying a Web
site based solely on images. Both the sensitivity and the specificity
in real-world applications are expected to be higher because our
performance evaluation is relatively conservative and surrounding text
can be used to assist the classification process.
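The site-level decision rule described in the abstract (sample some images, count how many are flagged, compare against a threshold) can be sketched as follows. This is only an illustrative sketch, not the analysis from the paper: it assumes each sampled image is flagged independently with a fixed probability, so the count of flagged images follows a binomial distribution. The function names and all numeric values (sample size, threshold, per-image flag probabilities) are hypothetical and are not taken from the paper.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X > k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1, n + 1))


def classify_site(flags: list[bool], threshold: int) -> bool:
    """Call a site objectionable if more than `threshold` sampled images are flagged.

    `flags` holds one per-image decision (True = flagged as objectionable),
    e.g. as produced by an image classifier such as the WIPE module.
    """
    return sum(flags) > threshold


def site_error_rates(n: int, threshold: int,
                     p_flag_objectionable_site: float,
                     p_flag_benign_site: float) -> tuple[float, float]:
    """Site-level (sensitivity, specificity) under the independence assumption.

    p_flag_objectionable_site: assumed chance that an image sampled from an
        objectionable site is flagged (hypothetical input).
    p_flag_benign_site: assumed chance that an image sampled from a benign
        site is flagged (hypothetical input).
    """
    sensitivity = binomial_tail(n, threshold, p_flag_objectionable_site)
    specificity = 1.0 - binomial_tail(n, threshold, p_flag_benign_site)
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical values chosen only to illustrate the trade-off.
    sens, spec = site_error_rates(n=20, threshold=5,
                                  p_flag_objectionable_site=0.6,
                                  p_flag_benign_site=0.05)
    print(f"site sensitivity ~ {sens:.4f}, site specificity ~ {spec:.4f}")
```

Under this simplified model, raising the threshold trades sensitivity for specificity, while a larger sample tends to improve both, which is why the sample size and threshold have to be chosen together.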
Full Paper in Color (PDF, 1.2MB)
Full Paper in Color (PostScript, 1.2MB)
Citation:
James Z. Wang, Jia Li, Gio Wiederhold and Oscar Firschein,
``Classifying Objectionable Websites Based on Image Content,'' Lecture
Notes in Computer Science, Special issue on interactive distributed
multimedia systems and telecommunication services, Oslo, Norway,
Thomas Plagemann and Vera Goebel (eds.), vol. 1483, pp. 113-124,
Springer-Verlag, September 1998.
Copyright 1998 Springer-Verlag. Personal use of this material is
permitted. However, permission to reprint/republish this material for
advertising or promotional purposes or for creating new collective
works for resale or redistribution to servers or lists, or to reuse
any copyrighted component of this work in other works, must be
obtained from Springer-Verlag.
Last Modified:
01-Mar-98 22:10:45 PST
© 1998, James Z. Wang