Testimony Prepared for the COPA (Child Online Protection Act) Commission

Gio Wiederhold

Professor
Computer Science Dept. and Dept. of Medicine, Stanford University
www-db.stanford.edu/people/gio.html

Technology demo by

James Wang (PhD summer 2000)

As of fall 2000 at Pennsylvania State University
http://jxw.stanford.edu/cgi-bin/zwang/wipe2_show.cgi


August 2000

Thank you for the privilege of presenting some statements concerning the technology relevant to the protection of minors before this commission. I will not attempt to present the full range of technological choices, threats, and candidate solutions. Instead I will focus on two points: first, specific technology in image recognition (WIPE™) that we have developed at Stanford, and, second, a systematic setting of the issues in a producer-transmitter-consumer framework, which will illustrate requirements and barriers to technological solutions. I will actually start with the second aspect, but close with some suggestions for managing the system issues.

Dealing with the system chain

To deal effectively with any problem we have to consider complete systems and the feedback loops that enable their functioning. Applying corrections or constraints at isolated points will just cause systems that are in some sense stable to adapt. We have to consider here the entire chain of flow from Producers via Transmitters to Consumers. Each of these categories contains a variety of groups, with different means and objectives. We often hear complaints that some technology will not solve the whole problem, but such an observation, even though nearly always true, does not mean that it cannot be applied to affect the system in some focused and desirable way. However, its effect on the overall system must always be assessed.

The producers of objectionable material can be, and are likely to be, anywhere, not only where US law applies. However, physical predators are likely to be within reach of the laws being considered here.

While many producers of pornography are financially motivated, and will respond to certain measures, there is also a significant remainder for whom it is a hobby or a disease. Their costs in setting up a pornographic website are small. The computer technology is cheap, and poverty and exhibitionism provide low-cost content material. Substantial cost is incurred in marketing and collecting income, but for the hobby sites marketing is not a concern. If commercial explicit websites are closed, the hobby sites will gain popularity as long as transmission and consumers remain available.

Transmission capabilities via the Internet of attractive images and videos of all kinds are improving. Most commercial producers and consumer sites in industry and higher education have broadband access (ISDN, DSL, cable, T1, …). Today, halfway into the year 2000, it is estimated that already 1.4M homes have broadband access as well. By 2004 more than 20M homes will have such capabilities [Yankee group per Internet World, August 1, 2000]. The spread of broadband transmission capabilities to the home is, among other things, motivated by the entertainment industry and is unlikely to be stopped.

Distance has become a minor factor in transmission costs, so that material can move from anywhere to anywhere at nearly equal cost, and intermediate nodes can easily be in countries where U.S. law does not apply, even when producers and consumers are relatively local. Costs incurred for transmission are further being reduced by caching voluminous material at the periphery of the Internet. Such technology has been initiated by Akamai and is used by major distributors of video and images such as CNN. There is no reason why objectionable material cannot be made available similarly.

The density and complexity of the Internet backbone connections, as well as the many ISP shortcuts, are such that constraints in transmission paths that are seen as onerous can be bypassed.

Consumers in the systems of our concern are the children, and the parents and the libraries who are paying for Internet access and their equipment. As Commissioner Telage has observed, these communities are not willing today to pay for and install restriction tools, such as Netnanny and Cyberpatrol. Libraries are reasonably concerned about not limiting freedom of speech and its access. Parents are not motivated to impose restrictions on their children, especially when they are teenagers, because of concerns for trust and awkwardness in dealing with the issues. There may also be an unwillingness by some adults to deny access to explicit material to themselves. Furthermore, as also pointed out during these hearings, many of the tools available today are naïve and inadequate.

We have found indeed that the businesses that are marketing restriction tools do not have the income to substantially improve their products. Lack of quality feeds the cycle that inhibits their sales in the first place. It is unlikely that any of them can deal adequately with the problem in isolation. The growth of the Internet, the agility of the producers, and the complexity of dealing with all the paths and the diversity of consumer equipment make their task harder than they are willing to admit. My final suggestions will address this issue.

The consumers are a critical link. If they do not cooperate, the chain will not be broken, as seen in

Tools for consumers must be made desirable, simple, reliable, and adaptable. And they must be free or very cheap, since there is the expectation that everything on the Internet is free -- once you have bought your computer. This model continues the tradition of free radio and TV reception in the United States, all supported by advertising.

Our technology

As a byproduct of combining research in privacy protection and image recognition, supported by NSF HPCC and Digital Library initiatives, James Wang and I have developed and demonstrated a fast wavelet-based image recognition technology that can identify objectionable images with a higher reliability than other, less mathematical methods. This technology, Wavelet Image Pornography Elimination or WIPE™, is available for licensing through Stanford. The distinctions made by WIPE are learned automatically, by feeding into WIPE the contents of known pornographic sites versus other image libraries. WIPE can now identify objectionable pictures with 97% correct hits, while misclassifying 7% of non-objectionable images. The actual values, of course, depend on the ratios of images presented; we must note that the web overall contains many more desirable images than objectionable ones. We will display a few examples of failures, since they provide a better insight into the WIPE technology than showing a thousand successes. The system is available for testing on-line, at the URL given in the heading, where you can submit your own pictures for evaluation [Wang, James Ze, Jia Li, Gio Wiederhold, and Oscar Firschein: "System for Screening Objectionable Images"; Computer Communications Journal, Vol.21 no.15, pages 1355-1360, Elsevier Science, 1998].
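The dependence of the observed values on the mix of images can be made concrete with a short calculation. The sketch below is only an illustration: it takes the 97% hit rate and 7% misclassification rate quoted above as fixed, and the three prevalence values (the share of submitted images that are objectionable) are assumptions chosen for the example, not measurements.

```python
def precision(prevalence, hit_rate=0.97, false_alarm_rate=0.07):
    """Fraction of flagged images that are truly objectionable.

    prevalence is the assumed share of objectionable images in the stream
    being screened; the two rates default to the figures quoted in the text.
    """
    true_positives = hit_rate * prevalence
    false_positives = false_alarm_rate * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical image mixes: half, one tenth, and one hundredth objectionable.
for share in (0.5, 0.1, 0.01):
    print(f"objectionable share {share:.0%}: "
          f"flagged images that are truly objectionable {precision(share):.1%}")
```

Under these assumptions the share of flagged images that are truly objectionable falls from about 93% to about 61% to about 12% as objectionable images become rarer, which is why a per-image result is best combined with other evidence.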

Since pornographic web sites contain many objectionable images, WIPE can be used to classify sites with very high confidence based on image contents alone. A practical system would combine WIPE technology with text-based analysis and good-site lists, so that, say, a museum would not be classified as an objectionable site [Wang, James Ze, Jia Li, Gio Wiederhold, Oscar Firschein: "System for Classifying Objectionable Websites"; Proceedings of the 5th International Workshop on Interactive Distributed Multimedia Systems and Telecommunication Services (IDMS'98), Thomas Plagemann and Vera Goebel (Eds.), Oslo, Norway, 113-124, Springer-Verlag LNCS 1483, September 1998].
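Why many images per site can yield high site-level confidence may be illustrated with a small binomial calculation. This is a sketch under assumptions that are not part of the testimony: images are treated as independent, a site is flagged when at least min_flagged of its sampled images are flagged, every image on the objectionable site is assumed objectionable, and the sample size of 20 and threshold of 6 are hypothetical. It is not the published site-classification method.

```python
from math import comb


def prob_site_flagged(n_images, per_image_flag_rate, min_flagged):
    """P(at least min_flagged of n_images are flagged), assuming independence."""
    return sum(
        comb(n_images, k)
        * per_image_flag_rate ** k
        * (1.0 - per_image_flag_rate) ** (n_images - k)
        for k in range(min_flagged, n_images + 1)
    )


N_IMAGES = 20      # hypothetical number of images sampled per site
MIN_FLAGGED = 6    # hypothetical site-level threshold

# Per-image rates taken from the figures quoted earlier in the testimony.
benign = prob_site_flagged(N_IMAGES, 0.07, MIN_FLAGGED)
objectionable = prob_site_flagged(N_IMAGES, 0.97, MIN_FLAGGED)

print(f"benign site flagged:        {benign:.4f}")
print(f"objectionable site flagged: {objectionable:.4f}")
```

With these assumed numbers a benign site is flagged well under 1% of the time while an objectionable site is flagged almost always. Real sites mix image types and their images are not independent, which is one reason to add text analysis and good-site lists.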

We have used our technology also to extract words hidden in images and graphics. That approach has been applied to removing patient-identifiers from X-ray films to be released to research. We are interested in applying it also to extract terms hidden in images and banners, so that further filtering can take place.

We did learn from trying to transfer our technology into commercial tools that the current vendors of filtering software did not have sufficient income to consider augmenting their products.

 

A two-element suggestion

It appears clear that the web is too dynamic and large a place for simple solutions to the problems that COPA is addressing to be easily implemented, marketed, and maintained. No single approach is or can be perfect. We propose a two-element approach:

1. Define greenfield and redlight areas on the web, for the small volume of material certified to be, respectively, definitely good for kids or inappropriate for kids.

2. Establish an industry consortium to monitor the remaining, gray, area of the web and publish for public use the site identifications for sites that should be in the redlight category.

Various presenters have advocated a classification of the providers. That is certainly one desirable step. Two choices were given as alternatives:

  1. Greenfields, namely web sites certified to be suitable for kids, perhaps tagged with a domain suffix as green.kid. Everything not green could be forbidden, or closely monitored by parents.
  2. Redlight areas, namely web sites known to contain prurient material, perhaps tagged with a domain suffix as red.xxx. Everything not red would be freely accessible to kids, unless additional constraints were imposed by parents.

Unfortunately, this simple division is naïve. The majority of material on the web, namely business, entertainment, and scientific sites, would not want to consider themselves red, green, or explicitly non-red or non-green. The volume of all web material is, furthermore, so large that no greenfield could guarantee that it covered all green material, with the converse holding for the redlight areas. Most contributors to the web have neither the interest nor the expertise to classify themselves in that manner. This means that greenfields and redlight areas can both exist, with the labeling managed by selected authorities, but that a very large gray area will remain, containing entertainment, scientific, and business-oriented material.

We have heard already from several organizations maintaining greenfields subscribing to various criteria. Their criteria differ, but are all appropriate for the greenfield domain. Being together, they should gain more recognition and acceptance, and a green.kid label may well help. They should make clear to the parents, however, that they can only include a small portion of the web, and that parent-guided forays out of the greenfield will be important for the kids' education.

Dealing with redlight areas is a level more complex. We can assume that most commercial purveyors of objectionable material would join a red.xxx domain, unless it seriously inhibited their business. Employers may choose to restrict employees. I have no data about what percentage of commercial redlight-type business derives from customers at work. Some public places may restrict access to redlight sites as well, and here the issues impinge on freedom of speech and association, but having a red.xxx domain would only clarify the discussion. Transmitters and ISPs may not want to restrict transmission of redlight material, since it is likely to represent a substantial customer base.

But we also have to deal with redlight sites that are not motivated to adopt the .xxx domain name, such as hobby-based sites and the like. Foreign sites may also not comply.

With some effort, likely redlight sites outside of the xxx domain could be identified largely automatically, by combining existing text-based technology and novel techniques such as WIPE. However, human filtering is appropriate before such sites are placed on, let's call it, the red list. Such a list should be public, and appeals should be possible. Keeping such a red-list of objectionable non-xxx domain sites up-to-date today exceeds the personnel and financial capabilities of the vendors of filtering products. Here, establishing a consortium, jointly funded by the producers of filtering software and major Internet portals, seems to be a solution. The only objective of the consortium would be to set up and publish such an annotated red-list, not to make rules on how such a list is to be used. The consortium would run the spiders, crawlers, and analyzers, and finally provide some human monitoring prior to moving sites into the list. It would never need to crawl the red.xxx nor the green.kid domain sites. As a consortium it should be exempt from anti-trust concerns, and also from any particular ideology. Provision for appealing mistakes will need to exist, since even integrated technology supervised by well-meaning individuals will not be perfect. There might be legal means to encourage redlight sites to join the xxx domain classification, but again that is an issue that I cannot address.

Companies providing filtering software could now adopt or select from that published list. Typical choices made available to the purchasers of filtering software would be

  1. Allow my children only access to the greenfield, plus other sites I designate
  2. Allow my children access to all sites but redlight and red-listed sites, except for sites I designate.
  3. Allow my children access to all sites but redlight sites, except for sites I designate.
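The three choices above can be sketched as a small decision function. Everything named here is hypothetical and for illustration only: the Policy names, the is_allowed function, the example hostnames, the treatment of greenfield and redlight sites as hostnames ending in ".kid" and ".xxx", and the reading of "sites I designate" as a per-host parental override that can either allow or block.

```python
from enum import Enum


class Policy(Enum):
    GREENFIELD_ONLY = 1             # choice 1
    BLOCK_REDLIGHT_AND_REDLIST = 2  # choice 2
    BLOCK_REDLIGHT_ONLY = 3         # choice 3


def is_allowed(host, policy, red_list=frozenset(), parent_overrides=None):
    """Return True if the child may visit host under the chosen policy.

    red_list stands in for the published consortium list; parent_overrides
    maps a hostname to True (always allow) or False (always block).
    """
    parent_overrides = parent_overrides or {}
    host = host.lower().rstrip(".")
    if host in parent_overrides:          # sites the parent designates win
        return parent_overrides[host]
    if policy is Policy.GREENFIELD_ONLY:
        return host.endswith(".kid")      # assumed greenfield suffix
    if host.endswith(".xxx"):             # assumed redlight suffix
        return False
    if policy is Policy.BLOCK_REDLIGHT_AND_REDLIST:
        return host not in red_list
    return True


# Hypothetical data for a quick check.
red_list = {"hobby.example.net"}
print(is_allowed("stories.example.kid", Policy.GREENFIELD_ONLY))
print(is_allowed("hobby.example.net", Policy.BLOCK_REDLIGHT_AND_REDLIST, red_list))
print(is_allowed("hobby.example.net", Policy.BLOCK_REDLIGHT_ONLY, red_list))
```

Because the list itself is shared, a vendor following this pattern would differ from its competitors only in how the policy and the overrides are presented to parents.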

Employers, libraries, and schools may well have different rules.

The suppliers of filtering software would not compete on content, but on ease of use, installation, parental adaptability, and the like.

In closing

No single solution will be adequate, but a combination of technologies inserted at a public place can do much to provide the information needed to protect children from undesirable influences.

However, the technologies have to be applied within a well-understood producer-transmitter-consumer model to be effective and economic for all participants, and to avoid easy workarounds.

Sharing the information about inappropriate sites, and making the information public, gives people the choice to filter what they deem appropriate. A consortium can provide such a service, both to the public and to software producers.

There is a cost to breaking the chain through which undesirable influences are exerted, and there must be willingness by parents and other participants to bear that cost, both financially and socially.

---------------------------------------