CS 545I - Advanced Image and Video Databases, Winter 2001

Friday, 12 Jan 2001. First lecture, CS 545I

Introduction to Image and Video Databases

Oscar Firschein, Visiting Scholar, Computer Science, Stanford

This lecture will discuss the various types of image and video databases, the needs of users of such systems, various ways of indexing such databases, and why it is extremely difficult to develop an automated image analysis program that can describe an image at the level of sophistication of a person.

People can look at an image and instantly "understand" it -- they identify objects in the image, their relationships, and what story the image is telling. Automatic indexing techniques are currently far from this level of sophistication.

1 - Why study image and video databases?

Image and video databases are an increasingly important type of database as sources of images multiply, methods of storage improve, and the Internet provides the means of communication. Both still images and video sequences have important characteristics that the database designer must know and understand.

2 - Goal of seminar

To gain an appreciation of the special problems of image and video retrieval, to learn about some current systems, to learn about indexing and database organization techniques, and to get hands-on experience with one of the systems.

3 - Some sources of images

  • Medical imagery (pathology slides, x-ray, NMR, ultrasonic, etc.)
  • News and entertainment videos
  • Educational videos
  • Art and photo collections
  • Consumer and engineering catalogues
  • Scientific images (astronomy, earth resources, etc.)
  • Images collected by intelligence agencies (often satellite images); video sequences taken by unmanned vehicles
  • Home photos/videos
4 - Types of user requests

Users may want combinations of the following requests:

  • SIMILARITY: Find an image that looks like this image (or parts of it look like part of this image)
  • OBJECT: Find an image that contains a cat
  • CONDITION/SITUATION: Find photos of water pollution
  • SPECIFIC PERSON: Find a video frame of Clinton talking to Rabin
  • OBJECT RELATIONSHIP: Find an image that contains a cat near a dog
  • MOOD: Find a sad/happy/... picture
  • VIEW ANGLE: Find a picture of a crowd taken from an airplane
  • TIME OF DAY/SEASON: Find a picture of Yosemite taken at day/night/sunset/winter
  • COLOR: Find a picture with a red apple
  • TEXTURE: Find picture with a brick texture
  • SHAPE: Find picture with circular object
  • GEOGRAPHIC: Find aerial image of the port of San Francisco
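Several of these request types, notably SIMILARITY and COLOR, are commonly approximated with low-level features. As a hedged illustration (not the method of any particular system, and with hypothetical toy data), a coarse color-histogram intersection can rank images by how closely their color distributions match:

```python
# Sketch: coarse color-histogram intersection for "find images like this".
# Illustrative only; real systems use finer bins and perceptual color
# spaces, but the ranking idea is the same.

def color_histogram(pixels, bins=4):
    """Coarse RGB histogram: pixels is a list of (r, g, b) in 0..255."""
    hist = {}
    for r, g, b in pixels:
        key = (r * bins // 256, g * bins // 256, b * bins // 256)
        hist[key] = hist.get(key, 0) + 1
    total = len(pixels)
    return {k: count / total for k, count in hist.items()}  # normalize

def similarity(h1, h2):
    """Histogram intersection: 1.0 means identical color distributions."""
    return sum(min(h1.get(k, 0.0), h2.get(k, 0.0)) for k in set(h1) | set(h2))

# Toy "images" as flat pixel lists: two reddish scenes and one blue scene.
query = [(250, 10, 10)] * 90 + [(10, 250, 10)] * 10
reddish = [(240, 20, 20)] * 80 + [(20, 20, 240)] * 20
blue = [(10, 10, 250)] * 100

print(similarity(color_histogram(query), color_histogram(reddish)))  # 0.8
print(similarity(color_histogram(query), color_histogram(blue)))     # 0.0
```

Note that this metric captures only the COLOR and (partly) SIMILARITY requests; OBJECT, MOOD, and SPECIFIC PERSON requests need semantic information that a histogram cannot represent.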
5 - Approach to image database design

One could treat an image database as if it were a document database by manually tagging each image with a descriptor. Queries would be boolean combinations of index terms, and the user would review the results and modify the query.

HOWEVER, manual indexing is very time-consuming. Also, this approach does not take advantage of the special aspects of an image:

  1. People can review candidate retrieved images very quickly, and can indicate which of the search results is closest to the desired one.
  2. An image has many "meanings," depending on the interest or "point of view" of the indexer.
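The document-database approach can be sketched as a small inverted index over manually assigned tags, with boolean AND/OR queries implemented as set intersection and union. This is a minimal illustration (the class name, tags, and image ids are hypothetical):

```python
# Sketch: treating an image database as a document database.
# Each image is manually tagged; boolean queries combine index terms.

from collections import defaultdict

class TaggedImageIndex:
    def __init__(self):
        self.index = defaultdict(set)  # term -> set of image ids

    def add(self, image_id, tags):
        for tag in tags:
            self.index[tag].add(image_id)

    def query_and(self, *terms):
        """Images tagged with ALL terms (boolean AND)."""
        sets = [self.index[t] for t in terms]
        return set.intersection(*sets) if sets else set()

    def query_or(self, *terms):
        """Images tagged with ANY of the terms (boolean OR)."""
        return set().union(*(self.index[t] for t in terms))

idx = TaggedImageIndex()
idx.add("img1", ["cat", "outdoor", "sunset"])
idx.add("img2", ["cat", "dog", "indoor"])
idx.add("img3", ["dog", "outdoor"])

print(sorted(idx.query_and("cat", "dog")))    # ['img2']
print(sorted(idx.query_or("sunset", "dog")))  # ['img1', 'img2', 'img3']
```

The sketch makes the drawback concrete: every tag must be typed in by hand, and a query can only retrieve meanings the indexer happened to anticipate.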
6 - An ideal image database system

  1. allows the user to review image "thumbnails"
  2. presents images in order of "closeness"
  3. uses a variety of description approaches, many of them automated
  4. allows retrieval using conventional relational (and other) databases
  5. allows search refinement

7 - Capabilities needed

  1. The similarity metric must match the human idea of similarity of images.
  2. Search must be efficient enough to be interactive.
  3. The user must be able to specify needs without becoming an image or DB expert -- a good approach is for the user to specify by providing examples close to what is desired.
  4. Image descriptor-finding must be automated.

8 - Problems

  1. How to normalize an image: scale, orientation
  2. Capturing aspects of content by using invariants or discriminants
  3. Can these invariants capture semantic information?
  4. How can features of one image be compared to features of another image?
  5. Efficiency of invariants
  6. Can the title or caption of an image (or the audio portion of a video) aid in finding invariants?
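The scale-normalization problem can be made concrete with a small sketch: resample every image to a fixed grid before features are computed, so that the same scene at different resolutions yields comparable descriptors. This uses nearest-neighbor resampling for brevity (a real system would low-pass filter first, and would also handle orientation):

```python
# Sketch: normalize image scale by resampling to a fixed grid.
# Nearest-neighbor only; real systems filter before downsampling.

def normalize_scale(image, out_w=8, out_h=8):
    """image: 2D list of gray values; returns an out_h x out_w 2D list."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

# The same "scene" at two resolutions normalizes to the same descriptor.
small = [[10, 20], [30, 40]]
big = [[10, 10, 20, 20],
       [10, 10, 20, 20],
       [30, 30, 40, 40],
       [30, 30, 40, 40]]
print(normalize_scale(small, 2, 2) == normalize_scale(big, 2, 2))  # True
```

Computing features on the normalized grid is one simple way to obtain a scale invariant; problems 2 and 3 above ask whether such invariants can be found efficiently and whether they carry any semantic content.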