CS 545I - Advanced Image Databases, W 95/96

Friday, 19 Jan 1996. Second lecture CS545I

1 Part 1 Oscar Firschein, Visiting Scholar

2 The approach to indexing often depends on the nature of the image database:

General still image and video

Specialized, e.g., faces, parts catalog, maintenance manual

3 - Ideally, we would like to be able to automatically index general images on the basis of meaning or content

If we could automatically obtain a formal description of an image, we could extract index terms from the description.

4 - Unfortunately, automatic content analysis of imagery is extremely difficult

IMAGE COMPLEXITY: A picture is difficult to describe automatically because objects may be complex and occluded by other objects.

SHADOWS, REFLECTIONS, TEXTURES: These can confuse automated systems.

KNOWLEDGE: Knowledge about the subject being pictured is often required to interpret an image.

ILL-POSED PROBLEM: The problem of deducing 3-D objects from a single 2-D image is ill-posed (there are many solutions possible for a given image).

PARALLEL VS. SERIAL PROCESSING: A person normally sees an image "all at once." If the person is restricted to viewing the image through a small window that is moved around over the image, the person's ability to understand the image is reduced significantly because the adjacent context is very important in interpreting an image.
Unfortunately, the computer interprets an image in this limited way. The image is stored as an array of numbers, each representing the grey level or color at a picture location. This array is explored, often by window operations, and the results of these window operations must be pieced together to make some sense out of the array.
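
A minimal sketch of such a window operation, assuming a grey-level image stored as a list of rows. The function name `window_means` and the tiny test image are illustrative, not from the lecture. Each output value is all the computer "sees" through one window position; the results still have to be pieced together afterwards.

```python
def window_means(image, size=3):
    """Slide a size x size window over a grey-level image (list of rows)
    and return the mean grey level seen at each window position."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - size + 1):
        out_row = []
        for c in range(cols - size + 1):
            # Sum every pixel visible through the window at (r, c).
            total = sum(image[r + dr][c + dc]
                        for dr in range(size) for dc in range(size))
            out_row.append(total / (size * size))
        out.append(out_row)
    return out


# Hypothetical 4x4 image: dark on the left, bright on the right.
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
means = window_means(image, size=2)
for row in means:
    print(row)
```

Windows that straddle the dark/bright boundary report an intermediate value, which by itself says nothing about what object lies on either side.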

5 - What are the basic automated approaches used to make sense of an unknown image?

Delineate meaningful regions: The segmentation of an image is the division of the image into fragments, or segments, each of which is homogeneous in some sense. An attempt is made to merge and join homogeneous regions to delineate objects.

Find edges of objects: An edge in an image is an image contour across which the brightness of the image changes abruptly. It can indicate a change in object surface or a depth discontinuity. The trick is to aggregate pieces of edge so as to construct meaningful objects.

Analysis of shading: Determine 3-D shape from shading and texture

Combinations of the above
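
The first two approaches above can be sketched on a toy grey-level image. This is only an illustration under simple assumptions: "homogeneous" is taken to mean "within `tolerance` grey levels of the region's seed pixel", and an edge is a brightness jump larger than `threshold` between horizontal neighbours. The function names and both parameters are hypothetical.

```python
from collections import deque


def grow_regions(image, tolerance=10):
    """Label 4-connected regions whose pixels stay within `tolerance`
    grey levels of the region's seed pixel. Returns (labels, count)."""
    rows, cols = len(image), len(image[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] != -1:
                continue  # pixel already belongs to a region
            seed = image[r0][c0]
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and labels[nr][nc] == -1
                            and abs(image[nr][nc] - seed) <= tolerance):
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
            next_label += 1
    return labels, next_label


def horizontal_edges(image, threshold=50):
    """Mark positions where brightness changes abruptly between a pixel
    and its right-hand neighbour."""
    return [[abs(row[c + 1] - row[c]) > threshold
             for c in range(len(row) - 1)]
            for row in image]


image = [
    [10, 12, 200, 205],
    [11, 10, 198, 202],
    [9, 11, 201, 199],
]
labels, count = grow_regions(image)
edges = horizontal_edges(image)
print(count, labels)
print(edges)
```

On real imagery neither step is this clean: shadows and texture split regions, and edge fragments rarely close up into object outlines without further aggregation.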

6 Recognition of objects

Once objects have been delineated in the image, there is still the problem of identifying the objects. Object recognition is a difficult problem requiring that the delineated objects be matched against a database of object models. This is practical when the number of possible objects is small, but is less effective as the number of possible objects becomes large. The bottom line is that the current state of the art in automatic description of arbitrary images is very limited.
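
A minimal sketch of matching a delineated object against a database of object models, assuming each object has already been reduced to a short feature tuple. The features, the model names, and the rejection distance `max_distance` are all hypothetical. The scan is linear in the number of models, and the more models there are, the more likely two of them lie close together in feature space.

```python
import math


def recognize(features, models, max_distance=1.0):
    """Return (name, distance) of the closest model, or (None, distance)
    if even the closest model is farther than `max_distance`."""
    best_name, best_dist = None, float("inf")
    for name, model_features in models.items():
        dist = math.dist(features, model_features)  # Euclidean distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > max_distance:
        return None, best_dist
    return best_name, best_dist


# Hypothetical models described by (area, elongation).
models = {
    "washer": (1.0, 1.0),
    "bolt": (0.3, 4.0),
    "bracket": (2.5, 1.8),
}
print(recognize((0.35, 3.8), models))
print(recognize((9.0, 9.0), models))
```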

7 - Current automatic indexing approaches for general images usually depend on gross measures of the image: color, texture, and simple shape.

The next several lectures will go into the details of such indexing.

We will see that clever system design, particularly in the user interface, can overcome many of the limitations of these "non-semantic" approaches.
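
As an illustration of a "gross measure," here is a sketch of a coarse color histogram, assuming pixels are (r, g, b) tuples with values 0 to 255. The function name and the number of bins per channel are illustrative choices, not something specified in the lecture.

```python
def color_histogram(pixels, bins_per_channel=2):
    """Count pixels in coarse RGB bins and normalize so the bins sum to 1.
    Returns a list of length bins_per_channel ** 3."""
    n = bins_per_channel
    hist = [0] * (n ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index 0..n-1.
        rb, gb, bb = r * n // 256, g * n // 256, b * n // 256
        hist[(rb * n + gb) * n + bb] += 1
        count += 1
    if count == 0:
        return hist
    return [h / count for h in hist]


# Hypothetical image: three red pixels and one blue pixel.
pixels = [(255, 0, 0)] * 3 + [(0, 0, 255)]
hist = color_histogram(pixels)
print(hist)
```

The descriptor records how much of each color is present but nothing about where it is or what it depicts, which is why such measures are called non-semantic.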

8 - Dealing with specialized image collections

For a specialized collection of images, if the characteristics of the images are known, then automatic indexing may be possible.

For a database of faces, we can obtain a description of the typical face and describe each image by its differences from that typical face. These differences then form the descriptor vector for the image. (See the Pentland reference "Photobook..", p. 6, "Appearance Photobook", for a discussion of eigenimage representations.)

In retrieval of intelligence imagery, a stored 3-D model (a "site model") can be used as an indexing mechanism, and the stored images can be registered to buildings at the site. Then one can retrieve images that show the roof of a specific building on the site.

For a parts catalog, one may be able to describe the shape of parts in a canonical way that can be used as an index.

For a database of drawings that are labeled with parts annotations, one may be able to automatically pick off part names or i.d. labels for use in an index.
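
The face-database idea above (describe each image by its differences from the typical face) can be sketched as follows, assuming every face image is the same size and has been flattened to a list of grey levels. This stops at the raw difference vector; an eigenimage representation would go further and project that difference onto a small set of basis images, which is not shown here. The function names and the three-pixel "faces" are purely illustrative.

```python
def mean_image(images):
    """Pixel-wise average of equally sized, flattened images."""
    n = len(images)
    return [sum(values) / n for values in zip(*images)]


def difference_descriptor(image, typical):
    """Describe an image by how it departs from the typical image."""
    return [p - t for p, t in zip(image, typical)]


# Hypothetical three-pixel "faces".
faces = [
    [100, 120, 140],
    [110, 120, 130],
    [90, 120, 150],
]
typical_face = mean_image(faces)
descriptor = difference_descriptor(faces[1], typical_face)
print(typical_face, descriptor)
```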

9 - In summary, general image databases are indexed using color, texture, and simple shape. Specialized image databases use indexing techniques designed to take advantage of the known characteristics of the collection.

10 Part 2, Dragutin Petkovic, Manager, Advanced Algorithms, Architectures, and Applications

11 BASIC ISSUES IN CONTENT BASED RETRIEVAL

Image representation

Matching of image descriptors with query descriptors

Integration with traditional search and DB (SQL, text)

User Interface

Performance measures (retrieval accuracy, storage, speed)

Network and WWW and other systems issues

Data capture, annotation and indexing

Applications

Extends to video

12 Image representation

Non-image data (places, prices, author etc.) - mostly keyed in

Image descriptors: easy to compute, applicable to a variety of planned and unplanned queries

Image descriptors: color, texture, shape, layout, relationships

Image descriptors can refer to the whole image or to image objects

Image objects can be obtained manually (outlining), automatically (segmentation, object reco) or semiautomatically

Offer SEMANTIC compression (Pentland)

No need to solve full object reco problem (although desirable)

13 Matching

Match one query with a very large number of samples in the DB (unlike in CV: one image with 10s of models). Retrieve samples, not a "statement" like good/bad.

Needs to be fast, indexable, and to correspond to human perception and expectations

Examples: normalized quadratic functions, nearest neighbors, neural networks.

Issues of color spaces for matching vs. RGB
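
A sketch of one matching function of the quadratic kind listed above, combined with nearest-neighbor ranking. The distance is d(x, y) = (x - y)^T A (x - y), where A encodes how similar two histogram bins are to each other. The matrix values, the three-bin layout (red, orange, blue), and the database entries are assumptions for illustration only.

```python
def quadratic_distance(x, y, A):
    """Quadratic-form distance (x - y)^T A (x - y)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * A[i][j] * d[j] for i in range(n) for j in range(n))


def rank_by_similarity(query, database, A, k=None):
    """Return (distance, name) pairs, closest first; keep the top k if given."""
    scored = sorted((quadratic_distance(query, desc, A), name)
                    for name, desc in database.items())
    return scored[:k]


# Bins: red, orange, blue. Red and orange are treated as similar (0.8).
A = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
database = {
    "sunset": [0.0, 1.0, 0.0],  # all orange
    "sea": [0.0, 0.0, 1.0],     # all blue
}
query = [1.0, 0.0, 0.0]         # all red
ranking = rank_by_similarity(query, database, A)
print(ranking)
```

With A set to the identity matrix both images would be equally far from the red query. The cross-bin terms are what move the orange image ahead of the blue one, which is closer to what a person would expect.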

14 Integration with traditional DB and search methods

Non-image descriptors (keywords, free text, numerics) searched by text search, SQL. Note the difference between keyword searches, and SQL on text/numeric data

SQL produces a SET, content retrieval RANKS the set (does not make a "cut")

Systems aspects: How to integrate with DB in an extensible way? Note IBM's DB2 Extenders and Illustra's blades.

How to merge ranked lists?
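
A sketch of the set-versus-rank distinction and of one possible way to merge ranked lists. The predicate plays the role of the SQL WHERE clause and yields an unordered set; the content distance then orders that set without cutting it. The merge uses a Borda-style position score, which is an assumption chosen for simplicity and only one of many possible answers to the question above. All record fields and names are hypothetical.

```python
def filter_then_rank(records, predicate, distance):
    """SQL-like predicate selects a SET; content distance RANKS it."""
    candidates = [rec for rec in records if predicate(rec)]
    return sorted(candidates, key=distance)


def merge_ranked(rankings, weights=None):
    """Borda-style merge of several best-first lists of ids.
    An item at position p in a list of length n earns weight * (n - p)."""
    weights = weights or [1.0] * len(rankings)
    scores = {}
    for ranking, weight in zip(rankings, weights):
        for position, item in enumerate(ranking):
            scores[item] = scores.get(item, 0.0) + weight * (len(ranking) - position)
    # Highest score first; ties broken by id so the result is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))


records = [
    {"id": "a", "year": 1990, "color_dist": 0.7},
    {"id": "b", "year": 1995, "color_dist": 0.2},
    {"id": "c", "year": 1995, "color_dist": 0.5},
]
ranked = filter_then_rank(records,
                          predicate=lambda rec: rec["year"] == 1995,
                          distance=lambda rec: rec["color_dist"])
merged = merge_ranked([["a", "b", "c"],    # ranked by color
                       ["b", "c", "a"]])   # ranked by texture
print([rec["id"] for rec in ranked], merged)
```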

15 User Interface

Needs to combine browse, search, navigation and relevance feedback

Needs to address wide variety of users, mostly non-technical

"Visual" users search differently than others

Constant interplay between narrowing and broadening the search

Fast

User should not get lost; system should be intuitive with not too many controls

Hard to communicate a variety of ranked results, especially if ranks are combined

WWW issues

16 Performance Measures

Retrieval accuracy: normalized recall and precision (Salton)

Speed (browser, network, indexing, storage, fast BLOB support)

Cost: storage, data capture and indexing costs
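
A sketch of the retrieval accuracy measures, assuming the set of relevant images is known for a test query. Precision and recall are computed for a cut list; normalized recall is computed over the full ranking using the formula usually attributed to Salton, 1 - (sum of relevant ranks - sum of ideal ranks) / (n (N - n)), with n relevant items out of N. Treat the exact form as an assumption to check against the Salton reference.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved list against a relevant set."""
    relevant = set(relevant)
    hits = len(set(retrieved) & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def normalized_recall(ranking, relevant):
    """1.0 when all relevant items are ranked first, 0.0 when ranked last.
    Assumes every relevant item appears in `ranking` and 0 < n < N."""
    relevant = set(relevant)
    total, n = len(ranking), len(relevant)
    ranks = [i + 1 for i, item in enumerate(ranking) if item in relevant]
    ideal = n * (n + 1) // 2  # ranks 1..n
    return 1 - (sum(ranks) - ideal) / (n * (total - n))


ranking = ["a", "b", "c", "d", "e"]
relevant = {"a", "c"}
p, r = precision_recall(["a", "b", "c"], {"a", "c", "e"})
nr = normalized_recall(ranking, relevant)
print(p, r, nr)
```

Normalized recall suits ranked output because it does not depend on where the list is cut.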

17 Network and WWW issues

Networks are slow, and we need fast browse for image data

The slower the browse, the more important content based retrieval

Interactivity on WWW

Needs to stage the search (do the fastest one first)

Access control
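
Staging the search can be sketched as a chain of scoring functions ordered from cheapest to most expensive, each one keeping only its best candidates for the next stage. The function name, the tuple layout, and the keep counts are hypothetical.

```python
def staged_search(items, stages):
    """Run scoring stages in order (cheapest first). Each stage is a
    (score_fn, keep) pair; lower scores are better."""
    survivors = list(items)
    for score_fn, keep in stages:
        survivors = sorted(survivors, key=score_fn)[:keep]
    return survivors


# Hypothetical items: (id, coarse distance, fine distance).
items = [
    ("a", 0.1, 0.9),
    ("b", 0.2, 0.1),
    ("c", 0.3, 0.5),
    ("d", 0.9, 0.0),
]
result = staged_search(items, [
    (lambda item: item[1], 3),  # fast, coarse measure: keep the best 3
    (lambda item: item[2], 1),  # slow, fine measure: keep the best 1
])
print(result)
```

Note the tradeoff: item "d" has the best fine score but is discarded by the coarse stage, so staging buys speed at some cost in accuracy.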

18 Data Capture and annotation/indexing

Often overlooked, but key to success. Very expensive to digitize and have people key in the data for 10,000,000s of images

Data capture: digitization, color accuracy, sizing, cropping etc.

Data annotation/indexing: entering the keywords and data; obtaining related data from other sources (other DB, cameras etc.); preprocessing to extract content descriptors; outlining for objects

Issues of people's inconsistency, costs of training

Tightly controlled processes for data input

Meta data (indices) might become more valuable than BLOBS

19 Extensions to Video

Video also needs to be searchable by keywords and data

Most of the image database issues apply here, with important additions:

Browsing video is even more time consuming due to its size

Break video into scenes, represent each scene with data and the keyframe. Or create salient stills to represent the video

Use image content descriptors on the keyframe

Video-specific content descriptors such as motion (camera, object), etc.

Systems issues even more complex: size of data, QOS etc.
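
The notes do not say how the video is broken into scenes. One simple possibility, sketched here as an assumption, is to compare the color histograms of consecutive frames and start a new scene whenever they differ by more than a threshold, then take the middle frame of each scene as its keyframe. The threshold, the two-bin histograms, and the function names are illustrative.

```python
def histogram_difference(h1, h2):
    """Sum of absolute bin differences between two normalized histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def split_into_scenes(frame_histograms, threshold=0.5):
    """Return (start, end) frame index pairs, end inclusive."""
    if not frame_histograms:
        return []
    scenes, start = [], 0
    for i in range(1, len(frame_histograms)):
        if histogram_difference(frame_histograms[i - 1],
                                frame_histograms[i]) > threshold:
            scenes.append((start, i - 1))  # abrupt change: close the scene
            start = i
    scenes.append((start, len(frame_histograms) - 1))
    return scenes


def keyframes(scenes):
    """Pick the middle frame of each scene as its representative."""
    return [(start + end) // 2 for start, end in scenes]


# Hypothetical two-bin histograms for five frames.
frames = [
    [1.0, 0.0], [0.9, 0.1], [0.95, 0.05],  # scene 1
    [0.1, 0.9], [0.0, 1.0],                # scene 2
]
scenes = split_into_scenes(frames)
print(scenes, keyframes(scenes))
```

The still-image content descriptors can then be computed on each keyframe only, rather than on every frame.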