ABSTRACT: Many autonomous and heterogeneous information sources are becoming available to users through the Internet, especially through the World Wide Web. Integrating these sources poses several challenges that have not been sufficiently addressed. In particular, knowledge of redundancy among sources can be used to reduce the number of source accesses needed to answer a user query. Moreover, probabilistic information about source overlap can help derive efficient query plans for delivering partial answers to queries.