Privacy Preserving Indexing of Documents on the Network

Mayank Bawa, Roberto J. Bayardo Jr. and Rakesh Agrawal


We address the problem of providing privacy-preserving search over distributed, access-controlled content. Indexed documents can easily be reconstructed from the conventional (inverted) indexes used in search. To avoid breaches of access control through the index, the site hosting the index must be fully secured and trusted by all participating content providers. This level of trust is impractical in the increasingly common case where multiple competing organizations or individuals wish to share content selectively. We propose a solution that eliminates the need for such a trusted authority. The solution builds a centralized privacy-preserving index in conjunction with a distributed, access-control-enforcing search protocol. The new index provides strong and quantifiable privacy guarantees that hold even if the entire index is made public. Experiments on a real-life dataset validate the performance of the scheme. The appeal of our solution is two-fold: (a) content providers retain complete control over defining access groups and ensuring compliance with them, and (b) system implementors retain tunable knobs to balance privacy and efficiency concerns for their particular domains.
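To make the reconstruction claim concrete, here is a minimal Python sketch. It assumes a simple positional inverted index that maps each term to a list of (document id, word position) postings. The functions `build_inverted_index` and `reconstruct` and the sample documents are hypothetical illustrations, not part of the proposed scheme. Anyone holding such an index can recover a document word for word, and an index without positions still leaks each document's set of terms.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Hypothetical positional inverted index: term -> [(doc_id, position), ...]."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.split()):
            index[term].append((doc_id, pos))
    return dict(index)


def reconstruct(index, doc_id):
    """Rebuild a document's text using only the index, never the original documents."""
    slots = {}
    for term, postings in index.items():
        for d, pos in postings:
            if d == doc_id:
                slots[pos] = term  # each position holds exactly one term
    # Reassemble the terms in their original order.
    return " ".join(slots[p] for p in sorted(slots))


# Hypothetical access-controlled documents held by a content provider.
docs = {
    "memo-1": "merger talks with acme resume monday",
    "memo-2": "quarterly revenue below forecast",
}
index = build_inverted_index(docs)
print(reconstruct(index, "memo-1"))  # merger talks with acme resume monday
```

Because the index alone suffices for this recovery, whoever hosts a conventional index must be trusted with the content itself, which is the requirement the abstract seeks to remove.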