BROAD: Diversified Keyword Search in Databases
Date
2011-03-21T08:18:20Z
Abstract
Keyword search in databases has received considerable attention in the database community because it lets users query a database without knowing its underlying schema. However, keyword search queries often return too many results. One standard solution is to rank results so that the “best” results appear first. Still, this approach can suffer from a redundancy problem: many high-ranking results come from the same part of the database, while results in other parts of the database are missed completely.
In this paper, we propose the BROAD system, which allows users to perform diverse, hierarchical browsing of keyword search results. Our system partitions the answer trees in the keyword search results by selecting k diverse representatives from the trees, separating the answer trees into k groups based on their similarity to the representatives, and then recursively applying the partitioning to each group. By constructing a summarized result for the answer trees in each of the k groups, we provide a way for users to quickly locate the results that they desire.
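The recursive partitioning described above can be illustrated with a minimal sketch. It is not the BROAD algorithm itself: representatives are picked here with a simple greedy farthest-first heuristic (a swapped-in technique), an answer tree is reduced to a set of node labels, and the names `Tree`, `jaccard_distance`, `pick_representatives`, and `partition` are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List

# Assumption: an answer tree is modelled as a plain set of node labels.
Tree = FrozenSet[str]
Distance = Callable[[Tree, Tree], float]


def jaccard_distance(a: Tree, b: Tree) -> float:
    """Toy stand-in distance: 1 minus the Jaccard similarity of the label sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pick_representatives(trees: List[Tree], k: int, dist: Distance) -> List[int]:
    """Greedy farthest-first selection; returns indices of up to k diverse trees."""
    chosen = [0]
    while len(chosen) < min(k, len(trees)):
        remaining = [i for i in range(len(trees)) if i not in chosen]
        # Take the tree whose nearest chosen representative is farthest away.
        nxt = max(remaining,
                  key=lambda i: min(dist(trees[i], trees[j]) for j in chosen))
        chosen.append(nxt)
    return chosen


def partition(trees: List[Tree], k: int, dist: Distance) -> Dict:
    """Group trees around k representatives, then recurse into each group."""
    node: Dict = {"trees": trees, "children": []}
    if len(trees) <= k:
        return node  # small enough to show directly
    reps = pick_representatives(trees, k, dist)
    groups: Dict[int, List[Tree]] = {r: [] for r in reps}
    for t in trees:
        nearest = min(reps, key=lambda r: dist(t, trees[r]))
        groups[nearest].append(t)
    for r in reps:
        members = groups[r]
        if not members:
            continue  # a duplicate representative can end up with no members
        if len(members) == len(trees):
            child: Dict = {"trees": members, "children": []}  # no progress, stop
        else:
            child = partition(members, k, dist)
        child["representative"] = trees[r]
        node["children"].append(child)
    return node


trees = [frozenset(s) for s in (
    {"a", "b"}, {"a", "b", "c"}, {"x", "y"},
    {"x", "y", "z"}, {"a", "c"}, {"x", "z"},
)]
root = partition(trees, k=2, dist=jaccard_distance)
```

Each node of the returned hierarchy holds one group and its representative, so a browsing interface could show the representatives first and expand only the group a user selects.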
Technically, our solution consists of three components. First, a new distance metric is used to capture both semantic and structural dissimilarity between answer trees. Second, based on this metric, we propose a tree-based algorithm to efficiently achieve result diversification. Finally, by coupling our partitioning solution with result summarization techniques, we allow users to decide which partition to drill down into in order to obtain their intended answers. Extensive experiments were conducted, and the results validate the feasibility and efficiency of our system.
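To make the idea of a distance that captures both semantic and structural dissimilarity concrete, here is a small illustrative sketch. The abstract does not define the metric, so everything here is an assumption: the `AnswerTree` type, the `combined_distance` function, the weight `alpha`, and the choice of Jaccard distance over node labels (semantic part) and over edges (structural part) are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class AnswerTree:
    """Hypothetical answer tree: node labels plus parent-child edges."""
    labels: FrozenSet[str]
    edges: FrozenSet[Tuple[str, str]]


def _jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 minus Jaccard similarity; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def combined_distance(t1: AnswerTree, t2: AnswerTree, alpha: float = 0.5) -> float:
    """Weighted mix of label (semantic) and edge (structural) dissimilarity.

    Assumption: a linear combination with weight alpha in [0, 1]; the
    metric used in the paper may be defined quite differently.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    semantic = _jaccard_distance(t1.labels, t2.labels)
    structural = _jaccard_distance(t1.edges, t2.edges)
    return alpha * semantic + (1.0 - alpha) * structural


small = AnswerTree(
    labels=frozenset({"paper", "author"}),
    edges=frozenset({("paper", "author")}),
)
large = AnswerTree(
    labels=frozenset({"paper", "author", "conference"}),
    edges=frozenset({("paper", "author"), ("conference", "paper")}),
)
d = combined_distance(small, large)
```

With `alpha` close to 1 the sketch compares trees mostly by what they contain, and with `alpha` close to 0 mostly by how their nodes are connected.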