ICRA: Effective Semantics for Ranked XML Keyword Search

CHEN, Bo; LU, Jiaheng; LING, Tok Wang

Files

TRC5-07.pdf (6.32 MB)

Date

2007-05-30

Authors

CHEN, Bo

LU, Jiaheng

LING, Tok Wang

Abstract

Keyword search is a user-friendly way to query XML databases. Most previous efforts in this area focus on keyword proximity search in XML based on either a tree data model or a graph (or digraph) data model. A tree data model cannot capture connections such as ID references in XML databases. In contrast, techniques based on a graph (or digraph) data model capture such connections but are generally inefficient to compute. In this paper, we propose an interconnected object trees model for keyword search that achieves the efficiency of the tree model while capturing connections such as ID references in XML by fully exploiting the property and schema information of XML databases. In particular, we propose ICA (Interested Common Ancestor) semantics to find all predefined interested objects that contain all query keywords. We also introduce novel IRA (Interested Related Ancestors) semantics to capture the conceptual connections between interested objects and to include more objects that contain only some of the query keywords. We then study a novel ranking metric, RelevanceRank, which dynamically assigns higher ranks to objects that are more relevant to a given keyword query according to the conceptual connections in IRAs. We design and analyze efficient algorithms for keyword search based on our data model, and experimental results show that our approach is efficient and outperforms most existing systems in terms of result quality. A prototype of our ICRA system (ICRA = ICA + IRA) on the updated 321M DBLP data is available at http://xmldb.ddns.comp.nus.edu.sg/.
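
As a rough illustration of the ICA idea stated in the abstract (return the predefined interested objects that contain all query keywords), the following Python sketch filters a toy XML-like tree. It is a naive reading of that one sentence, not the report's algorithm: the Node class, the ica_candidates function, the tag names, and the sample data are all assumptions made for illustration, and ID references, IRA semantics, and RelevanceRank are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory XML element: a tag, its own text, and children."""
    tag: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def subtree_words(node):
    """Collect the lowercased words found anywhere in the subtree of node."""
    words = set(node.text.lower().split())
    for child in node.children:
        words |= subtree_words(child)
    return words


def ica_candidates(root, interested_tags, keywords):
    """Return nodes whose tag is 'interested' and whose subtree has every keyword.

    This is a brute-force sketch (it recomputes subtree words per node);
    it only illustrates the containment condition, not an efficient method.
    """
    wanted = {k.lower() for k in keywords}
    hits = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.tag in interested_tags and wanted <= subtree_words(node):
            hits.append(node)
        # Push children in reverse so they are visited in document order.
        stack.extend(reversed(node.children))
    return hits


# Toy DBLP-like data (invented for this example).
paper1 = Node("inproceedings", children=[
    Node("author", "Bo Chen"),
    Node("title", "XML keyword search"),
])
paper2 = Node("inproceedings", children=[
    Node("author", "Jiaheng Lu"),
    Node("title", "Twig joins"),
])
dblp = Node("dblp", children=[paper1, paper2])

hits = ica_candidates(dblp, {"inproceedings"}, ["xml", "chen"])
```

Here only the first paper qualifies, because it is the only interested object whose subtree contains both "xml" and "chen". The root element also contains both words but is skipped because its tag is not in the interested set, which is the role the "predefined interested objects" play in this reading.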

URI

https://dl.comp.nus.edu.sg/xmlui/handle/1900.100/2553

Collections

Technical Reports
