
Browsing by Author "TUNG, Anthony K. H."

Now showing 1 - 7 of 7
  • BROAD: Diversified Keyword Search in Databases
    (2011-03-21T08:18:20Z) ZHAO, Feng; ZHANG, Xiaolong; TUNG, Anthony K. H.; CHEN, Gang
    Keyword search in databases has received a lot of attention in the database community as it is an effective approach for querying a database without knowing its underlying schema. However, keyword search queries often return too many results. One standard solution is to rank results such that the “best” results appear first. Still, this approach can suffer from a redundancy problem, where many high-ranking results in fact come from the same part of the database and results in other parts of the database are missed completely. In this paper, we propose the BROAD system, which allows users to perform diverse, hierarchical browsing on keyword search results. Our system partitions the answer trees in the keyword search results by selecting k diverse representatives from the trees, separating the answer trees into k groups based on their similarity to the representatives, and then recursively applying the partitioning to each group. By constructing a summarized result for the answer trees in each of the k groups, we provide a way for users to quickly locate the results that they desire. Technically, our solution consists of three components. First, a new distance metric is used to capture both semantic and structural dissimilarity between answer trees. Second, based on this metric, we propose a tree-based algorithm to efficiently achieve result diversification. Finally, by coupling our partitioning solution with result summarization techniques, we allow users to decide which partition to drill down into in order to obtain their intended answers. Extensive experiments were conducted, and the results validate the feasibility and efficiency of our system.
  • Efficient Constrained Delaunay Triangulation for Large Spatial Databases
    (2006-01-27) WU, Xinyu; HSU, David; TUNG, Anthony K. H.
    Delaunay Triangulation (DT) and its extension Constrained Delaunay Triangulation (CDT) are spatial data structures that have wide applications in spatial data processing. Our recent survey shows, however, that there is a surprising lack of algorithms for computing DT/CDT for large spatial databases. In view of this, we propose an efficient algorithm based on the divide-and-conquer paradigm. It computes DT/CDT on in-memory partitions before merging them into the final result. This is made possible by discovering a mathematical property that precisely characterizes the set of triangles that are involved in the merging step. Our extensive experiments show that the new algorithm outperforms another provably good disk-based algorithm by roughly an order of magnitude when computing DT. For CDT, which has no known disk-based algorithm, we show that our algorithm scales up well for large databases with sizes in the gigabyte range.
  • Finding Time-lagged 3D Clusters
    (2008-06-19) XU, Xin; LU, Ying; TAN, Kian-Lee; TUNG, Anthony K. H.
    Existing 3D clustering algorithms on $gene \times sample \times time$ expression data do not consider the time lags between correlated gene expression patterns. Besides, they either ignore the correlation on time subseries, or disregard the continuity of the time series, or only validate pure shifting or pure scaling coherent patterns instead of the general shifting-and-scaling patterns. In this paper, we propose a novel 3D cluster model, the $S^2D^3$ Cluster, to address these problems, where $S^2$ reflects the shifting-and-scaling correlation and $D^3$ the 3-dimensional $gene \times sample \times time$ data. Within the $S^2D^3$ Cluster model, expression levels of genes are shifting-and-scaling coherent in both sample subspace and time subseries with arbitrary time lags. We develop a 3D clustering algorithm, LagMiner, for identifying interesting $S^2D^3$ Clusters that satisfy the constraints of regulation ($\gamma$), coherence ($\epsilon$), minimum gene number ($MinG$), minimum sample subspace size ($MinS$) and minimum time period length ($MinT$). Experimental results on both synthetic and real-life datasets show that LagMiner is effective, scalable and parameter-robust. While we use gene expression data in this paper, our model and algorithm can be applied to any other data where both spatial and temporal coherence are pursued.
  • Generic Inverted Index on the GPU
    (2015-11-23) ZHOU, Jingbo; GUO, Qi; JAGADISH, H. V.; LUAN, Wenhao; TUNG, Anthony K. H.; ZHENG, Yuxin
    Data variety, one of the three Vs of Big Data, is manifested by a growing number of complex data types such as documents, sequences, trees, graphs and high-dimensional vectors. To perform similarity search on these data, existing works mainly choose to create customized indexes for different data types. Due to the diversity of customized indexes, it is hard to devise a general parallelization strategy to speed up the search. In this paper, we propose a generic inverted index on the GPU (called GENIE), which can support similarity search of multiple queries on various data types. GENIE can effectively support approximate nearest neighbor search under different similarity measures by applying Locality Sensitive Hashing schemes, as well as similarity search on original data such as short document data and relational data. Extensive experiments on different real-life datasets demonstrate the efficiency and effectiveness of our system.
  • Large Scale Cohesive Subgraphs Discovery for Social Network Visual Analysis
    (2012-04-03T09:41:45Z) ZHAO, Feng; TUNG, Anthony K. H.
    Graphs are widely used in large-scale social network analysis nowadays. Not only do analysts need to focus on cohesive subgraphs to study patterns among social actors, but normal users are also interested in discovering what is happening in their neighborhood. However, effectively storing large-scale social networks and efficiently identifying cohesive subgraphs is challenging. In this work we introduce a novel subgraph concept to capture the cohesion in social interactions, and propose an I/O-efficient approach to discover cohesive subgraphs. Besides, we propose an analytic system which allows users to perform intuitive, visual browsing on large-scale social networks. Our system stores the network as a social graph in the graph database, retrieves a local cohesive subgraph based on the input keywords, and then visualizes the subgraph in an orbital layout, in which more important social actors are located in the center. By summarizing textual interactions between social actors as a tag cloud, we provide a way to quickly locate active social communities and their interactions in a unified view.
  • On Tag-based Querying of Wide Tables
    (2010-01-15) DAI, Bing Tian; LU, Meiyu; OOI, Beng Chin; TUNG, Anthony K. H.
    The huge amount of heterogeneous data generated by Web 2.0 applications has resulted in the use of wide tables, where tuples with different schemas may co-exist. Providing users with such freedom, however, poses new challenges such as difficulty in query formulation and query processing, and a lack of guarantees on data quality. In this paper, we propose an out-of-the-box approach that deals with these challenges from a semantic perspective. We interpret the semantics of each data value by a set of similar tags. Such association between values and tags provides a layer that separates database operations from the database storage layer. This layer also serves as a platform for users to query the databases by a group of tags. We conducted a series of experiments, and the results confirm the effectiveness of the approach.
  • What is Unequal among the Equals? Ranking Equivalent Rules from Gene Expression Data
    (2009-06-01T08:08:33Z) CAI, Ruichu; TUNG, Anthony K. H.; ZHANG, Zhenjie; HAO, Zhifeng
    In previous studies, association rules have been proven to be useful in classification problems over high-dimensional gene expression data. However, due to the nature of such datasets, it is often the case that millions of rules can be derived, many of which are covered by exactly the same set of training tuples and thus have exactly the same support and confidence. Ranking and selecting useful rules from such equivalent rule groups remains an interesting and unexplored problem. In this paper, we look at two interestingness measures for ranking rules within an equivalent rule group: Max-Subrule-Conf and Min-Subrule-Conf. Based on these interestingness measures, an incremental Apriori-like algorithm is designed to select more interesting rules from the lower bound rules of the group. Moreover, we present an improved classification model to fully exploit the potential of the selected rules. Our empirical studies over five gene expression datasets show that our proposals improve both the efficiency and effectiveness of rule extraction and classifier construction.

