- DSpace at NUS School of Computing

### Recent Submissions

One of the modern pillars of collaborative filtering and recommender systems is the collection and exploitation of ratings from users. The Likert scale is a psychometric quantifier of ratings popular among electronic commerce sites. In this paper, we consider the tasks of collecting Likert scale ratings of items and of finding the n-k best-rated items, i.e., the n items that are most likely to be the top-k in a ranking constructed from these ratings. We devise an algorithm, Pundit, that compu...

Cloud providers leverage live migration of virtual machines to reduce energy consumption and allocate resources efficiently in data centers. Each migration decision depends on three questions: when to move a virtual machine, which virtual machine to move and where to move it? Dynamic, uncertain and heterogeneous workloads running on virtual machines make such decisions difficult. Knowledge-based and heuristics-based algorithms are commonly used to tackle this problem. Knowledge-based algorithm...

Security is a major concern in the wide adoption of security-critical systems, such as Unmanned Aerial Vehicles (UAV). To ensure the security of a system, the first step is to identify the required security properties as well as the potential attacks, i.e., security requirements. We identify a set of security requirements for UAV systems which is more complete and more detailed than in existing works. To facilitate formal analysis of a system against the set of requirements, we propose algorithms to autom...

Modern recommendation systems leverage some form of collaborative user or crowdsourced collection of information. For instance, services like TripAdvisor, Airbnb and HungryGoWhere rely on user-generated content to describe and classify hotels, vacation rentals and restaurants. By nature of such independent collection of information, the multiplicity, diversity and varying quality of the information collected result in uncertainty. Objects, such as the services offered by hotels, vacation re...

For dynamic networks with unknown diameter, we prove novel lower bounds on the time complexity of a range of basic distributed computing problems. Together with trivial upper bounds under dynamic networks with known diameter for these problems, our lower bounds show that the complexities of all these problems are sensitive to whether the diameter is known to the protocol beforehand: Not knowing the diameter increases the time complexities by a large poly(N) factor as compared to when the diame...

Data variety, one of the three Vs of Big Data, is manifested by a growing number of complex data types such as documents, sequences, trees, graphs and high dimensional vectors. To perform similarity search on these data, existing works mainly choose to create customized indexes for different data types. Due to the diversity of customized indexes, it is hard to devise a general parallelization strategy to speed up the search. In this paper, we propose a generic inverted index on the GPU...

Keyword search over relational databases has gained popularity as it provides a user-friendly way to explore structured data. Current research has focused on the computation of minimal units that contain all the query keywords, and largely ignores queries to retrieve statistical information from the database. The latter involve aggregate functions and group-bys, and are called aggregate queries. In this work, we propose a semantic approach to answer keyword queries containing aggregates and ...

In a crowdsourcing market, a requester is looking to form a team of workers to perform a complex task that requires a variety of skills. Candidate workers advertise their certified skills and bid prices for their participation. We design four incentive mechanisms for selecting workers to form a valid team (that can complete the task) and determining each individual worker’s payment. We examine the profitability, individual rationality, computational efficiency, and truthfulness for each of...

Cooperative statistical bug isolation approaches monitor program runtime behavior from end-user executions and apply statistical techniques to pinpoint likely causes of field failures. Having end-users run fully instrumented programs is extremely undesirable, as it adds tremendous performance overhead for end-users; it is also unnecessary, since in practice the major part of these programs is error-free and needs no instrumentation. In order to reduce the monitoring and computational costs as wel...

Partial learning is a criterion where the learner infinitely often outputs one correct conjecture while every other hypothesis is issued only finitely often. This paper addresses two variants of partial learning in the setting of inductive inference of functions: first, confident partial learning requires that the learner also on those functions which it does not learn, singles out exactly one hypothesis which is output infinitely often; second, essentially class consistent partial learning is par...

The present work investigates the relationship of iterative learning with other learning criteria such as decisiveness, caution, reliability, non-U-shapedness, monotonicity, strong monotonicity and conservativeness. Building on the result of Case and Moelius that iterative learners can be made non-U-shaped, we show that they also can be made cautious and decisive. Furthermore, we obtain various special results with respect to one-one texts, fat texts and one-one hypothesis spaces.

XML keyword search has attracted a lot of interest, with typical search based on lowest common ancestor (LCA). However, in this paper, we show that meaningful answers can be found beyond LCA and should be independent from schema designs of the same data content. Therefore, we propose a new semantics, called CR (Common Relative), which not only can find more answers beyond LCA, but the returned answers are independent from schema designs as well. To find answers based on the CR semantics, we p...

This paper considers the problem of computing general commutative and associative aggregate functions (such as SUM) over distributed inputs held by nodes in a distributed system, while tolerating failures. Specifically, there are N nodes in the system, and the topology among them is modeled as a general undirected graph. Whenever a node sends a message, the message is received by all of its neighbors in the graph. Each node has an input, and the goal is for a special root node (e.g., the base...

Automated techniques have been proposed to either identify refactoring opportunities (i.e., code fragments that can be, but have not yet been, restructured in a program), or reconstruct historical refactoring (i.e., code restructuring operations that have happened between different versions of a program). However, it remains challenging to apply those techniques to large code bases containing millions of lines of code involving many versions. In this paper, we propose a new scalable technique that...

Preface: This volume contains the papers presented at the Doctoral Symposium which was held in Singapore on 13 May 2014 as part of the International Symposium on Formal Methods (FM2014). The Doctoral Symposium is an important event where students have the opportunity to present their ideas at an international forum. The papers selected for this year’s symposium were reviewed carefully by the Programme Committee members: · Franck Cassez (NICTA, Sydney, Australia) · Wei-Ngan Chin (National Univ o...

While price and data quality should define the major tradeoff for consumers in data markets, prices are usually prescribed by vendors and data quality is not negotiable. In this paper we study a model where data quality can be traded for a discount. We focus on the case of XML documents and consider completeness as the quality dimension. In our setting, the data provider offers an XML document, and sets both the price of the document and a weight to each node of the document, depending on its...

A probabilistic relational database is a compact form of a set of deterministic relational databases (namely, possible worlds), each of which has a probability. In our framework, the existence of tuples is determined by associated Boolean formulae based on elementary events. An estimation, within such a setting, of the probabilities of possible worlds uses a prior probability distribution specified over the elementary events. Direct observations and general knowledge, in the form of constraint...
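
To make the setting of this abstract concrete, here is a minimal Python sketch that enumerates possible worlds and their probabilities. It assumes the elementary events are independent, which the abstract does not state, and the event names (`e1`, `e2`), tuple names (`t1`, `t2`, `t3`), formulae and probabilities are all hypothetical.

```python
from itertools import product

# Hypothetical prior over elementary events, assumed independent.
prior = {"e1": 0.5, "e2": 0.25}

# Hypothetical tuples, each guarded by a Boolean formula over the events.
tuples = {
    "t1": lambda v: v["e1"],
    "t2": lambda v: v["e1"] and not v["e2"],
    "t3": lambda v: v["e1"] or v["e2"],
}


def possible_worlds(prior, tuples):
    """Map each possible world (a frozenset of tuple names) to its probability."""
    events = sorted(prior)
    worlds = {}
    for values in product([True, False], repeat=len(events)):
        valuation = dict(zip(events, values))
        # Probability of this truth assignment under the independence assumption.
        p = 1.0
        for e in events:
            p *= prior[e] if valuation[e] else 1.0 - prior[e]
        # The world contains exactly the tuples whose formula holds.
        world = frozenset(t for t, formula in tuples.items() if formula(valuation))
        # Several assignments may yield the same world; their masses add up.
        worlds[world] = worlds.get(world, 0.0) + p
    return worlds


worlds = possible_worlds(prior, tuples)
```

The enumeration is exponential in the number of events, so it only serves to illustrate the definition on small inputs.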

Coarse-grained reconfigurable arrays (CGRAs) exhibit high performance, improved flexibility, low cost, and power efficiency for various application domains. Compute-intensive loop kernels are mapped onto CGRAs through modified modulo scheduling algorithms that integrate placement and routing. We formalize the CGRA mapping problem as a graph minor containment problem. We essentially test if the data flow graph representing the loop kernel is a minor of the modulo routing resource graph represen...

A probabilistic database instance denotes a set of possible non-probabilistic database instances called possible worlds, each of which has a probability. This is often a compact way to represent uncertain data. In addition, direct observations and general knowledge, in the form of constraints, help refine the probabilities of the possible worlds, possibly ruling out some of them. Adding such constraints to the set of possible worlds with their probabilities is conditioning in the probabilis...
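
As a small illustration of conditioning as described in this abstract, the following Python sketch drops the possible worlds that violate a constraint and renormalises the remaining probabilities. It works on an explicit list of worlds, which is an assumption made for clarity rather than the representation used in the paper; the worlds, tuple names and constraint are hypothetical.

```python
def condition(worlds, constraint):
    """Keep the worlds that satisfy the constraint and renormalise their probabilities."""
    kept = {w: p for w, p in worlds.items() if constraint(w)}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("constraint rules out every possible world")
    return {w: p / total for w, p in kept.items()}


# Hypothetical possible worlds (sets of tuple names) with their probabilities.
worlds = {
    frozenset(): 0.375,
    frozenset({"t3"}): 0.125,
    frozenset({"t1", "t3"}): 0.125,
    frozenset({"t1", "t2", "t3"}): 0.375,
}

# Hypothetical constraint: tuple t3 is known to exist.
conditioned = condition(worlds, lambda w: "t3" in w)
```

Here the empty world is ruled out, and the three surviving worlds share the remaining probability mass in their original proportions.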

In database applications in general, and in applications using XML in particular, the availability of a conceptual schema or of elements of semantics constitutes invaluable leverage for improving the effectiveness, and sometimes the efficiency, of many tasks including query processing, keyword search and schema and data integration. The Object-Relationship-Attribute model for Semi-Structured data (ORA-SS) is a conceptual model designed to capture the semantics of the important constructs...
