- DSpace at NUS School of Computing

### DSpace is Live

Welcome to our digital repository of My University research!

More exciting news to appear here.

### Recent Submissions

Recent attacks on machine learning models such as membership inference attacks increase the concern for privacy. Linear regression is such an essential statistical machine learning model at risk. For a given dataset, linear regression determines the parameters of the linear equation connecting the predictor variables to the response variable. As such linear regression yields a set of unstable and over tted parameters. Regularisation terms are added to the loss function of linear regression in...

Unrestricted availability of the datasets is important for the researchers to evaluate their strategies to solve the research problems. While publicly releasing the datasets, it is equally important to protect the privacy of the respective data owners. Synthetic datasets that preserve the utility while protecting the privacy of the data owners stands as a midway. There are two ways to synthetically generate the data. Firstly, one can generate a fully synthetic dataset by subsampling it fro...

Many real world applications need to capture a mix of temporal and non-temporal entities, relationships and attributes. These concepts add complexity when designing database schemas and existing works are unable to capture the temporal semantics precisely. We propose a new framework for designing SQL databases that distinguishes between temporal and non-temporal concepts while also distinguishing between entities, relationships and attributes at every step. The framework rst utilizes an Ent...

Querying temporal relational databases is a challenge for non-expert database users, since it requires users to understand the semantics of the database and apply temporal joins as well as temporal conditions correctly in SQL statements. Traditional keyword search approaches are not directly applicable to temporal relational databases since they treat time-related keywords as tuple values and do not consider the temporal joins between relations, which leads to missing answers, incorrect answe...

In a sybil attack, an adversary creates a large number of fake identities/nodes and have them join the system. Computational puzzles have long been investigated as a possible sybil defense: If a node fails to solve the puzzle in a timely fashion, it will no longer be accepted by other nodes. However, it is still possible for a malicious node to behave in such a way that it is accepted by some honest nodes but not other honest nodes. This results in different honest nodes having different view...

This paper considers several closely-related problems in synchronous dynamic networks with oblivious adversaries, and proves novel (d + poly(m)) lower bounds on their time complexity (in terms of rounds). Here d is the dynamic diameter of the dynamic network andmis the total number of nodes. Before this work, the only known lower bounds on these problems under oblivious adversaries were the trivial (d) lower bounds. Our novel lower bounds are hence the first non-trivial lower bounds and also ...

One of the modern pillars of collaborative filtering and recommender systems is collection and exploitation of ratings from users. Likert scale is a psychometric quantifi er of ratings popular among the electronic commerce sites. In this paper, we consider the tasks of collecting Likert scale ratings of items and of fi nding the n-k best-rated items, i.e., the n items that are most likely to be the top-k in a ranking constructed from these ratings. We devise an algorithm, Pundit, that compu...

Cloud providers leverage live migration of virtual machines to reduce energy consumption and allocate resources efficiently in data centers. Each migration decision depends on three questions: when to move a virtual machine, which virtual machine to move and where to move it? Dynamic, uncertain and heterogeneous workloads running on virtual machines make such decisions difficult. Knowledge-based and heuristics-based algorithms are commonly used to tackle this problem. Knowledgebased algorithm...

Security is a big concern in widely adopting security critical systems, such as Unmanned Aerial Vehicles (UAV). To ensure security of a system, the rst step is to identify the required security properties as well as the potential attacks, i.e., security requirements. We identify a set of security requirements for UAV systems which is more complete and in more details than existing works. To facilitate formal analysis of a system against the set of requirements, we propose algorithms to autom...

Modern recommendation systems leverage some forms of collaborative user or crowd sourced collection of information. For instance, services like TripAdvisor, Airbnb and HungyGoWhere rely on user-generated content to describe and classify hotels, vacation rentals and restaurants. By nature of such independent collection of information, the multiplicity, diversity and varying quality of the information collected result in uncertainty. Objects, such as the services o ffered by hotels, vacation re...

For dynamic networks with unknown diameter, we prove novel lower bounds on the time complexity of a range of basic distributed computing problems. Together with trivial upper bounds under dynamic networks with known diameter for these problems, our lower bounds show that the complexitiesof all these problems are sensitive to whether the diameter is known to the protocol beforehand: Not knowing the diameter increases the time complexities by a large poly(N) factor as compared to when the diame...

Data variety, as one of the three Vs of the Big Data, is manifested by a growing number of complex data types such as documents, sequences, trees, graphs and high dimensional vectors. To perform similarity search on these data, existing works mainly choose to create customized indexes for different data types. Due to the diversity of customized indexes, it is hard to devise a general parallelization strategy to speed up the search. In this paper, we propose a generic inverted index on the GPU...

Keyword search over relational databases has gained popularity as it provides a user-friendly way to explore structured data. Current research has focused on the computation of minimal units that contain all the query keywords, and largely ignores queries to retrieve statistical information from the database. The latter involves aggregate functions and group-bys, and are called aggregate queries. In this work, we propose a semantic approach to answer keyword queries containing aggregates and ...

In a crowdsourcing market, a requester is looking to form a team of workers to perform a complex task that requires a variety of skills. Candidate workers advertise their certified skills and bid prices for their participation. We design four incentive mechanisms for selecting workers to form a valid team (that can complete the task) and determining each individual workerâ€™s payment.We examine the profitability, individually rationality, computationally efficiency, and truthfulness for each of...

Cooperative statistical bug isolation approaches monitor program runtime behavior from end-user executions and apply statistical techniques to pinpoint likely causes of eld failures. Having end-users to run fully instrumented programs is extremely undesirable, as it adds tremendous performance overhead to end-users; it is also unnecessary, since in practice major part of these programs are error-free and need no instrumentation. In order to reduce the monitoring and computational costs as wel...

Partial learning is a criterion where the learner in nitely often outputs one correct conjecture while every other hypothesis is issued only nitely often. This paper addresses two variants of partial learning in the setting of inductive inference of functions: rst, con dent partial learning requires that the learner also on those functions which it does not learn, singles out exactly one hypothesis which is output in nitely often; second, essentially class consistent partial learning is par...

The present work investigates the relationship of iterative learning with other learning criteria such as decisiveness, caution, reliability, non-U-shapedness, monotonicity, strong monotonicity and conservativeness. Building on the result of Case and Moelius that iterative learners can be made non-U-shaped, we show that they also can be made cautious and decisive. Furthermore, we obtain various special results with respect to one-one texts, fat texts and one-one hypothesis spaces.

XML keyword search has attracted a lot of interests with typical search based on lowest common ancestor (LCA). However, in this paper, we show that meaningful answers can be found beyond LCA and should be independent from schema designs of the same data content. Therefore, we propose a new semantics, called CR (Common Relative), which not only can find more answers beyond LCA, but the returned answers are independent from schema designs as well. To find answers based on the CR semantics, we p...

This paper considers the problem of computing general commutative and associative aggregate functions (such as SUM) over distributed inputs held by nodes in a distributed system, while tolerating failures. Specifically, there are N nodes in the system, and the topology among them is modeled as a general undirected graph. Whenever a node sends a message, the message is received by all of its neighbors in the graph. Each node has an input, and the goal is for a special root node (e.g., the base...

Automated techniques have been proposed to either identify refactoring opportunities (i.e., code fragments that can, but have not yet been restructured in a program), or reconstruct historical refactoring (i.e., code restructuring operations that have happened between di erent versions of a program). However, it remains challenging to apply those techniques to large code bases containing millions of lines of code involving many versions. In this paper, we propose a new scalable technique that...

## This is a default installation of DSpace!

It can be extensively configured by installing modified JSPs, and altering the site configuration.