School of Computing Digital Library
https://dl.comp.nus.edu.sg
Feed updated: 2023-12-03T21:28:36Z

dc.identifier.uri: https://dl.comp.nus.edu.sg/handle/1900.100/15521
dc.date.accessioned: 2023-11-28T07:28:24Z
dc.date.issued: 2023-09-01
dc.title: Using Multi-dimensional Quorums for Optimal Resilience in Multi-resource Blockchains
dc.contributor.author: Yucheng Sun; Ruomu Hao; Haifeng Yu
dc.description.abstract: Permissionless blockchains commonly use resource challenges to defend against Sybil attacks. For example, popular resource challenge designs include Proof-of-Work (PoW) and Proof-of-Stake (PoS). It is well known that simultaneously exploiting multiple resources can help make a permissionless blockchain more robust. Existing efforts along this direction, however, all fall short of answering the central question of how to combine PoW and PoS, or multiple resources in general, to achieve optimal resilience. As its central contribution, this work gives a complete answer to this question.
dc.identifier.uri: https://dl.comp.nus.edu.sg/handle/1900.100/13756
dc.date.accessioned: 2023-04-18T08:02:26Z
dc.date.issued: 2023-04-06
dc.title: Optimistic Fast Confirmation While Tolerating Malicious Majority in Blockchains
dc.contributor.author: Hou, Ruomu; Yu, Haifeng
dc.description.abstract: The robustness of a blockchain against the adversary is often characterized by the maximum fraction (fmax) of adversarial power that it can tolerate. While most existing blockchains can only tolerate fmax < 0.5 or lower, there are some blockchain systems that are able to tolerate a malicious majority, namely fmax >= 0.5. A key price paid by such blockchains, however, is their large confirmation latency. This work aims to significantly reduce the confirmation latency in such blockchains, under the common case where the actual fraction f of adversarial power is relatively small. To this end, we propose a novel blockchain called FLINT. FLINT tolerates fmax >= 0.5 and can give optimistic execution (i.e., fast confirmation) whenever f is relatively small. Our experiments show that the fast confirmation in FLINT only takes a few minutes, as compared to several hours of confirmation latency in prior works.
dc.identifier.uri: https://dl.comp.nus.edu.sg/handle/1900.100/11727
dc.date.accessioned: 2022-06-02T02:53:31Z
dc.date.issued: 2022-05-26
dc.title: Achieving Sublinear Complexity under Constant T in T-interval Dynamic Networks
dc.contributor.author: Hou, Ruomu; Jahja, Irvan; Sun, Yucheng; Wu, Jiyan; Yu, Haifeng
dc.description.abstract: This paper considers standard T-interval dynamic networks, where the N nodes in the network proceed in lock-step rounds, and where the topology of the network can change arbitrarily from round to round, as determined by an adversary. The adversary promises that in every T consecutive rounds, the T (potentially different) topologies in those T rounds contain a common connected subgraph that spans all nodes.
Within such a context, we propose novel algorithms for solving some fundamental distributed computing problems such as Count/Consensus/Max.
Our algorithms are the first whose complexities do not contain an Ω(N) term under constant T values; previous sublinear algorithms require significantly larger T values.
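The T-interval promise described in the abstract can be stated operationally: the intersection of every T consecutive topologies must contain a connected subgraph spanning all nodes. A minimal sketch in Python of a checker for that promise (the function names and the edge-list encoding are illustrative, not from the paper):

```python
def connected(nodes, edges):
    """Check that `edges` connect all of `nodes` (simple DFS)."""
    if not nodes:
        return True
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(adj[v] - seen)
    return seen == set(nodes)

def satisfies_T_interval(nodes, topologies, T):
    """`topologies` gives one edge set per round. The schedule is a valid
    T-interval dynamic network iff the edges common to every window of T
    consecutive rounds contain a connected spanning subgraph."""
    for r in range(len(topologies) - T + 1):
        common = set.intersection(*(set(topologies[r + i]) for i in range(T)))
        if not connected(nodes, common):
            return False
    return True
```

For example, a schedule in which a spanning path persists while other edges churn satisfies the promise for T = 2, while a schedule whose rounds share no common spanning structure does not.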
dc.identifier.uri: https://dl.comp.nus.edu.sg/handle/1900.100/11099
dc.date.accessioned: 2021-11-26T08:56:09Z
dc.date.issued: 2021-11-01
dc.title: Survey on Data Quality and Provenance
dc.contributor.author: Schmitz, Martin
dc.description.abstract: This technical report summarizes research on data quality, provenance and truth discovery from the last few decades. It examines opportunities to use machine learning methods to enhance data quality and provenance. This report can serve as a starting point to find the key publications on the topics of "provenance" and "data quality" and to pursue further research in those areas, both in general and in combination with machine learning algorithms.
dc.identifier.uri: https://dl.comp.nus.edu.sg/handle/1900.100/10184
dc.date.accessioned: 2021-08-05T04:21:59Z
dc.date.issued: 2021-07-01
dc.title: Temporal Keyword Search with Aggregates and Group-By
dc.contributor.author: Gao, Qiao; Lee, Mong Li; Ling, Tok Wang
dc.description.abstract: Temporal keyword search enables non-expert users to query temporal relational databases with time conditions. However, aggregates and group-by are currently not supported in temporal keyword search, which hinders the querying of statistical information in temporal databases. This work proposes a framework to support aggregates, group-by and time conditions in temporal keyword search. We observe that simply combining non-temporal keyword search with aggregates, group-by, and temporal aggregate operators may lead to incorrect and meaningless results as a result of data duplication over time periods. As such, our framework utilizes Object-Relationship-Attribute semantics to identify a unique attribute set in the join sequence relation and removes data duplicates from this attribute set to ensure the correctness of aggregate and group-by computation. We also consider the time period in which temporal attributes occur when computing aggregates, to return meaningful results. Experimental results demonstrate the importance of these steps to retrieve correct results for keyword queries over temporal databases.
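The duplication problem the abstract describes can be seen in miniature: when a fact is repeated across time periods in a join sequence relation, a naive aggregate counts it once per period. A hypothetical sketch (the attribute names and the deduplication key are illustrative, not the paper's actual unique attribute set):

```python
def temporal_aggregate(rows, key_attrs, value_attr, agg=sum):
    """Deduplicate on a unique attribute set before aggregating, so a fact
    repeated across time periods contributes only once to the aggregate."""
    seen, values = set(), []
    for row in rows:
        key = tuple(row[a] for a in key_attrs)
        if key not in seen:
            seen.add(key)
            values.append(row[value_attr])
    return agg(values)

# An employee's unchanged salary stored in two time periods:
rows = [
    {"emp": "a", "period": "2020", "salary": 100},
    {"emp": "a", "period": "2021", "salary": 100},
    {"emp": "b", "period": "2020", "salary": 200},
]
```

Here a naive SUM over all rows returns 400, double-counting employee "a", while deduplicating on ("emp", "salary") first returns the intended 300.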
dc.identifier.uri: https://dl.comp.nus.edu.sg/handle/1900.100/9115
dc.date.accessioned: 2020-10-05T01:48:54Z
dc.date.issued: 2020-09-16
dc.title: On the Power of Randomization in Distributed Algorithms in Dynamic Networks with Adaptive Adversaries
dc.contributor.author: Jahja, Irvan; Yu, Haifeng; Hou, Ruomu
dc.description.abstract: This paper investigates the power of randomization in general distributed algorithms in dynamic networks, where the network's topology may evolve over time as determined by some adaptive adversary. In such a context, randomization may help algorithms better deal with i) "bad" inputs to the algorithm, and ii) evolving topologies generated by "bad" adaptive adversaries. We prove that randomness offers limited power against "bad" adaptive adversaries. We define a simple notion of a prophetic adversary for determining the evolving topologies. Such an adversary accurately predicts all randomness in the algorithm beforehand, and hence the randomness is useless against "bad" prophetic adversaries. Given a randomized algorithm P whose time complexity satisfies some mild conditions, we prove that P can always be converted to a new algorithm Q with comparable time complexity, even when Q runs against prophetic adversaries.
dc.identifier.uri: https://dl.comp.nus.edu.sg/handle/1900.100/8459
dc.date.accessioned: 2020-06-23T04:34:13Z
dc.date.issued: 2020-05-20
dc.title: Sublinear Algorithms in T-interval Dynamic Networks
dc.contributor.author: Jahja, Irvan; Yu, Haifeng
dc.description.abstract: We consider standard T-interval dynamic networks, under the synchronous timing model and the broadcast CONGEST model. In a T-interval dynamic network, the set of nodes is always fixed and there are no node failures. The edges in the network are always undirected, but the set of edges in the topology may change arbitrarily from round to round, as determined by some adversary and subject to the following constraint: for every T consecutive rounds, the topologies in those rounds must contain a common connected spanning subgraph. Let H_r be the maximum (in terms of number of edges) such subgraph for rounds r through r + T − 1. We define the backbone diameter d of a T-interval dynamic network to be the maximum diameter of all such H_r's, for r ≥ 1. We use n to denote the number of nodes in the network.
Within such a context, we consider a range of fundamental distributed computing problems including Count/Max/Median/Sum/LeaderElect/Consensus/ConfirmedFlood. Existing algorithms for these problems all have time complexity of Ω(n) rounds, even for T = ∞ and even when d is as small as O(1). This paper presents a novel O(d^3 log^2 n) deterministic algorithm for computing Count, for T-interval dynamic networks with T ≥ c · d^2 log^2 n. Here c is a (sufficiently large) constant independent of d, n, and T. To our knowledge, our algorithm is the first such algorithm whose complexity does not contain a Θ(n) term. For d = O(n^a) with constant a < 1/3, our deterministic algorithm has o(n) complexity, which is better than all (both randomized and deterministic) existing Count algorithms in this setting. For d = O(polylog(n)), our algorithm is exponentially faster. Following the framework of our Count algorithm, this paper further develops novel algorithms for solving Max/Median/Sum/LeaderElect/Consensus/ConfirmedFlood, while incurring either O(d^3 log^2 n) or O(d^3 log^3 n) complexity. Again, for all these problems, our algorithms are the first ones whose time complexity does not contain a Θ(n) term.
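The backbone diameter d defined above is a concrete, computable quantity: under the T-interval promise, the maximal common subgraph H_r is simply the set of edges present in all of rounds r through r + T − 1. A small sketch of that definition, assuming a connected common subgraph in every window (the encoding is illustrative, not from the paper):

```python
from collections import deque

def diameter(nodes, edges):
    """Graph diameter via BFS from every node (graph assumed connected)."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    best = 0
    for s in nodes:
        dist, q = {s: 0}, deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        best = max(best, max(dist.values()))
    return best

def backbone_diameter(nodes, topologies, T):
    """d = max over r of diameter(H_r), where H_r is the edge set common to
    rounds r .. r+T-1 (maximal common subgraph, assuming the T-interval
    promise makes it connected and spanning)."""
    best = 0
    for r in range(len(topologies) - T + 1):
        common = set.intersection(*(set(topologies[r + i]) for i in range(T)))
        best = max(best, diameter(nodes, common))
    return best
```

On a 4-node network where only the path 1-2-3-4 survives every window, d is 3; if a chord (1, 4) also survives every round, the common subgraph becomes a 4-cycle and d drops to 2.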
dc.identifier.uri: https://dl.comp.nus.edu.sg/handle/1900.100/7491
dc.date.accessioned: 2019-04-09T02:44:54Z
dc.title: Evaluation of Differentially Private Non-parametric Machine Learning as a Service
dc.contributor.author: DANDEKAR, Ashish; BASU, Debabrota; BRESSAN, Stephane
dc.description.abstract: Machine learning algorithms create models from training data for the purpose of estimation, prediction and classification. While releasing a parametric machine learning model requires releasing only the parameters of the model, releasing a non-parametric machine learning model requires releasing the training dataset along with the parameters. Indeed, estimation, prediction or classification with non-parametric models computes some form of correlation between new data and the training data. The release of the training dataset creates a risk of breach of privacy. An alternative to the release of the training dataset is the presentation of the non-parametric model as a service. Still, the non-parametric model as a service may leak information about the training dataset.
We study how to provide differential privacy guarantees for non-parametric models as a service. This cannot be achieved by perturbation of the output but requires perturbation of the model functions. We show how to apply the perturbation to the model functions of the histogram, kernel density estimator, kernel SVM and Gaussian process regression in order to provide (ε, δ)-differential privacy. Our contribution is twofold. We show that functional perturbation is not only pragmatic for releasing machine learning models as a service but also yields higher effectiveness than output perturbation mechanisms for specified privacy parameters. We show the practical steps to perturb the model functions of the histogram, kernel SVM and Gaussian process regression along with the kernel density estimator. We evaluate the trade-off between the privacy guarantee and the error incurred for each of these non-parametric machine learning algorithms on benchmarks and real-world datasets.
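The distinction the abstract draws, perturbing the model function once rather than each query output, can be illustrated with the simplest of the four models, the histogram. This is a generic textbook-style sketch under stated assumptions (sensitivity 1 per bin under add/remove adjacency, Laplace noise built from two exponentials), not the report's actual mechanism:

```python
import random

def dp_histogram(data, bins, low, high, epsilon):
    """Release a differentially private histogram density estimator by
    perturbing the model function itself: Laplace(1/epsilon) noise is added
    to each bin count once, and every later density query reuses the same
    noisy counts, so queries consume no further privacy budget."""
    width = (high - low) / bins
    counts = [0.0] * bins
    for x in data:
        i = min(int((x - low) / width), bins - 1)
        counts[i] += 1
    # Laplace(0, 1/epsilon) drawn as the difference of two exponentials.
    noisy = [c + random.expovariate(epsilon) - random.expovariate(epsilon)
             for c in counts]
    total = max(sum(max(c, 0.0) for c in noisy), 1e-9)

    def density(x):
        i = min(max(int((x - low) / width), 0), bins - 1)
        return max(noisy[i], 0.0) / (total * width)

    return density
```

The returned `density` function is the released model; clipping negative noisy counts and renormalising keeps it a valid density without touching the privacy guarantee, since both are post-processing.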
dc.identifier.uri: https://dl.comp.nus.edu.sg/handle/1900.100/7051
dc.date.accessioned: 2018-06-07T03:57:12Z
dc.title: Differential Privacy for Regularised Linear Regression
dc.contributor.author: DANDEKAR, Ashish; BASU, Debabrota; BRESSAN, Stephane
dc.description.abstract: Recent attacks on machine learning models, such as membership inference attacks, increase the concern for privacy. Linear regression is one essential statistical machine learning model at risk. For a given dataset, linear regression determines the parameters of the linear equation connecting the predictor variables to the response variable. As such, linear regression may yield a set of unstable and overfitted parameters. Regularisation terms are added to the loss function of linear regression in order to avoid overfitting. LASSO, ridge, and elastic net are three variants of regularised linear regression. We present an ε-differentially private functional mechanism for the aforementioned variants of regularised linear regression, and we empirically and comparatively analyze its effectiveness. A functional mechanism achieves differential privacy for linear regression by adding noise to the loss function. We empirically show that an ε-differentially private functional mechanism incurs more error than the non-private linear regression models, although their performances remain comparable. We also discuss caveats in the functional mechanism, such as non-convexity of the noisy loss function, which causes instability in the results of differentially private linear regression models. This discussion puts forth the need to design a differentially private mechanism that produces a noisy loss function that is convex.
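The idea of perturbing the loss function rather than the output can be seen in a one-dimensional ridge regression toy. This is a deliberately simplified sketch, not the report's mechanism: the sensitivity value and the clipping fallback for the non-convexity caveat mentioned above are illustrative assumptions (inputs assumed normalised to [-1, 1]):

```python
import random

def laplace(scale):
    # Laplace(0, scale) as the difference of two exponentials.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def dp_ridge_1d(xs, ys, lam, epsilon, sensitivity=4.0):
    """Functional-mechanism sketch in one dimension: the ridge loss
    L(w) = A*w^2 - 2*B*w + C, with A = sum(x^2) + lam and B = sum(x*y),
    has Laplace noise added to its coefficients A and B, and the noisy
    loss is then minimised in closed form (w = B/A)."""
    A = sum(x * x for x in xs) + lam + laplace(sensitivity / epsilon)
    B = sum(x * y for x, y in zip(xs, ys)) + laplace(sensitivity / epsilon)
    if A <= 0:
        # The caveat from the abstract: the noisy loss can become
        # non-convex; here we crudely clip A so the minimiser stays defined.
        A = 1e-6
    return B / A
```

With a weak privacy requirement (large epsilon) the noise is small and the fitted slope stays close to the non-private least-squares solution; as epsilon shrinks, the added error grows, which is the trade-off the report evaluates.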
dc.identifier.uri: https://dl.comp.nus.edu.sg/handle/1900.100/7050
dc.date.accessioned: 2018-06-07T03:46:55Z
dc.title: A comparative study of synthetic dataset generation techniques
dc.contributor.author: DANDEKAR, Ashish; ZEN, Remmy A. M.; BRESSAN, Stephane
dc.description.abstract: Unrestricted availability of datasets is important for researchers to evaluate their strategies for solving research problems. While publicly releasing datasets, it is equally important to protect the privacy of the respective data owners. Synthetic datasets that preserve utility while protecting the privacy of the data owners stand as a middle ground.
There are two ways to generate data synthetically. Firstly, one can generate a fully synthetic dataset by subsampling it from a synthetically generated population; this technique is known as fully synthetic dataset generation. Secondly, one can generate a partially synthetic dataset by synthesizing the values of the sensitive attributes; this technique is known as partially synthetic dataset generation. The datasets generated by these two techniques vary in their utility as well as in their risks of disclosure. We perform a comparative study of these techniques using different dataset synthesisers such as linear regression, decision tree, random forest and neural network. We evaluate the effectiveness of these techniques in terms of the amount of utility they preserve and the risks of disclosure that they suffer.
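Partially synthetic generation, the second technique above, can be sketched with the simplest of the listed synthesisers, linear regression: keep the non-sensitive attributes as-is and replace each sensitive value with a draw from a model fitted on the original data. The attribute names and the single-predictor least-squares synthesiser are illustrative assumptions, not the paper's setup:

```python
import random

def partially_synthetic(rows, predictors, sensitive):
    """Partially synthetic release: non-sensitive attributes are kept,
    while each sensitive value is replaced by a prediction from a
    one-predictor least-squares fit plus Gaussian residual noise."""
    xs = [r[predictors[0]] for r in rows]
    ys = [r[sensitive] for r in rows]
    n = len(rows)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) or 1e-9
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = (sum(e * e for e in resid) / max(n - 2, 1)) ** 0.5
    out = []
    for r in rows:
        synth = dict(r)
        synth[sensitive] = intercept + slope * r[predictors[0]] + random.gauss(0, sd)
        out.append(synth)
    return out
```

A fully synthetic release would instead also resample the predictor columns from a fitted population model; the two differ exactly in how much of the original record survives, which drives the utility/disclosure trade-off the study measures.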