Browsing by Author "BASU, Debabrota"
Now showing 1 - 5 of 5
- Item: Differential Privacy for Regularised Linear Regression
  DANDEKAR, Ashish; BASU, Debabrota; BRESSAN, Stephane
  Recent attacks on machine learning models, such as membership inference attacks, increase the concern for privacy. Linear regression is one essential statistical machine learning model at risk. For a given dataset, linear regression determines the parameters of the linear equation connecting the predictor variables to the response variable. As such, linear regression yields a set of unstable and overfitted parameters. Regularisation terms are added to the loss function of linear regression in order to avoid overfitting. LASSO, ridge, and elastic net are three variants of regularised linear regression. We present an ε-differentially private functional mechanism for the aforementioned variants of regularised linear regression. We empirically and comparatively analyze its effectiveness. A functional mechanism achieves differential privacy for linear regression by adding noise to the loss function. We empirically show that an ε-differentially private functional mechanism incurs more error than the non-private linear regression models, although their performances remain comparable. We also discuss caveats of the functional mechanism, such as non-convexity of the noisy loss function, which causes instability in the results of differentially private linear regression models. This discussion puts forth the need to design a differentially private mechanism that produces a noisy loss function that is convex.
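  A minimal NumPy sketch of the kind of functional mechanism described in this entry, applied to ridge regression: the quadratic loss is expressed through its polynomial coefficients, Laplace noise is added to those coefficients, and the noisy objective is minimised in closed form. The sensitivity constant, the assumption that features and labels are scaled to [-1, 1], and the spectral clipping used to work around the non-convexity caveat are illustrative assumptions, not the calibration used in the paper.

```python
import numpy as np

def dp_ridge_functional_mechanism(X, y, epsilon, lam=1.0, rng=None):
    """Sketch of an epsilon-DP functional mechanism for ridge regression.

    Assumes every feature and label is scaled to [-1, 1].  The quadratic loss
    sum_i (y_i - x_i.w)^2 is determined by the coefficient terms X^T X and
    X^T y; Laplace noise is added to those coefficients and the noisy,
    regularised objective is then minimised.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    # Loose sensitivity bound for the polynomial coefficients (assumption).
    sensitivity = 2.0 * (d ** 2 + 2 * d + 1)
    scale = sensitivity / epsilon

    # Perturb the loss-function coefficients rather than the model output.
    A = X.T @ X + rng.laplace(scale=scale, size=(d, d))
    A = (A + A.T) / 2.0                     # keep the quadratic form symmetric
    b = X.T @ y + rng.laplace(scale=scale, size=d)

    # The noisy quadratic form may be non-convex (the caveat discussed above);
    # clipping its spectrum at a small positive value is one crude remedy.
    eigval, eigvec = np.linalg.eigh(A)
    A_psd = eigvec @ np.diag(np.maximum(eigval, 1e-6)) @ eigvec.T

    # Minimise w^T A w - 2 b^T w + lam * ||w||^2 in closed form.
    return np.linalg.solve(A_psd + lam * np.eye(d), b)
```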
- Item: Evaluation of Differentially Private Non-parametric Machine Learning as a Service
  DANDEKAR, Ashish; BASU, Debabrota; BRESSAN, Stephane
  Machine learning algorithms create models from training data for the purpose of estimation, prediction and classification. While releasing parametric machine learning models requires the release of the parameters of the model, releasing non-parametric machine learning models requires the release of the training dataset along with the parameters. Indeed, estimation, prediction or classification with non-parametric models computes some form of correlation between new data and the training data. The release of the training dataset creates a risk of breach of privacy. An alternative to the release of the training dataset is the presentation of the non-parametric model as a service. Still, the non-parametric model as a service may leak information about the training dataset. We study how to provide differential privacy guarantees for non-parametric models as a service. This cannot be achieved by perturbation of the output but requires perturbation of the model functions. We show how to apply the perturbation to the model functions of the histogram, kernel density estimator, kernel SVM and Gaussian process regression in order to provide (ε, δ)-differential privacy. We evaluate the trade-off between the privacy guarantee and the error incurred for each of these non-parametric machine learning algorithms on benchmarks and real-world datasets. Our contribution is twofold. We show that functional perturbation is not only pragmatic for releasing machine learning models as a service but also yields higher effectiveness than output perturbation mechanisms for specified privacy parameters. We show the practical steps to perturb the model functions of the histogram, kernel SVM and Gaussian process regression, along with the kernel density estimator. We evaluate the trade-off between the privacy guarantee and the error incurred for each of these non-parametric machine learning algorithms on a real-world dataset as well as a selection of benchmarks.
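  A sketch of functional perturbation for one of the models named in this entry, a kernel density estimator offered as a service: the density is evaluated at the query points and a scaled Gaussian-process sample path with the same kernel is added, so the training data never leaves the service. The sensitivity bound and the (ε, δ) noise multiplier below are illustrative assumptions rather than the calibration derived in the paper.

```python
import numpy as np

def dp_kde_service(train, query_points, bandwidth, epsilon, delta, rng=None):
    """Sketch of (epsilon, delta)-DP functional perturbation for a 1-D
    Gaussian kernel density estimator answered as a service.

    The released function is the KDE plus a scaled Gaussian-process sample
    path with the same kernel; the constants below are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    train = np.asarray(train, dtype=float)
    query_points = np.asarray(query_points, dtype=float)
    n = len(train)
    norm = bandwidth * np.sqrt(2.0 * np.pi)

    def gram(a, b):  # Gaussian kernel matrix between two 1-D point sets
        return np.exp(-0.5 * ((a[:, None] - b[None, :]) / bandwidth) ** 2)

    # Non-private KDE evaluated at the query points.
    f = gram(query_points, train).mean(axis=1) / norm

    # Assumed sensitivity of the KDE to one record, and a Gaussian-mechanism
    # style (epsilon, delta) multiplier.
    sensitivity = 2.0 / (n * norm)
    c = np.sqrt(2.0 * np.log(1.25 / delta))

    # Gaussian-process sample path with the same kernel, on the query grid.
    K = gram(query_points, query_points) + 1e-8 * np.eye(len(query_points))
    noise_path = rng.multivariate_normal(np.zeros(len(query_points)), K)

    return f + (c * sensitivity / epsilon) * noise_path
```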
- Item: How to Find the Best Rated Items on a Likert Scale and How Many Ratings Are Enough (2017-06-06)
  LIU, Qing; BASU, Debabrota; GOEL, Shruti; ABDESSALEM, Talel; BRESSAN, Stéphane
  One of the modern pillars of collaborative filtering and recommender systems is the collection and exploitation of ratings from users. The Likert scale is a psychometric quantifier of ratings popular among electronic commerce sites. In this paper, we consider the tasks of collecting Likert scale ratings of items and of finding the n-k best-rated items, i.e., the n items that are most likely to be the top-k in a ranking constructed from these ratings. We devise an algorithm, Pundit, that computes the n-k best-rated items. Pundit uses the probability-generating function constructed from the Likert scale responses to avoid the combinatorial exploration of the possible outcomes and to compute the result efficiently. We empirically and comparatively evaluate with real datasets and discuss the effectiveness and efficiency of our and competing approaches. Our method is effective and competitively efficient. Selection of the best-rated items meets, in practice, the major obstacle of the scarcity of ratings. We propose an approach that learns from the available data how many ratings are enough to meet a prescribed error and recommends how many additional ratings should be proactively sought. We also empirically evaluate with real datasets the effectiveness of our method at recommending the collection of additional ratings. The results show that the approach is practical and effective.
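  The probability-generating function idea this entry relies on can be illustrated in a few lines: if we already know, for a focal item, the probability that each other item out-rates it (in the paper these probabilities would come from the Likert-scale responses), then the number of items that out-rate it is Poisson-binomial, and its distribution falls out of the product PGF by polynomial multiplication instead of enumerating the 2^n outcomes. This sketch is not Pundit itself; the pairwise probabilities are taken as given.

```python
import numpy as np

def prob_in_top_k(p_beats, k):
    """Probability that a focal item ranks in the top-k, given independent
    probabilities p_beats[j] that each other item out-rates it.

    The count of out-rating items is Poisson-binomial; its distribution is
    read off the probability-generating function prod_j (1 - p_j + p_j z),
    expanded by repeated polynomial multiplication.
    """
    pgf = np.array([1.0])                       # polynomial in z, initially 1
    for p in p_beats:
        pgf = np.convolve(pgf, [1.0 - p, p])    # multiply by (1 - p + p z)
    # Coefficient m is P(exactly m items out-rate the focal item);
    # being in the top-k means at most k - 1 items do.
    return float(pgf[:k].sum())

# Illustrative use with made-up pairwise probabilities.
print(prob_in_top_k([0.2, 0.7, 0.4, 0.1], k=2))
```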
- Item: Learn-as-you-go with Megh: Efficient Live Migration of Virtual Machines (2017-04-04)
  BASU, Debabrota; WANG, Xiayang; HONG, Yang; CHEN, Haibo; BRESSAN, Stéphane
  Cloud providers leverage live migration of virtual machines to reduce energy consumption and allocate resources efficiently in data centers. Each migration decision depends on three questions: when to move a virtual machine, which virtual machine to move, and where to move it? Dynamic, uncertain and heterogeneous workloads running on virtual machines make such decisions difficult. Knowledge-based and heuristics-based algorithms are commonly used to tackle this problem. Knowledge-based algorithms, such as MaxWeight scheduling algorithms, depend on the specifics and the dynamics of the targeted Cloud architectures and applications. Heuristics-based algorithms, such as MMT algorithms, suffer from high variance and poor convergence because of their greedy approach. We propose a reinforcement learning approach. This approach does not require prior knowledge: it learns the dynamics of the workload as it goes. We formulate the problem of energy- and performance-efficient resource management during live migration as a Markov decision process. While several learning algorithms have been proposed to solve this problem, they remain confined to the academic realm as they face the curse of dimensionality. They are either not scalable in real time, as is the case for MadVM, or need elaborate offline training, as is the case for Q-learning. We propose an actor-critic algorithm, Megh, to overcome these deficiencies. Megh uses a novel dimensionality reduction scheme to project the combinatorially explosive state-action space onto a polynomial-dimensional space with a sparse basis. Megh has the capacity to learn uncertain dynamics and the ability to work in real time. Megh is both scalable and robust. We implement Megh using the CloudSim toolkit and empirically evaluate its performance with the PlanetLab and the Google Cluster workloads. Experiments validate that Megh is more cost-effective, incurs smaller execution overhead and is more scalable than MadVM and MMT. We explicate our choice of parameters through a sensitivity analysis.
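  The general recipe described in this entry, projecting an intractably large state-action space onto a small feature basis and learning actor and critic online, can be sketched with a generic linear actor-critic. This is not the Megh algorithm or its sparse basis; the random projection, learning rates and state encoding below are placeholders.

```python
import numpy as np

class LinearActorCritic:
    """Minimal actor-critic skeleton over a projected feature space.

    Illustrative only: a random sparse projection stands in for the paper's
    dimensionality reduction scheme, and the policy is a softmax over linear
    action preferences.
    """

    def __init__(self, raw_dim, n_actions, proj_dim=64, gamma=0.95,
                 alpha_critic=0.05, alpha_actor=0.01, rng=None):
        self.rng = np.random.default_rng() if rng is None else rng
        # Sparse random projection of the raw state (placeholder basis).
        P = self.rng.normal(size=(proj_dim, raw_dim))
        P *= self.rng.random((proj_dim, raw_dim)) < 0.1
        self.P = P
        self.v = np.zeros(proj_dim)                   # critic weights
        self.theta = np.zeros((n_actions, proj_dim))  # actor preferences
        self.gamma, self.ac, self.aa = gamma, alpha_critic, alpha_actor

    def features(self, raw_state):
        return np.tanh(self.P @ raw_state)

    def act(self, raw_state):
        phi = self.features(raw_state)
        prefs = self.theta @ phi
        probs = np.exp(prefs - prefs.max())
        probs /= probs.sum()
        return self.rng.choice(len(probs), p=probs), phi, probs

    def update(self, phi, action, probs, reward, next_raw_state):
        phi_next = self.features(next_raw_state)
        td_error = reward + self.gamma * self.v @ phi_next - self.v @ phi
        self.v += self.ac * td_error * phi                  # critic TD step
        grad = -probs[:, None] * phi[None, :]               # grad log softmax
        grad[action] += phi
        self.theta += self.aa * td_error * grad             # actor step
```

  In a migration setting, `raw_state` would encode host and virtual machine utilisations, actions would be candidate (VM, target host) pairs, and the reward would combine energy and performance costs; those encodings are assumptions, not details from the paper.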
- Item: Top-k Queries over Uncertain Scores (2016-09-03)
  LIU, Qing; BASU, Debabrota; ABDESSALEM, Talel; BRESSAN, Stephane
  Modern recommendation systems leverage some form of collaborative, user- or crowd-sourced collection of information. For instance, services like TripAdvisor, Airbnb and HungryGoWhere rely on user-generated content to describe and classify hotels, vacation rentals and restaurants. By the nature of such independent collection of information, the multiplicity, diversity and varying quality of the information collected result in uncertainty. Objects, such as the services offered by hotels, vacation rentals and restaurants, have uncertain scores for their various features. In this context, the ranking of uncertain data becomes a crucial issue. Several data models for uncertain data and several semantics for probabilistic top-k queries have been proposed in the literature. We consider here a model of objects with uncertain scores given as probability distributions and the semantics proposed by the state-of-the-art reference work of Soliman, Ilyas and Ben-David. In this paper, we explore the design space of Metropolis-Hastings Markov chain Monte Carlo algorithms for answering probabilistic top-k queries over a database of objects with uncertain scores. We devise several algorithms that yield better performance than the reference algorithm. We empirically and comparatively demonstrate the effectiveness and efficiency of these new algorithms.
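  One corner of the Metropolis-Hastings design space described in this entry can be sketched as follows: model each object's uncertain score as a discrete distribution, let the chain walk over possible worlds by re-proposing one object's score at a time, and count how often each object lands in the top-k. The acceptance rule is plain Metropolis-Hastings with a symmetric uniform proposal; the specific algorithms and top-k semantics studied in the paper are not reproduced here.

```python
import numpy as np

def mh_topk_membership(score_values, score_probs, k, n_steps=20000, rng=None):
    """Metropolis-Hastings sketch for estimating P(object is in the top-k).

    Each object's uncertain score is a discrete distribution: the possible
    values score_values[i] with probabilities score_probs[i].  The chain
    walks over possible worlds (one score value per object); a step picks an
    object, proposes a new score value uniformly, and accepts with the usual
    MH ratio, which here reduces to a ratio of that object's probabilities.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(score_values)
    # Start each object at its most likely score value.
    state = np.array([int(np.argmax(p)) for p in score_probs])
    in_topk = np.zeros(n)

    for _ in range(n_steps):
        i = rng.integers(n)
        proposal = rng.integers(len(score_values[i]))
        # Symmetric uniform proposal: acceptance ratio is the probability ratio.
        if rng.random() < score_probs[i][proposal] / score_probs[i][state[i]]:
            state[i] = proposal
        # Record which objects are in the top-k of the current possible world.
        scores = np.array([score_values[j][state[j]] for j in range(n)])
        in_topk[np.argsort(-scores)[:k]] += 1

    return in_topk / n_steps
```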