Browsing by Author "DANDEKAR, Ashish"
Now showing 1 - 3 of 3
- A comparative study of synthetic dataset generation techniques
  DANDEKAR, Ashish; ZEN, Remmy A. M.; BRESSAN, Stephane
  Unrestricted availability of datasets is important for researchers evaluating their strategies for solving research problems. When publicly releasing datasets, however, it is equally important to protect the privacy of the respective data owners. Synthetic datasets that preserve utility while protecting the privacy of the data owners offer a middle ground. There are two ways to generate data synthetically. Firstly, one can generate a fully synthetic dataset by subsampling it from a synthetically generated population; this technique is known as fully synthetic dataset generation. Secondly, one can generate a partially synthetic dataset by synthesising the values of sensitive attributes; this technique is known as partially synthetic dataset generation. The datasets generated by these two techniques vary in their utility as well as in their risks of disclosure. We perform a comparative study of these techniques using different dataset synthesisers, such as linear regression, decision trees, random forests and neural networks. We evaluate the effectiveness of these techniques in terms of the utility they preserve and the risks of disclosure they incur.
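  The partially synthetic approach above can be made concrete with a short sketch. The following is a minimal illustration, not the paper's procedure: it assumes an all-numeric pandas DataFrame `df` with a single sensitive column, and uses a decision tree (one of the synthesisers mentioned above) to resynthesise that column; the names `df` and `sensitive` and the residual-noise step are illustrative assumptions.

  ```python
  # A minimal, illustrative sketch of *partially* synthetic data
  # generation: only the sensitive column is replaced with model-drawn
  # values. The DataFrame, column name and residual-noise step are
  # assumptions for illustration, not the paper's exact procedure.
  import numpy as np
  import pandas as pd
  from sklearn.tree import DecisionTreeRegressor

  def partially_synthetic(df: pd.DataFrame, sensitive: str,
                          seed: int = 0) -> pd.DataFrame:
      rng = np.random.default_rng(seed)
      X = df.drop(columns=[sensitive])  # non-sensitive attributes
      y = df[sensitive]                 # sensitive attribute to resynthesise

      # Fit a synthesiser (here a decision tree) on the real data.
      model = DecisionTreeRegressor(random_state=seed).fit(X, y)

      # Replace the sensitive values with predictions plus residual noise,
      # so the synthetic values follow the fitted conditional distribution
      # instead of copying the originals verbatim.
      preds = model.predict(X)
      residual_std = float(np.std(y - preds))
      out = df.copy()
      out[sensitive] = preds + rng.normal(0.0, residual_std, size=len(df))
      return out
  ```

  A fully synthetic dataset would instead be obtained by fitting a generative model over all attributes and subsampling entirely new records from it, as described above.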
- Differential Privacy for Regularised Linear Regression
  DANDEKAR, Ashish; BASU, Debabrota; BRESSAN, Stephane
  Recent attacks on machine learning models, such as membership inference attacks, heighten concerns about privacy. Linear regression is one such essential statistical machine learning model at risk. For a given dataset, linear regression determines the parameters of the linear equation connecting the predictor variables to the response variable. On its own, linear regression can yield a set of unstable and overfitted parameters. Regularisation terms are added to the loss function of linear regression in order to avoid overfitting. LASSO, ridge and elastic net are three variants of regularised linear regression. We present an ε-differentially private functional mechanism for the aforementioned variants of regularised linear regression, and we empirically and comparatively analyse its effectiveness. A functional mechanism achieves differential privacy for linear regression by adding noise to the loss function. We empirically show that an ε-differentially private functional mechanism incurs more error than the non-private linear regression models, although their performances remain comparable. We also discuss caveats of the functional mechanism, such as the non-convexity of the noisy loss function, which causes instability in the results of differentially private linear regression models. This discussion puts forth the need to design a differentially private mechanism that produces a convex noisy loss function.
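  As a rough illustration of the functional-mechanism idea described above, the sketch below perturbs the polynomial coefficients of the least-squares loss with Laplace noise and then minimises the noisy ridge objective. It assumes features and responses scaled to [-1, 1]; the sensitivity constant, the symmetrisation step and the solver shortcut are assumptions for illustration, not the paper's exact calibration.

  ```python
  # A minimal sketch of the functional mechanism for ridge regression,
  # assuming |y_i| <= 1 and |x_ij| <= 1. Constants are illustrative.
  import numpy as np

  def private_ridge(X: np.ndarray, y: np.ndarray, epsilon: float,
                    lam: float = 1.0, seed: int = 0) -> np.ndarray:
      rng = np.random.default_rng(seed)
      n, d = X.shape

      # The least-squares loss is a degree-2 polynomial in the weights w:
      #   sum_i (y_i - x_i^T w)^2 = c0 - 2 c1^T w + w^T C2 w
      c1 = X.T @ y          # degree-1 coefficients
      C2 = X.T @ X          # degree-2 coefficients

      # L1 sensitivity of the coefficient vector between neighbouring
      # datasets, under the stated bounds on x and y (assumed here).
      delta = 2 * (1 + 2 * d + d * d)

      # Perturb each coefficient with Laplace noise of scale delta/epsilon.
      c1_noisy = c1 + rng.laplace(0.0, delta / epsilon, size=c1.shape)
      C2_noisy = C2 + rng.laplace(0.0, delta / epsilon, size=C2.shape)
      C2_noisy = (C2_noisy + C2_noisy.T) / 2  # keep the quadratic form symmetric

      # Minimise the noisy ridge objective w^T C2 w - 2 c1^T w + lam ||w||^2.
      # As the abstract notes, the noisy quadratic may be non-convex; as a
      # crude safeguard this sketch solves the regularised normal equations
      # anyway, which is exactly the instability the paper discusses.
      return np.linalg.solve(C2_noisy + lam * np.eye(d), c1_noisy)
  ```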
- Evaluation of Differentially Private Non-parametric Machine Learning as a Service
  DANDEKAR, Ashish; BASU, Debabrota; BRESSAN, Stephane
  Machine learning algorithms create models from training data for the purpose of estimation, prediction and classification. While releasing parametric machine learning models requires the release of the parameters of the model, releasing non-parametric machine learning models requires the release of the training dataset along with the parameters. Indeed, estimation, prediction or classification with non-parametric models computes some form of correlation between new data and the training data. The release of the training dataset creates a risk of breach of privacy. An alternative to the release of the training dataset is the presentation of the non-parametric model as a service. Still, the non-parametric model as a service may leak information about the training dataset. We study how to provide differential privacy guarantees for non-parametric models as a service. This cannot be achieved by perturbation of the output but requires perturbation of the model functions. We show how to apply the perturbation to the model functions of the histogram, kernel density estimator, kernel SVM and Gaussian process regression in order to provide (ε, δ)-differential privacy. We evaluate the trade-off between the privacy guarantee and the error incurred for each of these non-parametric machine learning algorithms on benchmarks and real-world datasets. Our contribution is twofold. We show that functional perturbation is not only pragmatic for releasing machine learning models as a service but also yields higher effectiveness than output perturbation mechanisms for specified privacy parameters. We show the practical steps to perturb the model functions of the histogram, kernel SVM and Gaussian process regression, along with the kernel density estimator. We evaluate the trade-off between the privacy guarantee and the error incurred for each of these non-parametric machine learning algorithms on a real-world dataset as well as a selection of benchmarks.
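  To make "perturbation of the model functions" concrete, here is a minimal sketch for one of the models named above, a Gaussian-kernel density estimator: instead of adding noise to the outputs, it adds a scaled Gaussian-process sample path with the same kernel to the estimated function and evaluates the result at the query points. The sensitivity bound (using k(x, x) = 1) and the Gaussian-mechanism-style calibration are illustrative assumptions, not the paper's exact mechanism.

  ```python
  # A minimal sketch of functional perturbation for a Gaussian-kernel
  # density estimator; the calibration constants are assumptions.
  import numpy as np

  def rbf(a: np.ndarray, b: np.ndarray, h: float = 0.5) -> np.ndarray:
      # Gaussian kernel matrix between point sets a (m, d) and b (n, d).
      d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
      return np.exp(-d2 / (2 * h * h))

  def private_kde(train: np.ndarray, query: np.ndarray,
                  epsilon: float, delta: float,
                  h: float = 0.5, seed: int = 0) -> np.ndarray:
      rng = np.random.default_rng(seed)
      n = len(train)

      # Non-private KDE values at the query points (up to the usual
      # normalising constant).
      f = rbf(query, train, h).mean(axis=1)

      # Changing one training point moves the estimator by at most 2/n
      # in RKHS norm, since k(x, x) = 1 for the Gaussian kernel.
      sensitivity = 2.0 / n
      scale = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon

      # Perturb the *function*, not the outputs: add a Gaussian-process
      # sample path with the same kernel, evaluated at the query points.
      K = rbf(query, query, h) + 1e-8 * np.eye(len(query))
      noise = rng.multivariate_normal(np.zeros(len(query)), K)
      return f + scale * noise
  ```

  Because the noise lives in the same function space as the estimator, any set of query points sees a consistent perturbed function, which is what distinguishes this from per-query output perturbation.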