author    Jan Aalmoes <jan.aalmoes@inria.fr>  2024-09-11 00:10:50 +0200
committer Jan Aalmoes <jan.aalmoes@inria.fr>  2024-09-11 00:10:50 +0200
commit    bf5b05a84e877391fddd1b0a0b752f71ec05e901 (patch)
tree      149609eeff1d475cd60f398f0e4bfd786c5d281c /synthetic/bck/related.tex
parent    03556b31409ac5e8b81283d3a6481691c11846d7 (diff)
Proof: "there exists f that is not CCA" is equivalent to "there exists f whose BA is not random guessing"
Diffstat (limited to 'synthetic/bck/related.tex')
-rw-r--r--  synthetic/bck/related.tex  38
1 file changed, 38 insertions(+), 0 deletions(-)
diff --git a/synthetic/bck/related.tex b/synthetic/bck/related.tex
new file mode 100644
index 0000000..20e9b0c
--- /dev/null
+++ b/synthetic/bck/related.tex
@@ -0,0 +1,38 @@
+The literature on the privacy of synthetic data focuses on a different yet related problem.
+In our work, the synthetic data is not released to the public; it is used as a proxy between the real data and the target model.
+In contrast, the literature uses synthetic data as a way to release a dataset to third parties.
+The goal of this endeavour is to circumvent legislation on personal data~\cite{bellovin2019privacy}.
+Previous work shows that releasing synthetic data instead of the real data protects against neither re-identification nor attribute linkage~\cite{stadler2020synthetic}.
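+
+To make the contrast concrete, our setup can be sketched as follows (a toy illustration rather than our experimental code: the Gaussian mixture stands in for an arbitrary generative model and the logistic regression for an arbitrary target model):
+\begin{verbatim}
+import numpy as np
+from sklearn.linear_model import LogisticRegression
+from sklearn.mixture import GaussianMixture
+
+rng = np.random.default_rng(0)
+real_X = rng.normal(size=(1000, 4))                     # private records
+real_y = (real_X[:, 0] + real_X[:, 1] > 0).astype(int)
+
+# Fit a generative model on the real records and sample a synthetic
+# dataset; the real data itself is never released.
+gen = GaussianMixture(n_components=4)
+gen.fit(np.column_stack([real_X, real_y]))
+synth, _ = gen.sample(1000)
+synth_X, synth_y = synth[:, :4], (synth[:, 4] > 0.5).astype(int)
+
+# The target model is trained on the synthetic proxy only.
+model = LogisticRegression().fit(synth_X, synth_y)
+
+# The attacker's sole interface: black-box queries to the model.
+print(model.predict(real_X[:5]))
+\end{verbatim}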
+
+Bellovin et al.~\cite{bellovin2019privacy} discuss the legal aspects of sharing synthetic data instead of the real data.
+They come to the conclusion that a court will not allow the disclosure of synthetic data, because numerous examples show that inferring private attributes of the real data is possible.
+They hint that using differential privacy may lead to legislation allowing the release of synthetic data.
+For instance, Ping et al.~\cite{ping2017datasynthesizer} use the GreedyBayes algorithm for tabular data, into which they introduce differential privacy.
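+
+As a rough illustration of that construction (a minimal Python sketch under our own simplifications, not the implementation of Ping et al.: we use the mutual information score from scikit-learn as the dependence measure, and the Laplace noise scale is only indicative of how differential privacy enters):
+\begin{verbatim}
+import numpy as np
+from itertools import combinations
+from sklearn.metrics import mutual_info_score
+
+def greedy_bayes(df, k=2):
+    """Greedily order the attributes of a categorical DataFrame,
+    assigning each one the k already placed attributes that share
+    the most mutual information with it."""
+    remaining = list(df.columns)
+    order = [remaining.pop(0)]
+    network = {order[0]: []}
+    while remaining:
+        col, parents = max(
+            ((c, p) for c in remaining
+                    for p in combinations(order, min(k, len(order)))),
+            key=lambda cp: sum(mutual_info_score(df[cp[0]], df[q])
+                               for q in cp[1]))
+        network[col] = list(parents)
+        order.append(col)
+        remaining.remove(col)
+    return network
+
+def noisy_conditionals(df, network, epsilon=1.0):
+    """Estimate each conditional distribution and perturb its counts
+    with Laplace noise whose scale grows as 1/epsilon."""
+    tables = {}
+    for col, parents in network.items():
+        counts = (df.groupby(parents + [col]).size() if parents
+                  else df[col].value_counts())
+        noise = np.random.laplace(scale=len(network) / epsilon,
+                                  size=len(counts))
+        tables[col] = (counts + noise).clip(lower=0)
+    return tables
+\end{verbatim}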
+
+%This conclusion carries over to our work because we have shown that using synthetic data to train a model does not fully protect against privacy attacks.
+%Datasynthesizer: privacy preserving synthetic datasets~\cite{ping2017datasynthesizer}.
+%Towards improving privacy of synthetic datasets~\cite{kuppa2021towards}.
+%User-Driven Synthetic Dataset Generation with Quantifiable Differential Privacy~\cite{tai2023user}.
+
+
+%Stadler et al.~\cite{stadler2020synthetic} focus on releasing to third parties a generated synthetic dataset instead of the real dataset.
+%This contrasts with our work, where we consider that the generated synthetic dataset is not released but is used to train a machine learning model.
+%They study two privacy risks: re-identification via linkage and attribute disclosure.
+%Re-identification via linkage is somewhat similar to a membership inference attack, as this kind of attack aims at inferring whether a data record has been used to generate the synthetic dataset.
+%Attribute disclosure is closer to attribute inference in the sense that an attacker aims to infer sensitive attributes of a user's data.
+%The main difference between Stadler et al. and our work is that we add, in between the synthetic dataset and the attacker, a trained machine learning model, and the attacker has only black-box access to this model.
+%In our setup, the synthetic dataset is not directly accessible to the attacker.
+%The sensitive information contained in the real dataset is filtered twice: by the generation process and then by the training of the target model.
+%In Stadler et al., the sensitive information is filtered only by the generation process.
+%
+%Stadler et al. show that using synthetic data protects a user's privacy against neither linkage nor attribute disclosure.
+%Our conclusion is that using a synthetic dataset to train a machine learning model does not protect users' privacy against adversaries with black-box access to this model.
+%Hence, Stadler et al. and our work are aligned in showing that synthetic datasets are not a guaranteed protection for users' personal data.
+
+Jordon et al.~\cite{jordon2021hide} state that generative approaches can be used to hide membership status.
+Their contribution consists in a data anonymisation challenge with two tracks.
+The first track asks participants to produce an algorithm that generates synthetic data hiding the membership status.
+The second track asks for algorithms (i.e., attacks) that infer the membership status from synthetic data generated by the algorithms of the first track.
+Unfortunately, their results remain inconclusive because the participants of the first track submitted their work too close to the deadline, which did not leave the attackers enough time to develop tailored attacks.
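+
+For intuition, a simple baseline that a second-track attacker could use (our illustrative example, not an attack submitted to the challenge) scores a candidate record by its distance to the closest synthetic record: an overfitted generator tends to place synthetic samples unusually close to its training data.
+\begin{verbatim}
+import numpy as np
+
+def membership_score(record, synthetic):
+    """Distance-to-closest-synthetic-record test: the smaller the
+    distance, the more likely the record was used for training."""
+    return -np.linalg.norm(synthetic - record, axis=1).min()
+
+def guess_members(candidates, synthetic, threshold):
+    # Flag as members the candidates scoring above a threshold
+    # calibrated, e.g., on records known to be non-members.
+    return [membership_score(r, synthetic) > threshold
+            for r in candidates]
+\end{verbatim}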
+
+