Diffstat (limited to 'synthetic/results.tex')
-rw-r--r-- synthetic/results.tex 54
1 files changed, 54 insertions, 0 deletions
diff --git a/synthetic/results.tex b/synthetic/results.tex
new file mode 100644
index 0000000..ec3149a
--- /dev/null
+++ b/synthetic/results.tex
@@ -0,0 +1,54 @@
+In this section, we analyse the impact of training the target model on synthetic data instead of real data on the success of MIA and AIA.
+Section~\ref{sec:uti} first presents the utility of the target model.
+This control allows us to verify that every target model has learned some information and is not randomly guessing the label.
+
+
+\subsection{Utility}
+\label{sec:uti}
+
+\begin{figure}
+ \centering
+ \includegraphics[width=0.45\textwidth]{synthetic/figure/result/adult/utility.pdf}
+ \caption{Utility of the target model in terms of balanced accuracy evaluated on unseen data.
+ The ``Real'' label refers to an identity generator, so the data used to train the target model is the real data itself.
+ The ``Synthetic'' label refers to a CGAN generator, so the synthetic data are sampled from the distribution learned by the generator.
+ In this case, the target model is not trained on real data.}
+ \label{fig:utility}
+\end{figure}
+Using a synthetic dataset degrades the utility of the target model.
+We present the balanced accuracy for both synthetic and real data in Figure~\ref{fig:utility}.
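+
+Assuming a binary prediction task, the balanced accuracy (BA) used throughout this section can be written in terms of true positives (TP), false negatives (FN), true negatives (TN) and false positives (FP) as
+\[
+\mathrm{BA} = \frac{1}{2}\left(\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}} + \frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}}\right).
+\]
+A classifier that guesses at random, independently of the input, predicts the positive class with some probability $p$ for every record, so its expected true positive rate is $p$ and its expected true negative rate is $1-p$, which gives $\mathrm{BA}=0.5$ whatever the class imbalance.
+Under this assumption, 0.5 is the natural random-guessing reference for the utility and attack values reported below.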
+
+Using synthetic data significantly degrades the utility of the target model, by 5\% (ANOVA p-value of $1.23\times 10^{-5}$).
+However, with a balanced accuracy of at least 0.68 on synthetic data, we argue that the target model has learned enough information for the AIA and MIA results to be meaningful.
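+
+The ANOVA p-values reported in this section compare two groups of balanced accuracies, one obtained with real data and one with synthetic data.
+Assuming each group $g$ holds $n_g$ scores $x_{g,1},\dots,x_{g,n_g}$ from independent runs, with group mean $\bar{x}_g$, overall mean $\bar{x}$, $k=2$ groups and $N=\sum_g n_g$ scores in total, the one-way ANOVA statistic is
+\[
+F = \frac{\sum_{g=1}^{k} n_g\left(\bar{x}_g - \bar{x}\right)^2 / (k-1)}{\sum_{g=1}^{k}\sum_{i=1}^{n_g}\left(x_{g,i} - \bar{x}_g\right)^2 / (N-k)},
+\]
+and the p-value is the probability that an $F$-distributed variable with $(k-1, N-k)$ degrees of freedom exceeds the observed value of $F$.
+A small p-value, such as $1.23\times 10^{-5}$ for utility, indicates that the gap between the two group means is large compared with the variability across runs; a large p-value indicates that no significant difference was detected.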
+
+\subsection{Membership inference attack}
+\begin{figure}
+ \centering
+ \includegraphics[width=0.45\textwidth]{synthetic/figure/result/adult/mia.pdf}
+ \caption{Success of the MIA in terms of balanced accuracy evaluated on the training part of the MIA dataset.}
+ \label{fig:mia}
+\end{figure}
+We present in Figure~\ref{fig:mia} a comparison of MIA success between real and synthetic data.
+Using synthetic data degrades the balanced accuracy of the MIA by 30\% on average.
+An ANOVA p-value of $4.54\times 10^{-12}$ indicates that this difference is significant.
+In addition, we observe that training the target model on synthetic data instead of real data drops the balanced accuracy of the MIA from 0.86 to 0.55.
+We conclude that using synthetic data significantly protects the membership status of the majority of data records.
+
+However, this result does not mean that the membership status of every record is protected.
+The remaining 5\% of balanced accuracy above random guessing (0.55 against 0.5) is due to outliers in the dataset, which can still be identified by an attacker~\cite{carlini2022membershipinferenceattacksprinciples}.
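+
+Assuming the percentages in this subsection are absolute differences in balanced accuracy (BA), they can be checked against the reported values of the MIA:
+\[
+\mathrm{BA}_{\mathrm{real}} - \mathrm{BA}_{\mathrm{synthetic}} = 0.86 - 0.55 = 0.31,
+\qquad
+\mathrm{BA}_{\mathrm{synthetic}} - 0.5 = 0.55 - 0.5 = 0.05.
+\]
+The first difference is close to the 30\% average degradation, and the second is the remaining 5\% above an attacker that guesses membership at random, whose balanced accuracy is 0.5.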
+
+\subsection{Attribute inference attack}
+\begin{figure}
+ \centering
+ \includegraphics[width=0.45\textwidth]{synthetic/figure/result/adult/aia.pdf}
+ \caption{Success of the AIA in terms of balanced accuracy evaluated on the training part of the AIA dataset.
+ The AIA dataset is made of points that have not been seen during the training of the target model.
+ The target model does not use the sensitive attribute.}
+ \label{fig:aia}
+\end{figure}
+Using a synthetic dataset has no significant impact on the success of the attribute inference attack.
+We present in Figure~\ref{fig:aia} a comparison of AIA between real and synthetic data.
+
+With an ANOVA p-value of $8.65\times 10^{-1}$, we observe no significant difference in AIA success between synthetic and real data.
+In addition, with an attack balanced accuracy ranging from 0.52 to 0.54, we observe a small but non-zero risk of attribute leakage.
+Hence, we conclude that using synthetic data does not protect users against AIA.
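+
+Assuming a binary sensitive attribute, for which random guessing yields a balanced accuracy of 0.5, the advantage of the AIA over random guessing is
+\[
+0.52 - 0.5 = 0.02 \qquad\mbox{to}\qquad 0.54 - 0.5 = 0.04,
+\]
+that is, a gain of two to four points of balanced accuracy over an attacker that guesses the sensitive attribute at random.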
+