Diffstat (limited to 'synthetic/results.tex')
-rw-r--r-- | synthetic/results.tex | 54 |
1 files changed, 54 insertions, 0 deletions
diff --git a/synthetic/results.tex b/synthetic/results.tex
new file mode 100644
index 0000000..ec3149a
--- /dev/null
+++ b/synthetic/results.tex
@@ -0,0 +1,54 @@
+In this section we analyse the impact of using synthetic data instead of real data on MIA and AIA.
+Section~\ref{sec:uti} presents the utility of the target model.
+This control lets us verify that every model has learned some information and is not randomly guessing the label.
+
+
+\subsection{Utility}
+\label{sec:uti}
+
+\begin{figure}
+    \centering
+    \includegraphics[width=0.45\textwidth]{synthetic/figure/result/adult/utility.pdf}
+    \caption{Utility of the target model in terms of balanced accuracy evaluated on unseen data.
+    The ``Real'' label refers to a generator equal to the identity, hence the data used to train the target model is the real data.
+    The ``Synthetic'' label refers to a CGAN generator, hence the synthetic data are sampled from a distribution learned by the generator model.
+    In this case the target model is not trained on real data.}
+    \label{fig:utility}
+\end{figure}
+Using a synthetic dataset degrades the utility of the predictor.
+We present the balanced accuracy for both synthetic and real data in Figure~\ref{fig:utility}.
+
+Using synthetic data significantly degrades the utility of the target model, by 5\%, with an ANOVA p-value of $1.23\times 10^{-5}$.
+However, with a minimum balanced accuracy of 0.68 on synthetic data, we argue that the target model has learned enough information for the AIA and MIA results to be meaningful.
+
+\subsection{Membership inference attack}
+\begin{figure}
+    \centering
+    \includegraphics[width=0.45\textwidth]{synthetic/figure/result/adult/mia.pdf}
+    \caption{Success of the MIA in terms of balanced accuracy evaluated on the Train part of the MIA dataset.}
+\end{figure}
+We observe that the balanced accuracy of the MIA degrades by 30\% on average.
+An ANOVA p-value of $4.54\times 10^{-12}$ indicates that this difference is significant.
+In addition, we observe that using synthetic data instead of real data results in a drop in balanced accuracy from 0.86 to 0.55.
+We conclude that using synthetic data significantly protects the membership status of the majority of data records.
+
+However, this result does not mean that the membership status is fully protected.
+The remaining 5\% above the 0.5 random-guess baseline is due to outliers in the dataset that can be identified by an attacker~\cite{carlini2022membershipinferenceattacksprinciples}.
+
+\subsection{Attribute inference attack}
+\begin{figure}
+    \centering
+    \includegraphics[width=0.45\textwidth]{synthetic/figure/result/adult/aia.pdf}
+    \caption{Success of the AIA in terms of balanced accuracy evaluated on the Train part of the AIA dataset.
+    The AIA dataset is made of points that have not been seen during training of the target model.
+    The target model does not use the sensitive attribute.}
+
+    \label{fig:aia}
+\end{figure}
+Using a synthetic dataset has no impact on the success of the attribute inference attack.
+We present in Figure~\ref{fig:aia} a comparison of AIA between real and synthetic data.
+
+With an ANOVA p-value of $8.65\times 10^{-1}$, we observe no significant difference in attribute inference between synthetic and real data.
+In addition, with an attack balanced accuracy ranging from 0.52 to 0.54, we observe a slight but definite risk of attribute leakage.
+Hence, we conclude that using synthetic data does not protect users against AIA.
+
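+To read the attack scores of this section against chance level, recall the usual definition of balanced accuracy for a binary task (we assume this standard definition here, it is not restated in this section; the symbols $\mathrm{BA}$, $\mathrm{TPR}$ and $\mathrm{TNR}$ are introduced only for this remark):
+\[
+    \mathrm{BA} = \frac{1}{2}\left(\mathrm{TPR} + \mathrm{TNR}\right),
+\]
+where $\mathrm{TPR}$ and $\mathrm{TNR}$ are the true positive rate and the true negative rate of the attack.
+An attack whose output is independent of the true label predicts the positive class with some probability $q$ on both classes, so that $\mathrm{TPR} = q$, $\mathrm{TNR} = 1 - q$ and $\mathrm{BA} = 0.5$.
+Under this reading, the MIA balanced accuracy of 0.55 on synthetic data lies $0.55 - 0.50 = 0.05$ above chance, and the AIA balanced accuracy of 0.52 to 0.54 lies 0.02 to 0.04 above chance.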