\label{sec:bck_aia}

Attacks which violate privacy and confidentiality in ML infer potentially sensitive information from observable information (e.g., model predictions).
This leakage of information is a privacy risk if adv learns something about $traindata$ (or the inputs) which would be impossible to learn without access to $targetmodel$; this is what distinguishes a privacy risk from simple statistical inference~\cite{cormode}.
Among the various privacy risks explored in the literature on ML models, attribute inference attacks~\cite{fredrikson2,Mahajan2020DoesLS,yeom,Song2020Overlearning,malekzadeh2021honestbutcurious,MehnazAttInf} infer the value of a sensitive attribute for a specific input to an ML model, given some model observables (e.g., model predictions, parameters, intermediate layerwise outputs) and background information. Based on the attack surface being exploited, aia{s} can be categorized into (a) imputation-based attacks and (b) representation-based attacks.

We first introduce some notation to guide us through the taxonomy of these attacks.

We have a dataset $d:I\rightarrow \mathcal{X}\times\mathcal{S}\times\mathcal{Y}$ whose columns are the features, the sensitive attribute and the ground truth, where $I$ is a finite set of indices.
To access the features, the sensitive attribute and the labels from their indices, we define respectively the following functions:
\begin{itemize}
    \item $X:I\rightarrow \mathcal{X},~i\mapsto (d(i))_0$
    \item $S:I\rightarrow \mathcal{S},~i\mapsto (d(i))_1$
    \item $Y:I\rightarrow \mathcal{Y},~i\mapsto (d(i))_2$
\end{itemize}
Let $(I_0,I_1)$ be a partition of $I$.
$d$ is split into two datasets $d_0 = d_{|I_0}$ and $d_1 = d_{|I_1}$, which we call respectively the target dataset and the auxiliary dataset.
$d_0$ is used to train a machine learning model to infer the ground truth from the features: we call it the target model $targetmodel$.

Regarding attribute inference attacks, we differentiate between training-time attacks, which target $d_0$ (the dataset used in training), and inference-time attacks, which target data used as input to an already trained target model.
Our work focuses on the latter (see Figure~\ref{fig:tm2}), but for a clear positioning of our contributions, we present both types of attack in this background section.
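
To make this notation concrete, the following Python sketch mirrors the setup above: the dataset $d$, the accessors $X$, $S$ and $Y$, the partition of $I$ into $(I_0,I_1)$, and the target model trained on $d_0$. The synthetic data, the column names and the use of scikit-learn are illustrative assumptions only, not part of the formal model.
\begin{verbatim}
# Illustrative sketch of the notation above: synthetic data, column names
# and scikit-learn are assumptions made for this example only.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000
d = pd.DataFrame({
    "x1": rng.normal(size=n),           # features X(i)
    "x2": rng.normal(size=n),
    "s":  rng.integers(0, 2, size=n),   # sensitive attribute S(i)
    "y":  rng.integers(0, 2, size=n),   # ground truth Y(i)
})

# Partition the index set I into (I_0, I_1): target and auxiliary data.
I = d.index.to_numpy()
rng.shuffle(I)
I0, I1 = I[: n // 2], I[n // 2:]
d0, d1 = d.loc[I0], d.loc[I1]           # d_0: target, d_1: auxiliary

# The target model is trained on d_0 to predict Y from the features only
# (the sensitive attribute s is not part of the model's input here).
features = ["x1", "x2"]
target_model = RandomForestClassifier(random_state=0)
target_model.fit(d0[features], d0["y"])
\end{verbatim}
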
\noindent\textbf{\underline{Imputation-based attacks}} assume adv has access to the non-sensitive attributes in addition to the model's predictions and background information (e.g., a marginal prior over the sensitive attribute and a confusion matrix). We review these imputation-based attacks below:

\setlength\tabcolsep{3pt}
\begin{table*}[!htb]
\caption{Comparison of prior work based on: the attack surface exploited (e.g., model predictions ($targetmodel(X(i))$), $X(i)$, $Y(i)$, the distribution over $S(i)$ ($P_S$) and the confusion matrix between true and predicted outputs across all training data records ($C(Y(i),targetmodel(X(i)))$)), whether $S(i)$ is censored, i.e., included in $traindata$ or the inputs, whether they account for class imbalance in $S(i)$, whether adv is active or passive, and whether the threat model is blackbox or whitebox. All the attacks assume knowledge of auxiliary data $auxdata$.}
\begin{center}
\footnotesize
\begin{tabular}{ |c|c|c|c|c|c| }
    \hline
    \textbf{Literature} & \textbf{Attack Vector} & \textbf{$S$ is censored?} & \textbf{Imbalance in $S$?} & \textbf{adv} & \textbf{Threat Model} \\
    \hline
    \multicolumn{6}{|c|}{\textbf{Imputation-based Attacks}}\\
    \hline
    \textbf{Fredrikson et al.}~\cite{fredrikson2} & $X(i)$, $Y(i)$, $targetmodel(X(i))$, \textbf{$P_S$}, $C(Y(i),targetmodel(X(i)))$ & $\checkmark$ & $\times$ & Passive & Blackbox\\
    \textbf{Yeom et al.}~\cite{yeom} & $X(i)$, $Y(i)$, $targetmodel()$, \textbf{$P_S$} & $\checkmark$ & $\times$ & Passive & Blackbox\\
    \textbf{Mehnaz et al.}~\cite{MehnazAttInf} & $X(i)$, $Y(i)$, $targetmodel()$, \textbf{$P_S$}, $C(Y(i),targetmodel(X(i)))$ & $\checkmark$ & $\times$ & Passive & Blackbox\\
    \textbf{Jayaraman and Evans}~\cite{jayaraman2022attribute} & $X(i)$, $Y(i)$, $targetmodel()$, \textbf{$P_S$}, $C(Y(i),targetmodel(X(i)))$ & $\times$, $\checkmark$ & $\times$ & Passive & Whitebox\\
    \hline
    \multicolumn{6}{|c|}{\textbf{Representation-based Attacks}}\\
    \hline
    \textbf{Song et al.}~\cite{Song2020Overlearning} & $targetmodel(X(i))$ & $\times$ & $\times$ & Passive & Both\\
    \textbf{Mahajan et al.}~\cite{Mahajan2020DoesLS} & $targetmodel(X(i))$ & $\checkmark$ & $\times$ & Passive & Blackbox\\
    \textbf{Malekzadeh et al.}~\cite{malekzadeh2021honestbutcurious} & $targetmodel(X(i))$ & $\times$ & $\times$ & Active & Blackbox\\
    \textbf{Our Work} & $targetmodel(X(i))$ & $\times$, $\checkmark$ & $\checkmark$ & Passive & Blackbox \\
    \hline
\end{tabular}
\end{center}
\label{tab:summary}
\end{table*}

\begin{itemize}
    \item \textbf{Fredrikson et al.~\cite{fredrikson2}} assume that adv has access to $targetmodel(X(i))$.
    For this attack, it is required that $X$ can be written as $X(i) = (\cdots,S(i),\cdots)$; we refer to this case as ``\textit{S is in the input}''.
    The attack generates inputs with the different possible values of the sensitive attribute and then chooses the most likely value based on $targetmodel(X(i))$.

    \item \textbf{Yeom et al.~\cite{yeom}} assume a distribution $P_S$ over $S$ which is used to estimate the value of $S$ for an arbitrary data record. They propose three attack variants based on assumptions about $P_S$: Attack~1 leverages a membership oracle to determine the value of $S(i)$, while Attacks~2 and~3 assume different types of distributions over $S$.
    For these attacks to work, $S$ must be in the input and the data points being attacked must belong to the target dataset.

    \item \textbf{Mehnaz et al.~\cite{MehnazAttInf}} improve upon Fredrikson et al.~\cite{fredrikson1,fredrikson2} by exploiting $targetmodel\circ X$ and $X$, with $S$ in the input. The attack relies on the intuition that $targetmodel$'s output confidence is higher when the input has the correct value of $S$, since $targetmodel$ encountered the target record with that attribute during training. Their attack generates multiple instances of the input with different values of $S(i)$ (similar to Fredrikson et al.~\cite{fredrikson1,fredrikson2}) and identifies the most likely value of $S$ (a minimal sketch of this candidate-enumeration strategy is given after this list).
\end{itemize}
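
As an illustration of the candidate-enumeration strategy shared by Fredrikson et al.~\cite{fredrikson2} and Mehnaz et al.~\cite{MehnazAttInf}, the following simplified sketch reuses $d_0$ and the column names of the previous listing, but trains a target model with $S$ in the input; it omits the marginal prior $P_S$ and the confusion-matrix weighting used by the original attacks and is not a faithful re-implementation.
\begin{verbatim}
# Candidate-enumeration attack in the spirit of Fredrikson et al. and
# Mehnaz et al.; simplified sketch reusing d0 from the previous listing,
# with a target model that takes the sensitive attribute s as input.
from sklearn.ensemble import RandomForestClassifier

features_with_s = ["x1", "x2", "s"]
target_model_with_s = RandomForestClassifier(random_state=0)
target_model_with_s.fit(d0[features_with_s], d0["y"])

def infer_sensitive(record, true_label, candidates=(0, 1)):
    """Try every candidate value of s and keep the one for which the
    target model is most confident in the record's true label."""
    best_s, best_conf = None, -1.0
    classes = list(target_model_with_s.classes_)
    for s in candidates:
        trial = record.copy()
        trial["s"] = s
        query = trial[features_with_s].to_frame().T   # one-row DataFrame
        proba = target_model_with_s.predict_proba(query)[0]
        conf = proba[classes.index(true_label)]
        if conf > best_conf:
            best_s, best_conf = s, conf
    return best_s

# Attack a record of the target dataset d_0 (its true label is assumed known).
i = d0.index[0]
s_guess = infer_sensitive(d0.loc[i], true_label=d0.loc[i, "y"])
\end{verbatim}
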
An appropriate baseline to determine whether such attacks are indeed a privacy risk is data imputation, i.e., training an ML model to infer the value of the missing attribute from the other non-sensitive attributes without using $targetmodel(X(i))$~\cite{jayaraman2022attribute}.
Jayaraman and Evans~\cite{jayaraman2022attribute} find that existing blackbox imputation-based attacks~\cite{yeom,fredrikson2,MehnazAttInf} do not perform any better than data imputation. In other words, the perceived privacy risk actually stems from statistical inference and is hence not an actual privacy risk.

To address this, Jayaraman and Evans~\cite{jayaraman2022attribute} propose a whitebox aia which outperforms prior blackbox attacks as well as data imputation when adv has limited knowledge of the data. However, since the attack operates in a whitebox setting, we omit a detailed description. All these attacks require that:

\begin{itemize}
    \item $S$ is in the input data records, which is not always the case in realistic settings;
    \item the records $X(i)$ being attacked belong to the target dataset.
\end{itemize}

\noindent\textbf{\underline{Representation-based attacks}} exploit the fact that intermediate layer outputs or predictions are distinguishable for different values of the sensitive attribute~\cite{Song2020Overlearning,Mahajan2020DoesLS,malekzadeh2021honestbutcurious}. For instance, the distribution of $targetmodel\circ X$ for \textit{males} differs from the output prediction distribution for \textit{females}. We describe the existing attacks of this category below:

\begin{itemize}
\item \textbf{Song et al.~\cite{Song2020Overlearning} / Mahajan et al.~\cite{Mahajan2020DoesLS}} assume that $S$ is not in the input and that adv only observes $targetmodel\circ X$. adv trains an ML attack model $attackmodel$ to map the output predictions $targetmodel(X(i))$ to $S(i)$.
In other words, the statistic $\hat{S}$ used to infer $S$ is of the form $\hat{S} = 1_{[0.5,1]}\circ attackmodel\circ targetmodel\circ X$, where $attackmodel: [0,1]\rightarrow[0,1]$ (a minimal sketch of this pipeline is given at the end of this section).

\item \textbf{Malekzadeh et al.~\cite{malekzadeh2021honestbutcurious}} consider the setting where adv trains $targetmodel$ with a special loss function that explicitly encodes information about $S(i)$ in $targetmodel(X(i))$, making it easier to extract the sensitive attribute at inference time. In this setting, the model builder is malicious and actively introduces a ``backdoor''.
\end{itemize}

Our work focuses on representation-based aia in a blackbox setting at inference time. We use Song et al.~\cite{Song2020Overlearning} and Mahajan et al.~\cite{Mahajan2020DoesLS} as our baselines.
These attacks do not account for the class imbalance in the sensitive attribute commonly present in data from real-world applications, which could affect adv's attack success~\cite{classIMb1,classIMb2}.
In our evaluation, we consider an aia using an adaptive threshold which outperforms these baseline attacks (Section~\ref{sec:evalAIA}).
Malekzadeh et al.~\cite{malekzadeh2021honestbutcurious} have a different threat model in which adv explicitly modifies the training to enhance the leakage of $S$; we do not assume such access to $targetmodel$ in a blackbox setting.
In addition, these attacks do not consider the possibility of inferring the sensitive attribute solely from the hard labels.
We summarize relevant prior work in Table~\ref{tab:summary}.
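
To make the representation-based pipeline of Song et al.~\cite{Song2020Overlearning} and Mahajan et al.~\cite{Mahajan2020DoesLS} concrete, the following sketch reuses the auxiliary data $d_1$ and the target model (trained without $S$) from the first listing; the choice of logistic regression as $attackmodel$ and the fixed $0.5$ threshold are illustrative assumptions, and an adaptive threshold, as used in our evaluation, would instead be tuned on $d_1$.
\begin{verbatim}
# Representation-based attack in the spirit of Song et al. / Mahajan et al.;
# minimal sketch reusing d0, d1, features and target_model (trained without
# s) from the first listing. Logistic regression as the attack model and the
# 0.5 threshold are illustrative choices.
from sklearn.linear_model import LogisticRegression

# The adversary queries the target model on the auxiliary data d_1, for
# which it knows s, and learns to map the model's soft predictions to s.
aux_outputs = target_model.predict_proba(d1[features])
attack_model = LogisticRegression()
attack_model.fit(aux_outputs, d1["s"])

# At inference time the adversary only observes the target model's outputs
# for the attacked records and applies the attack model with a fixed 0.5
# threshold; an adaptive threshold can be tuned on d_1 when s is imbalanced.
victim_outputs = target_model.predict_proba(d0[features])
scores = attack_model.predict_proba(victim_outputs)[:, 1]
s_hat = (scores >= 0.5).astype(int)       # inferred sensitive attribute S(i)
\end{verbatim}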