\label{sec:bck_aia}

Attacks which violate privacy and confidentiality in ML infer potentially sensitive information from observable information (e.g., model predictions).
This leakage of information is a privacy risk if adv learns something about $traindata$ (or the inputs) which would be impossible to learn without access to $targetmodel$; this is what distinguishes a privacy risk from simple statistical inference~\cite{cormode}.
Among the various privacy risks explored in the literature on ML models, attribute inference attacks~\cite{fredrikson2,Mahajan2020DoesLS,yeom,Song2020Overlearning,malekzadeh2021honestbutcurious,MehnazAttInf} infer the value of a sensitive attribute for a specific input to an ML model, given some model observables (e.g., model predictions, parameters, intermediate layerwise outputs) and background information. Based on the attack surface being exploited, aia{s} can be categorized into (a) imputation-based attacks and (b) representation-based attacks.

We first introduce some notation to guide us through the taxonomy of these attacks.

We have a dataset $d:I\rightarrow \mathcal{X}\times\mathcal{S}\times\mathcal{Y}$ whose columns are the features, the sensitive attribute and the ground truth, where $I$ is a finite set of indices.
To access the features, the sensitive attribute and the labels from their indices, we define respectively the following functions:
\begin{itemize}
    \item $X:I\rightarrow \mathcal{X},~i\mapsto (d(i))_0$
    \item $S:I\rightarrow \mathcal{S},~i\mapsto (d(i))_1$
    \item $Y:I\rightarrow \mathcal{Y},~i\mapsto (d(i))_2$
\end{itemize}
Let $(I_0,I_1)$ be a partition of $I$.
$d$ is split into two datasets $d_0 = d_{|I_0}$ and $d_1 = d_{|I_1}$, which we call respectively the target dataset and the auxiliary dataset.
$d_0$ is used to train a machine learning model to infer the ground truth from the features: we call it the target model $targetmodel$.

Regarding attribute inference attacks, we differentiate between training-time attacks, which target $d_0$ (the dataset used in training), and inference-time attacks, which target data used as input to an already trained target model.
Our work focuses on the latter (see Figure~\ref{fig:tm2}), but for a clear positioning of our contributions, we present both types of attack in this background section.
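
To make this notation concrete, the following Python sketch mirrors the setup above: the dataset $d$, the accessors $X$, $S$ and $Y$, the partition of $I$ into $(I_0,I_1)$, and the target model trained on $d_0$. The synthetic data, the column names and the use of scikit-learn are illustrative assumptions only, not part of the formal model.
\begin{verbatim}
# Illustrative sketch of the notation above: synthetic data, column names
# and scikit-learn are assumptions made for this example only.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000
d = pd.DataFrame({
    "x1": rng.normal(size=n),           # features X(i)
    "x2": rng.normal(size=n),
    "s":  rng.integers(0, 2, size=n),   # sensitive attribute S(i)
    "y":  rng.integers(0, 2, size=n),   # ground truth Y(i)
})

# Partition the index set I into (I_0, I_1): target and auxiliary data.
I = d.index.to_numpy()
rng.shuffle(I)
I0, I1 = I[: n // 2], I[n // 2:]
d0, d1 = d.loc[I0], d.loc[I1]           # d_0: target, d_1: auxiliary

# The target model is trained on d_0 to predict Y from the features only
# (the sensitive attribute s is not part of the model's input here).
features = ["x1", "x2"]
target_model = RandomForestClassifier(random_state=0)
target_model.fit(d0[features], d0["y"])
\end{verbatim}
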
\noindent\textbf{\underline{Imputation-based attacks}} assume adv has access to the non-sensitive attributes in addition to the model's predictions and background information (e.g., a marginal prior over the sensitive attribute and a confusion matrix). We review these imputation-based attacks below:

\setlength\tabcolsep{3pt}
\begin{table*}[!htb]
\caption{Comparison of prior work based on: the attack surface exploited (e.g., model predictions ($targetmodel(X(i))$), $X(i)$, $Y(i)$, the distribution over $S(i)$ ($P_S$) and the confusion matrix between true and predicted outputs across all training data records ($C(Y(i),targetmodel(X(i)))$)), whether $S(i)$ is censored, i.e., included in $traindata$ or the inputs, whether they account for class imbalance in $S(i)$, whether adv is active or passive, and whether the threat model is blackbox or whitebox. All the attacks assume knowledge of auxiliary data $auxdata$.}
\begin{center}
\footnotesize
\begin{tabular}{ |c|c|c|c|c|c| }
    \hline
    \textbf{Literature} & \textbf{Attack Vector} & \textbf{$S$ is censored?} & \textbf{Imbalance in $S$?} & \textbf{adv} & \textbf{Threat Model} \\
    \hline
    \multicolumn{6}{|c|}{\textbf{Imputation-based Attacks}}\\
    \hline
    \textbf{Fredrikson et al.}~\cite{fredrikson2} & $X(i)$, $Y(i)$, $targetmodel(X(i))$, \textbf{$P_S$}, $C(Y(i),targetmodel(X(i)))$ & $\checkmark$ & $\times$ & Passive & Blackbox\\
    \textbf{Yeom et al.}~\cite{yeom} & $X(i)$, $Y(i)$, $targetmodel()$, \textbf{$P_S$} & $\checkmark$ & $\times$ & Passive & Blackbox\\
    \textbf{Mehnaz et al.}~\cite{MehnazAttInf} & $X(i)$, $Y(i)$, $targetmodel()$, \textbf{$P_S$}, $C(Y(i),targetmodel(X(i)))$ & $\checkmark$ & $\times$ & Passive & Blackbox\\
    \textbf{Jayaraman and Evans}~\cite{jayaraman2022attribute} & $X(i)$, $Y(i)$, $targetmodel()$, \textbf{$P_S$}, $C(Y(i),targetmodel(X(i)))$ & $\times$, $\checkmark$ & $\times$ & Passive & Whitebox\\
    \hline
    \multicolumn{6}{|c|}{\textbf{Representation-based Attacks}}\\
    \hline
    \textbf{Song et al.}~\cite{Song2020Overlearning} & $targetmodel(X(i))$ & $\times$ & $\times$ & Passive & Both\\
    \textbf{Mahajan et al.}~\cite{Mahajan2020DoesLS} & $targetmodel(X(i))$ & $\checkmark$ & $\times$ & Passive & Blackbox\\
    \textbf{Malekzadeh et al.}~\cite{malekzadeh2021honestbutcurious} & $targetmodel(X(i))$ & $\times$ & $\times$ & Active & Blackbox\\
    \textbf{Our Work} & $targetmodel(X(i))$ & $\times$, $\checkmark$ & $\checkmark$ & Passive & Blackbox \\
    \hline
\end{tabular}
\end{center}
\label{tab:summary}
\end{table*}

\begin{itemize}
    \item \textbf{Fredrikson et al.~\cite{fredrikson2}} assume that adv has access to $targetmodel(X(i))$.
    For this attack, it is required that $X$ can be written as $X(i) = (\cdots,S(i),\cdots)$; we refer to this case as ``\textit{S is in the input}''.
    The attack generates inputs with the different possible values of the sensitive attribute and then chooses the most likely value based on $targetmodel(X(i))$.

    \item \textbf{Yeom et al.~\cite{yeom}} assume a distribution $P_S$ over $S$ which is used to estimate the value of $S$ for an arbitrary data record. They propose three attack variants based on assumptions about $P_S$: Attack~1 leverages a membership oracle to determine the value of $S(i)$, while Attacks~2 and~3 assume different types of distributions over $S$.
    For these attacks to work, $S$ must be in the input and the data points being attacked must belong to the target dataset.

    \item \textbf{Mehnaz et al.~\cite{MehnazAttInf}} improve upon Fredrikson et al.~\cite{fredrikson1,fredrikson2} by exploiting $targetmodel\circ X$ and $X$, with $S$ in the input. The attack relies on the intuition that $targetmodel$'s output confidence is higher when the input has the correct value of $S$, since $targetmodel$ encountered the target record with that attribute during training. Their attack generates multiple instances of the input with different values of $S(i)$ (similar to Fredrikson et al.~\cite{fredrikson1,fredrikson2}) and identifies the most likely value of $S$ (a minimal sketch of this candidate-enumeration strategy is given after this list).
\end{itemize}
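
As an illustration of the candidate-enumeration strategy shared by Fredrikson et al.~\cite{fredrikson2} and Mehnaz et al.~\cite{MehnazAttInf}, the following simplified sketch reuses $d_0$ and the column names of the previous listing, but trains a target model with $S$ in the input; it omits the marginal prior $P_S$ and the confusion-matrix weighting used by the original attacks and is not a faithful re-implementation.
\begin{verbatim}
# Candidate-enumeration attack in the spirit of Fredrikson et al. and
# Mehnaz et al.; simplified sketch reusing d0 from the previous listing,
# with a target model that takes the sensitive attribute s as input.
from sklearn.ensemble import RandomForestClassifier

features_with_s = ["x1", "x2", "s"]
target_model_with_s = RandomForestClassifier(random_state=0)
target_model_with_s.fit(d0[features_with_s], d0["y"])

def infer_sensitive(record, true_label, candidates=(0, 1)):
    """Try every candidate value of s and keep the one for which the
    target model is most confident in the record's true label."""
    best_s, best_conf = None, -1.0
    classes = list(target_model_with_s.classes_)
    for s in candidates:
        trial = record.copy()
        trial["s"] = s
        query = trial[features_with_s].to_frame().T   # one-row DataFrame
        proba = target_model_with_s.predict_proba(query)[0]
        conf = proba[classes.index(true_label)]
        if conf > best_conf:
            best_s, best_conf = s, conf
    return best_s

# Attack a record of the target dataset d_0 (its true label is assumed known).
i = d0.index[0]
s_guess = infer_sensitive(d0.loc[i], true_label=d0.loc[i, "y"])
\end{verbatim}
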
An appropriate baseline to determine whether such attacks are indeed a privacy risk is data imputation, i.e., training an ML model to infer the value of the missing attribute from the other non-sensitive attributes without using $targetmodel(X(i))$~\cite{jayaraman2022attribute}.
Jayaraman and Evans~\cite{jayaraman2022attribute} find that existing blackbox imputation-based attacks~\cite{yeom,fredrikson2,MehnazAttInf} do not perform any better than data imputation. In other words, the perceived privacy risk actually stems from statistical inference and is hence not an actual privacy risk.

To address this, Jayaraman and Evans~\cite{jayaraman2022attribute} propose a whitebox aia which outperforms prior blackbox attacks as well as data imputation when adv has limited knowledge of the data. However, since the attack operates in a whitebox setting, we omit a detailed description. All these attacks require that:

\begin{itemize}
    \item $S$ is in the input data records, which is not always the case in realistic settings;
    \item the records $X(i)$ being attacked belong to the target dataset.
\end{itemize}

\noindent\textbf{\underline{Representation-based attacks}} exploit the fact that intermediate layer outputs or predictions are distinguishable for different values of the sensitive attribute~\cite{Song2020Overlearning,Mahajan2020DoesLS,malekzadeh2021honestbutcurious}. For instance, the distribution of $targetmodel\circ X$ for \textit{males} differs from the output prediction distribution for \textit{females}. We describe the existing attacks of this category below:

\begin{itemize}
\item \textbf{Song et al.~\cite{Song2020Overlearning} / Mahajan et al.~\cite{Mahajan2020DoesLS}} assume that $S$ is not in the input and that adv only observes $targetmodel\circ X$. adv trains an ML attack model $attackmodel$ to map the output predictions $targetmodel(X(i))$ to $S(i)$.
In other words, the statistic $\hat{S}$ used to infer $S$ is of the form $\hat{S} = 1_{[0.5,1]}\circ attackmodel\circ targetmodel\circ X$, where $attackmodel: [0,1]\rightarrow[0,1]$ (a minimal sketch of this pipeline is given at the end of this section).

\item \textbf{Malekzadeh et al.~\cite{malekzadeh2021honestbutcurious}} consider the setting where adv trains $targetmodel$ with a special loss function that explicitly encodes information about $S(i)$ in $targetmodel(X(i))$, making it easier to extract the sensitive attribute at inference time. In this setting, the model builder is malicious and actively introduces a ``backdoor''.
\end{itemize}

Our work focuses on representation-based aia in a blackbox setting at inference time. We use Song et al.~\cite{Song2020Overlearning} and Mahajan et al.~\cite{Mahajan2020DoesLS} as our baselines.
These attacks do not account for the class imbalance in the sensitive attribute commonly present in data from real-world applications, which could affect adv's attack success~\cite{classIMb1,classIMb2}.
In our evaluation, we consider an aia using an adaptive threshold which outperforms these baseline attacks (Section~\ref{sec:evalAIA}).
Malekzadeh et al.~\cite{malekzadeh2021honestbutcurious} have a different threat model in which adv explicitly modifies the training to enhance the leakage of $S$; we do not assume such access to $targetmodel$ in a blackbox setting.
In addition, these attacks do not consider the possibility of inferring the sensitive attribute solely from the hard labels.
We summarize relevant prior work in Table~\ref{tab:summary}.
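
To make the representation-based pipeline of Song et al.~\cite{Song2020Overlearning} and Mahajan et al.~\cite{Mahajan2020DoesLS} concrete, the following sketch reuses the auxiliary data $d_1$ and the target model (trained without $S$) from the first listing; the choice of logistic regression as $attackmodel$ and the fixed $0.5$ threshold are illustrative assumptions, and an adaptive threshold, as used in our evaluation, would instead be tuned on $d_1$.
\begin{verbatim}
# Representation-based attack in the spirit of Song et al. / Mahajan et al.;
# minimal sketch reusing d0, d1, features and target_model (trained without
# s) from the first listing. Logistic regression as the attack model and the
# 0.5 threshold are illustrative choices.
from sklearn.linear_model import LogisticRegression

# The adversary queries the target model on the auxiliary data d_1, for
# which it knows s, and learns to map the model's soft predictions to s.
aux_outputs = target_model.predict_proba(d1[features])
attack_model = LogisticRegression()
attack_model.fit(aux_outputs, d1["s"])

# At inference time the adversary only observes the target model's outputs
# for the attacked records and applies the attack model with a fixed 0.5
# threshold; an adaptive threshold can be tuned on d_1 when s is imbalanced.
victim_outputs = target_model.predict_proba(d0[features])
scores = attack_model.predict_proba(victim_outputs)[:, 1]
s_hat = (scores >= 0.5).astype(int)       # inferred sensitive attribute S(i)
\end{verbatim}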