\label{sec:bck_fair}
Algorithmic fairness aims to reduce biases in ML model predictions.
Indeed, data records belonging to certain subgroups influence $targetmodel$'s predictions more than others.
For instance, in criminal justice, the ethnicity of a defendant plays a non-negligible role in predicting whether they will reoffend~\cite{fairjustice}. Generally, data records in the minority subgroup face unfair prediction behaviour compared to data records in the majority subgroup. These subgroups are identified based on a sensitive attribute (e.g., race or sex).
These biases are learnt by $targetmodel$ because they are part of the distribution of the training dataset.
There are two main categories of fairness for an ML model:
\textbf{Individual fairness} ensures that two data records with the same attributes except for $S$ receive the same model prediction.
This notion does not focus on the sensitive attribute and, as such, is not directly useful for our goal of mitigating attribute inference attacks at inference time.
We therefore set it aside for the rest of the paper.
\textbf{Group fairness} comes from the idea that different subgroups defined by an attribute such as skin color or gender should be treated equally.
We focus our study on group fairness where $S$ represents either sex or race (i.e., $S(i)$ equals 0 for women and 1 for men, or 0 for Black and 1 for white individuals, respectively).
Several definitions of group fairness have been introduced in prior work.
We discuss two well-established and commonly used metrics: demographic parity and equality of odds.
\begin{definition}
\label{def:dp}
$\hat{Y}$ satisfies demographic parity for $S$ if and only if: $P(\hat{Y}=0 | S=0) = P(\hat{Y}=0 | S=1)$.
We call $|P(\hat{Y}=0 | S=0) - P(\hat{Y}=0 | S=1)|$ the demPar-level of $\hat{Y}$.
\end{definition}
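As a reading aid, the demPar-level of Definition~\ref{def:dp} can be estimated empirically from a vector of binary predictions and the corresponding sensitive attribute. The following minimal Python sketch is only illustrative (the variable names are placeholders, not part of our experimental code) and assumes binary encodings of $\hat{Y}$ and $S$:
\begin{verbatim}
import numpy as np

def dem_par_level(y_pred, s):
    # |P(Y_hat=0 | S=0) - P(Y_hat=0 | S=1)| for binary arrays
    y_pred, s = np.asarray(y_pred), np.asarray(s)
    p0 = np.mean(y_pred[s == 0] == 0)   # estimate of P(Y_hat=0 | S=0)
    p1 = np.mean(y_pred[s == 1] == 0)   # estimate of P(Y_hat=0 | S=1)
    return abs(p0 - p1)                 # 0 means demographic parity holds
\end{verbatim}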
Demographic parity is the historical definition of fairness.
Legally, disparate impact is the fairness notion recognized by law, where an 80\% rule is the commonly agreed-upon tolerance in the legal arena.
Demographic parity ensures that each subgroup receives a given prediction at the same rate.
However, this may result in different false positive and true positive rates if the true outcome does actually vary with $S$~\cite{dpbad}.
Hardt et al.~\cite{fairmetric2} proposed equality of odds as a modification of demographic parity that requires both the true positive rate and the false positive rate to be the same for each subgroup.
\begin{definition}
\label{def:eo}
$\hat{Y}$, a classifier of $Y$, satisfies equality of odds for $S$ if and only if: $\forall (\hat{y},y)\in\{0,1\}^2 \quad
P(\hat{Y}=\hat{y} | S=0,Y=y) = P(\hat{Y}=\hat{y} | S=1,Y=y)$.
\end{definition}
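Analogously, a minimal empirical check of Definition~\ref{def:eo} compares, for each true label $y$, the rate at which each subgroup is predicted positive; the $\hat{y}=0$ cases are complements and yield the same gaps. The sketch below is again only illustrative and assumes binary $Y$, $\hat{Y}$ and $S$:
\begin{verbatim}
import numpy as np

def eo_violation(y_true, y_pred, s):
    # max over y in {0,1} of |P(Y_hat=1|S=0,Y=y) - P(Y_hat=1|S=1,Y=y)|,
    # i.e. the largest true/false positive rate gap between subgroups
    y_true, y_pred, s = map(np.asarray, (y_true, y_pred, s))
    gaps = []
    for y in (0, 1):
        r0 = np.mean(y_pred[(s == 0) & (y_true == y)] == 1)
        r1 = np.mean(y_pred[(s == 1) & (y_true == y)] == 1)
        gaps.append(abs(r0 - r1))
    return max(gaps)    # 0 means equality of odds holds empirically
\end{verbatim}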
The above fairness definitions can be achieved using three main fairness mechanisms: (a) pre-processing, (b) in-processing and (c) post-processing. \textit{Pre-processing} algorithms such as reweighing require access to the training data and assign weights to the data records to remove discrimination~\cite{preprocessing}.
\textit{In-processing} algorithms such as adversarial debiasing~\cite{debiase} and exponentiated gradient descent~\cite{reductions} add constraints during $targetmodel$'s training to ensure fairness.
\textit{Post-processing} techniques, in turn, modify the output predictions to satisfy the above fairness constraints, but the underlying model remains biased.
Similar to previous work~\cite{chang2021privacy}, we focus on in-processing algorithms.
Our work focuses on the theoretical guarantees against attribute inference attacks provided by the different fairness notions, rather than on how to implement in-processing fairness mechanisms.
Nevertheless, in our experiments we evaluate production-ready, state-of-the-art implementations of these fairness constraints alongside unconstrained ML algorithms.
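For instance, the exponentiated gradient reduction of~\cite{reductions} is implemented in the open-source fairlearn library, one such production-ready implementation. The sketch below shows how such an in-processing constraint is typically applied; the synthetic data and logistic regression base model are placeholders, not our experimental setup:
\begin{verbatim}
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                # synthetic features
s = rng.integers(0, 2, size=200)             # sensitive attribute S in {0,1}
y = (X[:, 0] + 0.5 * s > 0).astype(int)      # labels correlated with S

mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(solver="liblinear"),
    constraints=DemographicParity())         # or EqualizedOdds()
mitigator.fit(X, y, sensitive_features=s)    # constrained training
y_pred = mitigator.predict(X)                # fairness-constrained predictions
\end{verbatim}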