



Probability theory is deeply connected to machine learning.
Model properties such as differential privacy, fairness definitions, utility metrics, etc., which we discuss in Section~\ref{sec:background-ml}, are expressed in terms of probabilities.
We therefore present the notions of probability and measure theory that we will use.
As in Section~\ref{sec:background-set}, our presentation mainly aims to fix the objects that we will use in the following sections, not to be a complete course.
Readers who wish to learn more about measure theory are referred to the lecture notes of Thierry Gallay from Université Joseph Fourier~\cite{mesure}.
Readers who wish to explore probability theory further may consult the lecture notes of Jean-François Le Gall from the École Normale Supérieure de Paris~\cite{proba}.

Let $A$ be a set.
We call a $\sigma$-algebra, denoted $\mathcal{A}$, a subset of $\mathcal{P}(A)$ that contains $\emptyset$ and $A$, is closed under complement, and is closed under countable unions of elements of $\mathcal{A}$.
We say that $(A,\mathcal{A})$ is a measurable space.
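
As a minimal illustration (the set below is a hypothetical choice made only for this example), take $A=\{0,1\}$.
Both
\[
\mathcal{A}_1 = \{\emptyset, A\} \quad\text{and}\quad \mathcal{A}_2 = \mathcal{P}(A) = \{\emptyset, \{0\}, \{1\}, A\}
\]
satisfy the conditions above and are therefore $\sigma$-algebras on $A$.
In contrast, $\{\emptyset, \{0\}, A\}$ is not, since it does not contain $\{1\}$, the complement of $\{0\}$.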

We call a measure a function $d:\mathcal{A}\rightarrow [0,+\infty]$ such that $d(\emptyset) = 0$ and $d\left(\bigcup_{i\in \mathbb{N}} A_i\right) = \sum_{i\in \mathbb{N}}d(A_i)$ for every $(A_i)_{i\in\mathbb{N}} \in \mathcal{A}^\mathbb{N}$ with $A_i\cap A_j = \emptyset$ for all $i\neq j$.
We then say that $(A, \mathcal{A}, d)$ is a measure space.

Let $(B,\mathcal{B})$ be another measurable space.
We call a measurable function a function $f:A\rightarrow B$ such that $\forall b\in\mathcal{B}$,~$f^{-1}(b)\in\mathcal{A}$.
We then write $f:(A, \mathcal{A})\rightarrow (B, \mathcal{B})$ or $f:(A, \mathcal{A},d)\rightarrow (B, \mathcal{B})$.

In the particular case where $d(A) = 1$, we call $d$ a probability measure.
$(A,\mathcal{A},d)$ is then a probability space, and the measurable functions on this space are called random variables.
The probability distribution of a random variable $f$ taking values in $(X,\mathcal{X})$ is the following probability measure:
$d_f :\mathcal{X}\rightarrow [0,1]$, $x\mapsto d(f^{-1}(x))$.
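
As a small worked example (a hypothetical fair six-sided die, introduced only for illustration), let $A=\{1,\dots,6\}$, $\mathcal{A}=\mathcal{P}(A)$ and $d(a)=\frac{|a|}{6}$, where $|a|$ denotes the number of elements of $a$.
Consider the random variable $g:A\rightarrow\{0,1\}$ that maps an outcome to its parity, $g(\omega)=\omega \bmod 2$, with $\{0,1\}$ equipped with $\mathcal{P}(\{0,1\})$.
The probability distribution of $g$ then assigns to $\{1\}$ the mass
\[
d\left(g^{-1}(\{1\})\right) = d(\{1,3,5\}) = \frac{3}{6} = \frac{1}{2},
\]
and likewise $d\left(g^{-1}(\{0\})\right) = d(\{2,4,6\}) = \frac{1}{2}$.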

Having introduced probability theory, we now make explicit its relation to the ML theory described previously.
Let $I$ be a finite nonempty set, and let $\mathcal{X}$, $\mathcal{S}$ and $\mathcal{Y}$ be the sets of features, sensitive attributes and labels.
Let $d:I\rightarrow \mathcal{X}\times\mathcal{S}\times\mathcal{Y}$ be a dataset.
Let $\#$ be the measure on $(I,\mathcal{P}(I))$ that maps every $a$ in $\mathcal{P}(I)$ to the number of elements of $a$.
Let $P:\mathcal{P}(I)\rightarrow [0,1]$, $a\mapsto \frac{\#(a)}{\#(I)}$.
Then $(I, \mathcal{P}(I), P)$ is a probability space.
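This can be checked directly from the definitions, assuming $I$ is nonempty so that $\#(I)>0$: we have $P(\emptyset)=\frac{0}{\#(I)}=0$ and $P(I)=\frac{\#(I)}{\#(I)}=1$, and for disjoint $a,b\in\mathcal{P}(I)$,
\[
P(a\cup b)=\frac{\#(a\cup b)}{\#(I)}=\frac{\#(a)+\#(b)}{\#(I)}=P(a)+P(b).
\]
Since $I$ is finite, only finitely many sets of a pairwise disjoint family can be nonempty, so countable additivity reduces to this finite additivity.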
On this space we can define the following random variables:
\begin{itemize}
    \item $X:I\rightarrow \mathcal{X},~i\mapsto (d(i))_0$
    \item $S:I\rightarrow \mathcal{S},~i\mapsto (d(i))_1$
    \item $Y:I\rightarrow \mathcal{Y},~i\mapsto (d(i))_2$
\end{itemize}
where, for a vector $u$, $u_j$ denotes the $j$th element of $u$, counting from $0$.

From these, we can define various random variables that will be useful in the rest of the paper.
For instance, $\hat{Y}=f\circ X$ is a random variable that represents the prediction of a trained machine learning model $f$.
We can use it to write the accuracy in a compact way, $P(\hat{Y}=Y)$, using the well-accepted abuse of notation that, for a random variable $A$ and a measurable subset $a$ of its codomain,
$\{A\in a\} = \{i\in I~|~A(i)\in a\} = A^{-1}(a)$.
The accuracy is a reliable metric of a trained model's utility when $P(Y=0) = P(Y=1) = \frac{1}{2}$, but much less so when $Y$ is imbalanced.
To account for a possibly imbalanced distribution of the labels, we will consider the balanced accuracy:
$\frac{P(\hat{Y}=0|Y=0) + P(\hat{Y}=1|Y=1)}{2}$.
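
As a worked illustration on a hypothetical dataset (the numbers are chosen only for this example), suppose $\#(I)=10$, with $Y(i)=0$ for nine indices and $Y(i)=1$ for the remaining one, and consider a constant model such that $\hat{Y}(i)=0$ for all $i\in I$.
Then
\[
P(\hat{Y}=Y)=\frac{9}{10},
\qquad
\frac{P(\hat{Y}=0|Y=0) + P(\hat{Y}=1|Y=1)}{2}=\frac{1+0}{2}=\frac{1}{2}.
\]
The accuracy is high even though the model ignores the features entirely, whereas the balanced accuracy equals the value that a uniformly random guess would achieve.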

Finally, in the context of attribute inference attacks at inference time, we define the random variable $\hat{S}=a\circ \hat{Y}$, where $a$ here denotes a machine learning model trained to infer the sensitive attribute from the model's output.