Fini du background, ne manque plus que la relecture

author: Jan Aalmoes <jan.aalmoes@inria.fr> 2024-09-23 20:54:42 +0200
committer: Jan Aalmoes <jan.aalmoes@inria.fr> 2024-09-23 20:54:42 +0200
commit: 5362d389988afb2750d27e7e6c5d401571dfba6e (patch)
tree: 381d52745fb20e54cdb16cd140e5ffea7d0ac76d
parent: caa9990c96141450f62a8076f560761f517dc884 (diff)
6 files changed, 285 insertions, 132 deletions
diff --git a/background/conf.tex b/background/conf.tex
index 4ee8d9f..52ae9b9 100644
--- a/background/conf.tex
+++ b/background/conf.tex
@@ -1,101 +1,90 @@
 
 %Attacks which violate privacy and confidentiality in ML infer potentially sensitive unobservable information from observable information (e.g., model predictions). 
 \label{sec:bck_aia}
+Dans ce manuscrit nous considrons deux types de risques pour la confidentialité.
+Le premier concernce les données qui on servi à l'entraînement du modèle, le second concerne les donnée sui son utilisé lors de l'évaluation.
+Dans le cadre d'attaques nous parlerons de \emph{modèle cible} opour désigner le modèle utilisé par un adversaire pour apprendre des information confidentielles.
+
+\FloatBarrier
+\subsubsection{Risque sur les données d'entraîenemnt}
+L'attaque d'inférence d'apartenance (MIA) consiste à inférer si une donnée a servi à l'entraîenemnt du modèle cible. 
+Cette attaque utilise le fait que le modèles d'apprentissage automatique ont en générale une moins bonne performance sur les donnée qui n'ont pas été utilisé à l'entraînement, c'est le sur-ajustement\footnote{\textit{Overfitting}}~\cite{hawkins2004problem,ying2019overview}.
+Ce problème peut survenir principalement quand le modèle cible est trop complexe par rapport à la tâche qui lui est demandé.
+Pour reprendre les mots de Hawkisn et al. : \textquote{Le sur-ajustement est l'utilisation de modèles ou de procédure qui vont à l'encontre de la parsimonie, c'est-à-dire qui utilisent plus de termes ou qui utilise des approches plus complexes que ce qui est necessaitre}
+\footnote{\textit{Overfitting is the use of models or procedures that violate
+parsimonysthat is, that include more terms than are neces-
+sary or use more complicated approaches than are necessary.}}
+Nous voyons sur la Figure~\ref{fig:background-conf-mia} l'écart entre la valeur de fonction de cout évalué sur les données d'entraînement et d'évaluation.
+Le lien est assez claire, un écart significatif indique qu'un classifieur va être capable d'apprandre quel donnée à été utilisé pour l'entraînement.
+Pour vérifer cela, la Sous-figure~\ref{sfig:background-conf-mia-ba} montre comment une forêt aléatoire à put apprendre cette distinction.
+On observe une exactitude équilibrée autour de 0,625 indiquant une fuite du confidentialité.
+
+\begin{figure}
+    \centering
+    \begin{subfigure}{0.3\linewidth}
+        \centering
+        \includegraphics[width=\linewidth]{background/figure/conf/mia_ba.pdf}
+        \caption{Résulat de l'attaque MIA.}
+        \label{sfig:background-conf-mia-ba}
+    \end{subfigure}
+    \begin{subfigure}{0.65\linewidth}
+        \centering
+        \includegraphics[width=\linewidth]{background/figure/conf/mia.pdf}
+        \caption{Ecart entre le coût calculer sur les données d'entraînemnt et sur les données d'évaluation.}
+    \end{subfigure}
+    \caption{Lien entre sur-ajustement et succès de l'attque MIA.}
+    \label{fig:background-conf-mia}
+\end{figure}
+
+L'étude de la fonction de cout est une possible quand l'adversaire possède des donnée pour lequelles il sait qu'elle ont apartenu à l'entraîenement.
+Grace à cela il peut construir un classifieur un utilisant cette conaissance comme étiquette.
+Si ce n'est pas le cas, l'adversaire utilise des modèles mirroires\footnote{\textit{Shadow models}} qui simulent le modèle cible est permettent d'apprendre à différencier le cout d'une donéne ayant servit à l'entraîenment d'une donnée jamais observé~\cite{shokri2017membership}.
+Un modèle d'attaque de MIA peut ensuite être utilser comme base pour d'autre type d'attaque comme par exemple reconstruir un attribut sensible de données ayanat servit à l'entraînement~\cite{yeom}.
+
+La confidentialité diférentielle\footnote{\textit{Differential privacy}} permet d'empêcher les attaque MIA~\cite{}.
+\begin{definition}{Confidentiatlié diferetielle}
+    Soit $(\Omega,\mathcal{T},P)$ un espace probabilisé.
+    Soit $(S,\mathcal{S})$ un espace mesurable et $\mathcal{V}$ l'ensemble des fonctions de mesurables de $\Omega$ dans $S$.
+    Soient $E$ un ensemble et $M$ une fonction de $E$ dans $\mathcal{V}$.
+    Soit $R\subset E^2$.
+    Soient $(\varepsilon,\delta)\in {\mathbb{R}^+}^2$
+    Alors $M$ satisfait la $(\varepsilon,\delta)$ confidentialité diférentielle si et seulemnt si 
+    \begin{equation*}
+        \forall (e_1,e_2,s)\in E\times E\times \mathcal{S}\quad
+        (e_1,e_2)\in R\implies
+        P(M(e_1)\in s)\leq e^{\varepsilon}P(M(e_2)\in s)+\delta
+    \end{equation*}
+\end{definition}
+En pratique $E$ représente l'ensemble de toutes les bases de données possibles.
+$R$ est une relation telle que $(e_1,e_2)\in R$ si et seulement si $e_1$ et $e_2$ différent d'une donnée.
+$S$ est l'ensemble des modèles possibles.
+$M$ est l'algorithme d'apprentissage qui prend en entrée une basse de donnée et renvoie une variable aléatoire à valeur dans l'espace des modèles $S$.
+Cette définition signifie donc que pour des bases de données de données diférentes d'une ligne, l'algorithme d'apprentissage aura des sorties presques indistinguables l'une de l'autres.
+Le presque étant paramétré par $\varepsilon$ et $\delta$.
+
+\FloatBarrier
+\subsubsection{Risque sur les données d'évaluation}
+Le second risque pour la confidentialité que nous allons évoquer concerne les donnée des utilisateur de modèle d'apprentissage et non plus les données d'entraînement.
+Dans ce cas un utilisateur souhaite évalue une donnée sur le modèle cibel et la question que l'on se pose est : 
+Que ce passe t'il si la prédiction fuite à un adversaire ?
+
+Song et al.~\cite{Song2020Overlearning} introduisent le concept de \emph{sur-apprentissage}\footnote{\textit{Overlearning}}.
+Ce terme désigne un modèle cible qui apprend plus que sa tâche principale.
+Par exemple un modèle servant à inférer si une personne souris dans une image vas aussi apprendre la couleur de peau~\cite{malekzadeh2021honestbutcurious}.
+Ou encore, utiliser un modèle qui prédise l'admission dans un école ou l'obtention d'un pret pour inférer le genre.
+Il s'agit donc d'inférer un attribut sensible en utilisant la prédiction d'un modèle cible qui n'a pas été entraîné pour inférer cet attribut sensible.
+Nous appelerons ce type d'attaque : inférence d'attribut sensible (AIA).
+
+Nous considérerons pour la suite que l'adversaire à uniquement accès à la prédiction du modèle cible et non pas à la donnée d'entrée.
+En effet le modèle cible n'ajoute pas plus d'information concernant l'attribut sensible que n'est contenus dans la donnée d'entrée~\cite{jayaraman2022attribute}.
+
+Une AIA qui cherche à inférer un attribut sensible présente dans le données d'entrée est appelé \emph{inversion de modèle}\footnote{\textit{modèle inversion}}.
+En effet comme l'adversaire cherche a inferer une entrée d'un modèle cible à partir de sa sortie, cette attaque est similaire à l'inversion d'un fonction.
+Fredrikson et al.~\cite{fredrikson2} donnent un exemple marquant en pharmacogenetics :
+La molecule Warfarin entre dans le traitement préventif des crises cardiaques cependant son dosage est complexe car il dépend de chaque patient.
+Ainsi des modèles ont été créés pour prédire le dosage à partire des donnée médicales du patient comme son génotype.
+Fredrikson et al. ont réussi à utiliser la prédiction de ces modèles pour retrouver les donnés médicales démontrant ainsi le risque de privacy inhérant aux sortie de modèles.
+
+Les dévlopements nouveaux que proposent ce manuscrit se concentrerons sur les risque d'inférence liés à des attribut sensibles qui ne sont pas utilisé lors de l'entraînement.
 
-Attacks which violate privacy and confidentiality in ML infer potentially sensitive information from observable information (e.g., model predictions). 
-This leakage of information is a privacy risk if adv learns something about $traindata$ -or the inputs- which would be impossible to learn without access to $targetmodel$. This differentiates between a privacy risk and simple statistical inference~\cite{cormode}. 
-Among the various privacy risks explored in literature pertaining to ML models, attribute inference attacks~\cite{fredrikson2,Mahajan2020DoesLS,yeom,Song2020Overlearning,malekzadeh2021honestbutcurious,MehnazAttInf} infer the specific value of a sensitive attribute for a specific input to ML model given some model observables (e.g., model predictions, parameters, intermediate layerwise outputs) and background information. Based on attack surface being exploited, aia{s} can be categorized into (a) imputation-based attacks and (b) representation-based attacks. 
-
-Let's introduce some notations to guide us in understanding the zoology of those attacks. 
-
-We have a dataset $d:I\rightarrow \mathcal{X}\times\mathcal{S}\times\mathcal{Y}$ containing as column: the features, the sensitive attribute and the ground truth.
-$I$ is a finite set of indices. 
-To access features, sensitive attribute and labels from there indices, we define respectively the following functions:
-\begin{itemize}
-    \item $X:I\rightarrow \mathcal{X},~i\mapsto (d(i))_0$
-    \item $S:I\rightarrow \mathcal{S},~i\mapsto (d(i))_1$
-    \item $Y:I\rightarrow \mathcal{Y},~i\mapsto (d(i))_2$
-\end{itemize}
-Let $(I_0,I_1)$ be a partition of $I$.
-$d$ is split in two datasets $d_0 = d_{{|I_0}}$ and $d_1 = d_{{|I_1}}$ which we call respectively the target dataset and the auxiliary dataset.
-$d_0$ is used to train a machine learning model to infer the ground truth from the features: we call it the target model $targetmodel$.
-
-Regarding attribute inference attack, we differentiate between training time attacks that target $d_0$: the dataset used in training.
-And inference time attack that target data used as input of an already trained target model.
-Our work focuses on the later (see figure \ref{fig:tm2}) but for clear positioning of our contributions, we are going to present both types of attack in this background section.
-
-\noindent\textbf{\underline{Imputation-based attacks}} assume adv has access to non-sensitive attributes in addition to model's predictions and background information (e.g., marginal prior over sensitive attribute and confusion matrix). We review these different imputation-based attacks below:
-
-
-
-\setlength\tabcolsep{3pt}
-\begin{table*}[!htb]
-\caption{Comparison of prior work based on: attack surface exploited (e.g., model predictions ($targetmodel(X(i))$), $X(i)$, $Y(i)$, distribution over $S(i)$ ($P_S$) and confusion matrix between true and predicted output across all training data records ($C(Y(i),targetmodel(X(i)))$), whether $S(i)$ is censored, i.e., included in $traindata$ or inputs, whether they account for class imbalance in $S(i)$, whether adv is active or passive and whether the threat model is blackbox or whitebox. All the attacks assume the knowledge of auxiliary data $auxdata$.}
-\begin{center}
-\footnotesize
-\begin{tabular}{ |c|c|c|c|c|c| }
- \hline
- \textbf{Literature} & \textbf{Attack Vector} & \textbf{$S$ is censored?} & \textbf{Imbalance in $S$?} & \textbf{adv} & \textbf{Threat Model} \\
- \hline
- \multicolumn{6}{|c|}{\textbf{Imputation-based Attacks}}\\
- \hline
- \textbf{Fredrikson et al.}~\cite{fredrikson2} & $X$, $Y$, $targetmodel(X(i))$, \textbf{$P_S$}, $C(Y(i),targetmodel(X(i)))$ & $\checkmark$ & $\times$ &  Passive & Blackbox\\
- \textbf{Yeom et al.}~\cite{yeom} & $X(i)$, $Y(i)$, $targetmodel()$, \textbf{$P_S$} & $\checkmark$  & $\times$ &  Passive & Blackbox\\
-  \textbf{Mehnaz et al.}~\cite{MehnazAttInf} & $X(i)$, $Y(i)$, $targetmodel()$, \textbf{$P_S$}, $C(Y(i),targetmodel(X(i)))$ & $\checkmark$ & $\times$ &  Passive & Blackbox\\
-  \textbf{Jayaraman and Evans}~\cite{jayaraman2022attribute} & $X(i)$, $Y(i)$, $targetmodel()$, \textbf{$P_S$}, $C(Y(i),targetmodel(X(i)))$ & $\times$, $\checkmark$ & $\times$ &  Passive & Whitebox\\
-  \hline
-   \multicolumn{6}{|c|}{\textbf{Representation-based Attacks}}\\
-  \hline
- \textbf{Song et al.}~\cite{Song2020Overlearning} & $targetmodel(X(i))$ & $\times$ & $\times$ &  Passive & Both\\
- \textbf{Mahajan et al.}~\cite{Mahajan2020DoesLS} & $targetmodel(X(i))$ & $\checkmark$ & $\times$ &  Passive & Blackbox\\
- \textbf{Malekzadeh et al.}~\cite{malekzadeh2021honestbutcurious} & $targetmodel(X(i))$ & $\times$ & $\times$ &  Active & Blackbox\\
- \textbf{Our Work} & $targetmodel(X(i))$ & $\times$, $\checkmark$ & $\checkmark$ &  Passive & Blackbox \\
- \hline
-\end{tabular}
-\end{center}
-\label{tab:summary}
-\end{table*}
-
-\label{sec:bck_aia}
-
-\begin{itemize}
-    \item \textbf{Fredrikson et al.~\cite{fredrikson2}} assumes that adv has access to $targetmodel(X(i))$. 
-    For this attack it is required that $X$ can be written $X(i) = (\cdots,S(i),\cdots)$.
-    We will refer to this case as "\textit{S is in the input}".
-    Fredrikson et al. attack generates an input with different possible values of the sensitive attribute
-    It then chooses the most likely value based on $targetmodel(X(i))$.
-
-    \item \noindent\textbf{Yeom et al.~\cite{yeom}} assumes a distribution $P_S$ over $S$ which is used to estimate the value of $S$ for an arbitrary data record. They propose three different variants of AS based on assumptions on $P_S$: Attack 1 leverages membership oracle to determine the value of $S(i)$ and Attack 2 and 3 assume different types of distributions over $S$.
-    For this attack to work, $S$ is in the input and the data points being attacked belong to the target dataset 
-    
-    \item \textbf{Mehnaz et al.~\cite{MehnazAttInf}} improves upon Fredrikson et al.~\cite{fredrikson1,fredrikson2} by exploiting $targetmodel\circ X$ and $X$, with $S$ in the input. The attack relies on the intuition that $targetmodel$'s output confidence is higher when the input has the correct value of $S$ as $targetmodel$ encountered the target record with that attribute during training. Their attack involves generating multiple instances of input with different values of $S(i)$ (similar to Fredrikson et al.~\cite{fredrikson1,fredrikson2}) and identifying the most likely value of $S$.
-\end{itemize}
-
-An appropriate baseline to identify whether such attacks are indeed a privacy risk is to use data imputation, i.e., train an ML model to infer value of missing attribute from other non-sensitive attributes without $targetmodel(X(i))$~\cite{jayaraman2022attribute}. Jayaraman and Evans~\cite{jayaraman2022attribute} find that existing blackbox imputation-based attacks~\cite{yeom,fredrikson2,MehnazAttInf} do not perform any better than data imputation. In other words, the perceived privacy risk is actually stemming from statistical inference and hence not an actual privacy risk.
-
-To address this, Jayaraman and Evans~\cite{jayaraman2022attribute} propose a whitebox aia which outperforms prior blackbox attacks as well as data imputation in the setting where there is limited knowledge of data for adv. However, since the attack is in a whitebox setting, we omit a detailed description of the attack. All these attacks require that: 
-
-\begin{itemize}
-    \item $S$ is in the input data records which is not always the case in realistic settings,
-    \item $X(i)$ being attacked belong to the target dataset.
-\end{itemize}
-
-\noindent\textbf{\underline{Representation-based attacks}} exploit the distinguishable intermediate layer outputs or predictions for different values of sensitive attributes~\cite{Song2020Overlearning,Mahajan2020DoesLS,malekzadeh2021honestbutcurious}. For instance, the distribution of $targetmodel\circ X$ for \textit{males} is different from the output prediction distribution for \textit{females}. We describe the existing attacks of this category below:
-
-\begin{itemize}
-\item \textbf{Song et al.~\cite{Song2020Overlearning} / Mahajan et al.~\cite{Mahajan2020DoesLS}} assume that $S$ is not in the input. adv only observes $targetmodel\circ X$. adv trains an ML attack model $ackmodel$ to map the output predictions $targetmodel(X(i))$ to $S(i)$.
-In other words, the statistic $\hat{S}$ used to infer $S$ is of the form: $ \hat{S} = 1_{[0.5,1]}\circ ackmodel\circ targetmodel\circ X$, where $attackmodel: [0,1]\rightarrow[0,1]$.
-
-
-\item \textbf{Malekzadeh et al.~\cite{malekzadeh2021honestbutcurious}} considers the setting where adv trains $targetmodel$ with a special loss function to explicitly encode information about $S(i)$ in $targetmodel(X(i))$.
-It makes it easier to extract the sensitive attribute during inference. In this setting, the model builder is malicious and actively introduces a ``backdoor''.
-\end{itemize}
-
-Our work focuses on representation-based aia in a blackbox setting at inference time. We focus on Song et al.~\cite{Song2020Overlearning} and Mahajan et al.~\cite{Mahajan2020DoesLS} as our baselines. 
-These attacks do not account for class imbalance in sensitive attribute commonly present in data from real-world applications which could effect adv's attack success~\cite{classIMb1,classIMb2}.
-In our evaluation, we consider an aia using an adaptive threshold which outperforms these baselines attacks (Section~\ref{sec:evalAIA}).
-Malekzadeh et al.~\cite{malekzadeh2021honestbutcurious} has a different threat model where adv explicitly modifies the training to enhance the leakage of $S$. 
-We do not assume such access to $targetmodel$ in a blackbox setting. 
-In addition, these attacks did not take into consideration the possibility to infer the sensitive attribute solely from the hard labels.
-We summarize relevant prior work in Table~\ref{tab:summary}.
 
diff --git a/background/eq.tex b/background/eq.tex
index b756361..a53f479 100644
--- a/background/eq.tex
+++ b/background/eq.tex
@@ -97,7 +97,48 @@ Ces définitions peuvent être imposé au modèle de trois manières:
         Le modèle est biaisé mais sa sortie est filtrée.
 \end{enumerate}
 Comme nous nous intéressons au interaction entre équitée et confidentialité, le Chapitre~\ref{sec:aia} s'inscrit dans la lignée de travaux précédent qui se concentrent sur les méchanismes entraitements~\cite{chang2021privacy}.
+Nous allons en présenter deux que nous allons utiliser dans la suite du manuscrit.
 
-\paragraph{Déscente de gradient exponentiée}
+\paragraph{Déscente de gradient exponentié}
+L'aproche par réduction pour une classification équitable\footnote{\textit{Reductions approaches to fair classification}} traduit une définition d'équité en termé de contraintes d'inégalités~\cite{reductions}.
+Par exemple la partié démographique peut se reformuler de la manière suivante
+\begin{equation*}
+    \left\{
+        \begin{matrix}
+            &E(f\circ X\mid S=0)-E(f\circ X)\leq\epsilon_0\\
+            \text{et}&\\
+            &-E(f\circ X\mid S=0)+E(f\circ X)\leq\epsilon_1\\
+        \end{matrix}
+        \right.
+\end{equation*}
+Où $\epsilon_0$ et $\epsilon_1$ ont été rajouté pour relaxer la contrainte permettant de contrôler le compromis entre utilité en confidentialité.
+Ensuite ces contraintes sont utilisés avec le problème de minimisation sous la forme d'une lagrangien comme nous l'avons vu à la Section~\ref{sec:background-opti-sous}.
+
+Pour trouver le point selle Agarwal et al. utilisent en algorithme qui produit un classifieur stochastique\footnote{randomized classifieur}.
+C'est un classifieur particulier qui n'est pas déterministe.
+Lors de l'apprentissage, plusieurs solutions approchant le point selle sont trouvé qui correspondent à plusieur sous-classifieurs.
+Ensuite pour chaque prédiction un choix aléatoire est réalisé pour sélectione l'un des sous-classifieur qui sera évalué sur la donnée d'entré.
+
+\paragraph{Rééquilibrage adversariel}\footnote{\textit{Adversarial debiasing}}
+Cette méthode prend le problème sous un tout autre angle~\cite{10.1145/3278721.3278779}.
+Au lieu d'integrer les contraintes d'équitée lors de l'apprantissage, elle utilise l'idée suivante : 
+La partié démographique signifie que l'attribut sensible est indépendant de la sortie, donc si il est impossible pour un adversaire de prédire l'attribut sensible à partir du logit, le modèle doit satisfaire cette définition.
+Cette une remarque très juste que nous allons étudié en détail et démontrer dans les Chapitres~\ref{sec:fini} et~\ref{sec:aia}.
+
+La méthode de Zhan et al. consiste donc utiliser deux réseaux de neuronnes.
+L'un infére la tâche principle, l'autre utilise le logit du premier pour inférer l'attribut sensible nous l'appelons adversaire.
+Ces deux classifieur sont entraîné simultanément dans un contexte adversariel.
+Cela signifi que la fonction de cout est de la forme 
+\begin{equation}
+    \label{eq:background-ml-adv}
+    C(x) = F(x) - sA(x)
+\end{equation}
+Où $F$ est le coût du classifieur principale et $A$ celui de l'adversaire.
+Nous voyons que minimiser $C$ à tendence à minimiser $F$ et maximiser $A$ ce qui signifie trouver les paramètres du classifieur de la tâche principle qui vas réaliser une bonne classification tout en empêchant l'adversaire d'inférer l'attribut sensible.
+L'avantage de cette méthode par rapport aux multiplicateurs de Lagrange est que ici on protège directement le logit au lieu de la prédiction ce qui est plus générale.
+Cela serai impossible et génererai une quantité infinie (non-dénombrable) de contraintes si on devais les écrire sous une forme acceptable pour un lagrangien.
+
+Le principale désantage de cette methode est dans le paramètre $s$ de l'Equation~\ref{eq:background-ml-adv}.
+Ce paramètre sert à avoir un bon équilibre entre la tâche principle et contrer l'adversaire. 
+Cependant, comme Zhang et al. le précise, il est très dificile de le trouver et rentre dans la catégorire de l'optimisation des hyperparamètre des réseaux de neuronnes.
 
-\paragraph{Rééquilibrage adversariel}
diff --git a/background/main.tex b/background/main.tex
index e386396..d5a1ce0 100644
--- a/background/main.tex
+++ b/background/main.tex
@@ -17,14 +17,18 @@ a & b & a\iff b & a\implies b & a\wedge b & a\vee b & \neg a\\
 1 & 1 & 1 & 1 & 1 & 1 & 0\\
 \end{matrix}
 \end{equation}
+
+\FloatBarrier
 \subsection{Ensembles et fonctions}
     \label{sec:background-set}
     \input{background/set}
 
+\FloatBarrier
 \subsection{Algèbre linéaire}
     \label{sec:background-evr}
     \input{background/alg}
 
+\FloatBarrier
 \subsection{Mesurer le hasard pour prédire et inférer}
     \label{sec:background-proba}
     \input{background/proba}
@@ -32,27 +36,27 @@ a & b & a\iff b & a\implies b & a\wedge b & a\vee b & \neg a\\
     %\subsection{Probabilitées}
     %\subsection{Statistiques}
 
+\FloatBarrier
 \subsection{Calcul différentiel}
     \label{sec:background-dif}
     \input{background/dif}
 
+\FloatBarrier
 \subsection{Optimisation}
     \label{sec:background-opti}
     \input{background/opti}
 
+\FloatBarrier
 \section{Apprentissage automatique}
     \label{sec:background-ml}
     \input{background/ml}
 
-\subsection{Equité}
-    \label{sec:background-eq}
-    \input{background/eq}
-    %\subsection{Différentes notions d'équité}
-
-\subsection{Confidentialité}
-    \label{sec:background-conf}
-    \input{background/conf}
-    %\subsection{Mitiger l'inéquitée}
-        %\subsubsection{Preprocessing}
-       % \subsubsection{Inprocessing}
-       %\subsubsection{Postprocessing}
+    \FloatBarrier
+    \subsection{Equité}
+        \label{sec:background-eq}
+        \input{background/eq}
+
+    \FloatBarrier
+    \subsection{Confidentialité}
+        \label{sec:background-conf}
+        \input{background/conf}
diff --git a/background/opti.tex b/background/opti.tex
index 03d01a6..0472d26 100644
--- a/background/opti.tex
+++ b/background/opti.tex
@@ -6,26 +6,32 @@ Cela permet l'entraînement de modèle d'apprantissage automatique à l'aide d'u
 Le second problème reprend le premier mais y ajoute des contraintes.
 C'est à dire, comme minimise-t'on le coût tout en garantissant certaines conditions ?
 
-\subsubsection{Optimisation sant contrainte : Descente de gradient}
+\subsubsection{Optimisation sans contrainte : Descente de gradient}
 \label{sec:background-opti-sgd}
 Nous appellons fonctionelles les fonctions $\mathbb{R}^n$ dans $\mathbb{R}$.
 Soit $J$ une fonctionelle convexe, nous cherchons à trouver $x\in\mathbb{R}$ tel que $J(x) = \text{inf}\{J(t)\mid t\in\mathbb{R}\}$.
 Pour simplifier cette rapide présentation, nous supposerons que $J$ à toujours les conditions de régularité (diférentiabilié) suffisante pour les opérations que nous appliquerons.
 Pour trouver $x$ qui minimise $J$ une des méthode les plus utilisé en apprentissage automatique est la descente de gradient.
-Il s'agit de construire une suite $(x_k)_{k\in\mathbb{N}}$ telle que $J(x_k)$ soit strictement décroissante ($\forall k\in\mathbb{N}~J(x_{k+1})<J(x_k)$).
+Il s'agit de construire une suite $(x_k)_{k\in\mathbb{N}}$ telle que à chauqe étape la direction de descente soit optimale.
+L'idée pour arriver à cela et de considérer une approximation de l'accroissement de $J$ en utiliant la définition du gradient.
 
-Pour cela, nous remarquons  
-\begin{align*}
-    J(x_k+h) = J(x_k)+\langle \nabla J(x_k),h\rangle + ||h||\varepsilon(h)\\
-    \iff J(x_k+h) - J(x_k) = \langle \nabla J(x_k),h\rangle + ||h||\varepsilon(h)
-\end{align*}
-Et donc un considérant la partie principale
+On cherche $h$  tel que $||h||=1$ et $J(x_k+h)$ soit minimal.
+D'après la définition du gradient
 \begin{equation*}
-    |J(x_k+h) - J(x_k)|\leq ||\nabla J(x_k)||||h||
+    J(x_k+h) = J(x_k) + \langle\nabla J(x_k),h\rangle + ||h||\epsilon(h)
 \end{equation*}
-D'éprès l'inégalitée de Cauchy-Schwartz. 
-L'égalité est obtenu si et seulment si il existe $l_k$ tel que
-$h=l_k\nabla J(x_k)$.
+On cherche alors à résoudre (*) : $min_{||h||=1}\langle\nabla J(x_k),h\rangle$.
+D'après l'inégalite de Cauchy-Schwartz 
+\begin{equation*}
+\forall h~||h||=1\implies~|\langle\nabla J(x_k),h\rangle |\leq ||\nabla J(x_k)||
+\end{equation*}
+Et aussi
+\begin{equation*}
+    \forall h~||h||=1\implies~\left(|\langle\nabla J(x_k),h\rangle | =||\nabla J(x_k)||
+    \iff h = \pm \frac{\nabla J(x_k)}{||\nabla J(x_k)||}\right)
+\end{equation*}
+$h=-\frac{\nabla J(x_k)}{||\nabla J(x_k)||}$ est donc solution de (*) pour que $\langle\nabla J(x_k),h\rangle$ soit négatif.
+
 Ainsi la méthode de déscente de gradient est définit par la suite
 $x_{k+1}=x_k-l_k\nabla J(x_k)$.
 $l_k$ est appelé le pas.
@@ -54,13 +60,36 @@ Avec une illustration de la convergence de l'écart entre $J(x_k)$ et le minimum
 
 
 \subsubsection{Optimisation sous contraintes : multiplicateurs de Lagrange}
+\label{ref:background-opti-sous}
 Pour expliquer ce qu'est l'optimisation sous contraintes, represnons les mots de Philipe G. Ciarlet :
 \textquote{On s'interesse au problème suivant : trouver des conditions \emph{nécessaires}, et des conditions \emph{suffisantes}, pour qu'un point d'un ensemble $U$ soit un extremum relatif de la restriction à l'ensemble $U$ d'une fonction $J$ définie sur un ensemble "plus grands". [...]
 Un premier exemple est celui des \emph{extremums relatifs liés}, où l'ensemble $U$ est de la forme 
 \begin{equation*}
-    U=\{v\in Q \mid \forall i\in m-1~\psi_i(v)=0\}
+    U=\{v\in V \mid \forall i\in m-1~\varphi_i(v)\leq 0\}
 \end{equation*}
 }
-Pour une présentation plus complète des multiplicateurs de Lagrange voir la Section 7.2 de~\cite{ciarlet}
+
+On introduit le Lagrangien de ce problèmes par la formule suivante:
+\begin{equation*}
+    L:\left\{
+        \begin{matrix}
+            V\times\mathbb{R}^m_+\\
+            (v,\mu)\mapsto J(v)+\sum_{i=0}^{m-1}\mu_i\varphi(v)
+        \end{matrix}
+        \right.
+\end{equation*}
+
+Pour respecter les contrainte du problèmes, la méthode consiste à chercher un point selle de $L$,
+c'est-à-dire, un point $(u,\lambda)\in V\times \mathbb{R}^m_+$ tel que 
+\begin{equation*}
+    sup L(\square,\lambda)=L(u,\lambda)\wedge inf L(u,\square)=L(u,\lambda)
+\end{equation*}
+
+$u$ est alors solution du problème.
+Il est donc suffisant de connaitre $\lambda$ appelé \emph{multiplicateurs de Lagrange} pour pouvoir trouver $u$ en se ramensant au cas sans contraintes de la section précédente.
+Trouver $\lambda$ s'apelle le \emph{problème dual} en contrepartie de la recheche de $u$ qui est le \emph{problème primal}.
+Le problème dual bénéficie du fait que ces contraintes sont plus simples car il sagi uniquement de la positivité des multiplicateur de Lagrange.
+Le problème dual s'écrit donc $sup (inf_{v\in V}L(v,\square))$.
+Pour une présentation plus complète des multiplicateurs de Lagrange et de la dualité voir les Sections 7.2 et 9.3 du Livre de Ciarlet~\cite{ciarlet}
 
 
diff --git a/background/proba.tex b/background/proba.tex
index 1cfe29e..55bca7a 100644
--- a/background/proba.tex
+++ b/background/proba.tex
@@ -57,20 +57,24 @@ Nous definisson la mesure image de $f$ par $d$, que nous notons $d_f$, par l'exp
 \end{equation}
 
 \begin{definition}{Intégrale}
-    Soient $(E,\mathcal{E},\mu)$ et $(F,\mathcal{F},\nu)$ un espace mesuré.
+    Soient $(E,\mathcal{E},\mu)$ un espace mesuré.
     Pour une fonction $f=\sum_{i\in I}\alpha_i 1_{A_i}$, nous dirons étagé, 
-    Avec $\{A_i\mid i\in I\} \subset \mathcal{F}$.
-    Alors $\int_E f d\nu= \sum_{i\in I}\alpha_i \nu(A_i)$.
+    Avec $\{A_i\mid i\in I\} \subset \mathcal{E}$ et $\alpha_i\in\mathbb{R}^+$.
+    alors $\int_E f d\mu= \sum_{i\in I}\alpha_i \mu(A_i)$.
 
-    Soit $g$ un fonction mesurable.
-    Alors il existe une suite $\{(f_n)\}_{n\in\mathbb{N}}$ de fonctions étagés telle que $lim_{n\rightarrow +\infty} f_n = g$.
-    Voir la Définition~\ref{def:background-dif-lim} pour une définition de la limite.
-    On définit alors 
+    Soit $g$ un fonction mesurable de $(E,\mathcal{E},\mu)$ dans $\mathbb{R}^+$, alors
     \begin{equation*}
-        \int_{E}gd\nu = lim_{n\rightarrow +\infty}\int_{E}f_n d\nu
+        \int_{E}gd\mu = sup\left\{\int_E fd\mu\mid f~\text{est étagé}\wedge f\leq g\right\}
+    \end{equation*}
+
+    Enfin dans le cas générale de $h$ une fonction mesurable de $(E,\mathcal{E},\mu)$ dans $\mathbb{R}$, alors
+    si $\int_E |h|d\mu\in\mathbb{R}$ on pose
+    \begin{equation*}
+        \int_E hd\mu = \int_E max(h,0)d\mu - \int_E max(-h,0)d\mu
     \end{equation*}
 \end{definition}
 
+
 Dans le cas particulier où $d(A) = 1$, nous appelons $d$ une mesure de probabilité.
  $(A,\mathcal{A},d)$ est alors un espace probailisé et les fonctions mesurables sur cet espace sont appelés variables aléatoires.
 Le loi de probabilité d'une variable aléatoire $f$ sur $(X,\mathcal{X})$ est la mesure image de $f$ sur $d$.
diff --git a/biblio.bib b/biblio.bib
index 3eddd2b..697b330 100644
--- a/biblio.bib
+++ b/biblio.bib
@@ -1,4 +1,90 @@
 ######################""
+@article{breiman2001random,
+  title={Random forests},
+  author={Breiman, Leo},
+  journal={Machine learning},
+  volume={45},
+  pages={5--32},
+  year={2001},
+  publisher={Springer}
+}
+
+@article{shwartz2022tabular,
+  title={Tabular data: Deep learning is not all you need},
+  author={Shwartz-Ziv, Ravid and Armon, Amitai},
+  journal={Information Fusion},
+  volume={81},
+  pages={84--90},
+  year={2022},
+  publisher={Elsevier}
+}
+@article{grinsztajn2022tree,
+  title={Why do tree-based models still outperform deep learning on typical tabular data?},
+  author={Grinsztajn, L{\'e}o and Oyallon, Edouard and Varoquaux, Ga{\"e}l},
+  journal={Advances in neural information processing systems},
+  volume={35},
+  pages={507--520},
+  year={2022}
+}
+
+@ARTICLE{1688199,
+
+  author={Polikar, R.},
+
+  journal={IEEE Circuits and Systems Magazine}, 
+
+  title={Ensemble based systems in decision making}, 
+
+  year={2006},
+
+  volume={6},
+
+  number={3},
+
+  pages={21-45},
+
+  doi={10.1109/MCAS.2006.1688199}}
+
+@INPROCEEDINGS{1626170,
+
+  author={Huang, Y.S. and Suen, C.Y.},
+
+  booktitle={Proceedings of IEEE Conference on Computer Vision and Pattern Recognition}, 
+
+  title={The behavior-knowledge space method for combination of multiple classifiers}, 
+
+  year={1993},
+
+  volume={},
+
+  number={},
+
+  pages={347-352},
+
+  doi={10.1109/CVPR.1993.1626170}}
+
+
+
+@article{hawkins2004problem,
+  title={The problem of overfitting},
+  author={Hawkins, Douglas M},
+  journal={Journal of chemical information and computer sciences},
+  volume={44},
+  number={1},
+  pages={1--12},
+  year={2004},
+  publisher={ACS Publications}
+}
+@inproceedings{ying2019overview,
+  title={An overview of overfitting and its solutions},
+  author={Ying, Xue},
+  booktitle={Journal of physics: Conference series},
+  volume={1168},
+  pages={022022},
+  year={2019},
+  organization={IOP Publishing}
+}
+
 @misc{stateth,
     titre={Statistiques ethniques},
     howpublished={\url{https://www.insee.fr/fr/information/2108548}},
author	Jan Aalmoes <jan.aalmoes@inria.fr>	2024-09-23 20:54:42 +0200
committer	Jan Aalmoes <jan.aalmoes@inria.fr>	2024-09-23 20:54:42 +0200
commit	5362d389988afb2750d27e7e6c5d401571dfba6e (patch)
tree	381d52745fb20e54cdb16cd140e5ffea7d0ac76d
parent	caa9990c96141450f62a8076f560761f517dc884 (diff)