Adversarial validation as density ratio estimation

Adversarial validation

Adversarial validation is a technique mainly used in Kaggle competitions.


When the distribution of the train set differs from that of the public test set, building a validation set by random sampling from the train set leads to low correlation between the public leaderboard and local validation scores. Adversarial validation selects data from the train set that have high density in the public test set, yielding a validation set that correlates well with the public test set.

The steps of adversarial validation are as follows:

  1. Assign the pseudo-label -1 to every example in the train set and the pseudo-label +1 to every example in the test set.
  2. Train any probabilistic binary classifier that discriminates the train set from the test set, using cross-validation.
  3. Score the train set with this classifier and take the top-N examples with the highest predicted probability for class +1 as the validation set (a minimal sketch follows this list).
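
Below is a minimal sketch of these three steps in Python with scikit-learn. The DataFrame names `train` and `test`, the argument `n_valid`, the helper name `adversarial_validation_split`, and the choice of logistic regression are all illustrative assumptions; any probabilistic classifier works, and the feature columns are assumed to be numeric.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def adversarial_validation_split(train: pd.DataFrame, test: pd.DataFrame, n_valid: int):
    """Split `train` into (new_train, validation) using adversarial validation."""
    # Step 1: pseudo-labels, -1 for the train set and +1 for the test set.
    X = pd.concat([train, test], axis=0)
    y = np.r_[np.full(len(train), -1), np.full(len(test), +1)]

    # Step 2: probabilistic classifier that discriminates train from test,
    # scored out-of-fold via cross-validation.
    clf = LogisticRegression(max_iter=1000)
    proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")
    f = proba[:, 1]  # P(y = +1 | x), i.e. "looks like test data"

    # Step 3: the top-N train rows with the highest score for class +1
    # become the validation set.
    train_scores = f[: len(train)]
    valid_idx = np.argsort(train_scores)[::-1][:n_valid]
    is_valid = np.zeros(len(train), dtype=bool)
    is_valid[valid_idx] = True
    return train[~is_valid], train[is_valid]
```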


Density ratio

Here, "density ratio" : r(x) is introduced.

\displaystyle{
r(x) = \frac{q(x)}{p(x)}
}

Using Bayes' rule, the density ratio can be rewritten as:

\displaystyle{
r(x) = \frac{q(x)}{p(x)} = \frac{p(x | y=+1)}{p(x | y=-1)} = \frac{p(y=-1)\, p(x, y=+1)}{p(y=+1)\, p(x, y=-1)} = \frac{p(y=-1)\, p(y=+1 | x)}{p(y=+1)\, p(y=-1 | x)}
}

p(y=-1|x) and p(y=+1|x) can be approximated by a binary classifier that discriminates the train set from the test set. Also, p(y=-1) and p(y=+1) are constants that can be approximated from the numbers of train and test examples.
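
Concretely, with N train examples and M test examples pooled together, the priors and their ratio are approximated by:

\displaystyle{
p(y=-1) \approx \frac{N}{N+M}, \qquad p(y=+1) \approx \frac{M}{N+M}, \qquad \frac{p(y=-1)}{p(y=+1)} \approx \frac{N}{M}
}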

Thus, the density ratio is:

\displaystyle{
r(x) = \frac{N}{M} \frac{f(x)}{1-f(x)}
}

where N and M are the numbers of train and test examples, respectively, and f(x) is the probability of class +1 (the test set) that the classifier outputs for data point x.

Importantly, density ratio estimation boils down to a binary classification problem, which is exactly the procedure used in adversarial validation. (Recall that in adversarial validation we train a binary classifier to separate the train set from the test set!)
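
As a small sketch (assuming an array f of out-of-fold probabilities P(y = +1 | x), such as the train_scores computed in the earlier snippet), the estimated density ratio is:

```python
import numpy as np

def estimated_density_ratio(f: np.ndarray, n_train: int, n_test: int) -> np.ndarray:
    """Estimate r(x) = (N / M) * f(x) / (1 - f(x)) from classifier outputs f(x)."""
    eps = 1e-12  # guard against division by zero when f(x) is numerically 1
    return (n_train / n_test) * f / (1.0 - f + eps)
```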

(Introduction to density ratio estimation: "Density Ratio Estimation for KL Divergence Minimization between Implicit Distributions" by Louis Tiao)

The density ratio r(x) is a monotonically increasing function of f(x). For simplicity, N = M is assumed in the following discussion.

The shape of the density ratio as a function of f(x) is shown below:

[Figure: density ratio r(x) versus f(x). Horizontal axis: f(x); vertical axis: density ratio.]
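
To make the shape concrete, a few values under the N = M assumption (a throwaway snippet, not part of the pipeline above):

```python
# r(x) = f(x) / (1 - f(x)) when N = M
for f in (0.1, 0.5, 0.8, 0.9):
    print(f, f / (1 - f))  # approximately 0.11, 1.0, 4.0, 9.0
```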

Relationship between adversarial validation and density ratio

As shown above, adversarial validation selects data from the train set that have a high density ratio.

Data with f(x) > 0.5 have higher density in the test set than in the train set. Such data are very useful for validation because they are relatively rare in the train set while appearing frequently in the test set.

Data with f(x) around 0.5 are hard to assign to either the train set or the test set. These are also useful as validation data.

Data with f(x) < 0.5 have higher density in the train set than in the test set. It is risky to use such data for validation because they are relatively rare in the test set.
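
In code, this reading is just a threshold on the out-of-fold scores (again assuming the hypothetical train_scores array from the first sketch):

```python
# Under N = M, f(x) >= 0.5 means r(x) >= 1: at least as dense in the test set
# as in the train set, so these rows are reasonable validation candidates.
test_like = train_scores >= 0.5
print(f"{test_like.sum()} of {len(train_scores)} train rows look test-like")
```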

Conclusion

In this article, I introduced the relationship between the adversarial validation technique and density ratio estimation. Adversarial validation is theoretically equivalent to density ratio estimation between the train set and the test set.