Hyperparameter Selection under Localized Label Noise via Corrupt Validation
Existing research on label noise mostly considers simple uniform or class-conditional noise. In many real-world settings, however, label noise is systematic rather than completely random. We therefore first propose a novel label noise model, Localized Label Noise (LLN), which corrupts the labels in small local regions and is significantly more general than either uniform or class-conditional label noise. LLN is based on a k-nearest neighbors corruption algorithm that assigns the same wrong label to all neighbors and reduces to class-conditional label noise when k = 1. Given this more powerful noise model, we propose an empirical hyperparameter selection method under LLN that synthetically corrupts the training labels while leaving the test labels unmodified, and that selects better hyperparameters than traditional strategies such as cross-validation. The method provides an approximate but more robust validation signal for hyperparameter selection. We design several label corruption experiments on both synthetic and real-world data to demonstrate that the proposed hyperparameter selection yields better estimates than standard methods.
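To make the k-nearest neighbors corruption step concrete, the following is a minimal sketch of one way it could work, not the authors' implementation. The function name `corrupt_localized`, its parameters, the Euclidean distance, the uniformly random choice of region centers, and the rule that the shared wrong label differs from the center's clean label are all assumptions made for illustration.

```python
import math
import random


def corrupt_localized(points, labels, k, n_regions, n_classes, seed=0):
    """Return a copy of `labels` with `n_regions` local neighborhoods relabeled.

    Hypothetical helper: every name and default here is an assumption.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    for _ in range(n_regions):
        # Assumption: region centers are drawn uniformly at random.
        center = rng.randrange(len(points))
        # Indices sorted by Euclidean distance to the center;
        # the center itself is at distance 0.
        order = sorted(
            range(len(points)),
            key=lambda j: math.dist(points[center], points[j]),
        )
        neighborhood = order[:k]
        # One wrong label (relative to the center's clean label),
        # shared by the whole neighborhood.
        wrong = rng.choice(
            [c for c in range(n_classes) if c != labels[center]]
        )
        for j in neighborhood:
            noisy[j] = wrong
    return noisy


# Toy usage: two well-separated clusters on a line, one corrupted region.
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
labels = [0, 0, 0, 1, 1, 1]
noisy = corrupt_localized(points, labels, k=3, n_regions=1, n_classes=2)
```

With k = 3 on this toy data, whichever point is chosen as the center, its three nearest neighbors are exactly its own cluster, so one whole cluster receives the other cluster's label. With k = 1, only the center point is relabeled, which corresponds to the single-point case described above.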