Downsampling

Downsampling is a technique for reducing the number of samples in a dataset, most commonly used to fix class imbalances—situations where one category is heavily overrepresented relative to another—before training a model.

For example, in a dataset with 90 percent cat images and 10 percent dog images, a model trained on that data will quickly learn that predicting “cat” almost every time produces high accuracy, but it will be nearly useless at identifying dogs. Downsampling addresses this by removing samples from the majority class until both classes are more evenly represented.

The benefits of downsampling included faster training and less risk of overfitting compared to upsampling. The downsides are data loss, since removing samples from the majority class can strip out useful information, and potential bias if the remaining samples aren't representative of the original distribution.

Share