
Comparison of data augmentation techniques

Fall 2020

Data Scientist

Takumi Okoshi is a Z by HP Data Science Global Ambassador. His content is sponsored, and he was provided with HP products.

 

Hello, I'm Takuoko, a Kaggle Grandmaster.

 

In this post, I compare data augmentation techniques, both visually and through experiments. Data augmentation is a powerful technique in CV (computer vision) competitions.

I focused on techniques for image classification, the most standard task in CV competitions.

 

As a Z by HP Data Science Global Ambassador, I have been provided with high-powered HP workstations.

# Development environment

I am working on CV competitions with these powerful systems. Compared to when I used a V100, I now have systems that I can run freely at any time and at higher speed. As a result, I can run far more experiments than before, including comparative tests of methods from various papers.

PyTorch and CUDA come pre-installed on the systems, so I did not need to set up the environment myself.

 

# Visual comparison of data augmentation techniques

 

Below, I illustrate each technique by mixing two CIFAR-100 images. Some of the figures are cited from the original papers.

## Mixup

 

[paper with code](https://paperswithcode.com/paper/mixup-beyond-empirical-risk-minimization)

 

Mixes two images as lam * img1 + (1 - lam) * img2, where lam is drawn from a Beta distribution, and mixes their labels with the same weight.
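A minimal NumPy sketch of the Mixup formula above, assuming both images are float arrays of the same shape and the labels are one-hot vectors. Drawing lam from Beta(alpha, alpha) follows the Mixup paper; the function name `mixup` and the default `alpha=1.0` are illustrative choices, not taken from this post's training code.

```python
import numpy as np


def mixup(img1, img2, label1, label2, alpha=1.0, rng=None):
    """Mix two images and their one-hot labels with a single weight lam."""
    rng = np.random.default_rng() if rng is None else rng
    # lam ~ Beta(alpha, alpha), as in the Mixup paper
    lam = float(rng.beta(alpha, alpha))
    img = lam * img1 + (1.0 - lam) * img2
    # the labels are mixed with exactly the same weight as the pixels
    label = lam * label1 + (1.0 - lam) * label2
    return img, label, lam
```

In a training loop this is usually applied to a whole batch at once by pairing each sample with a shuffled copy of the same batch.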

## Manifold Mixup

 

[paper with code](https://paperswithcode.com/paper/manifold-mixup-better-representations-by)

 

Applies Mixup to the hidden representations at an intermediate layer of the network rather than to the input images.
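To make "Mixup in the middle layer" concrete, here is a hedged sketch on a hypothetical toy model: an MLP given as a list of NumPy weight matrices with ReLU between layers, not the resnet34d used in this post's experiments. Both inputs pass through the first `k` layers separately, their hidden representations and labels are mixed with the same `lam`, and the forward pass continues from the mixed representation. With `k = 0` it reduces to plain input Mixup.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def manifold_mixup_forward(x1, x2, y1, y2, weights, lam, k):
    """Forward two inputs through a toy MLP, mixing them at layer k.

    weights: hypothetical list of weight matrices; ReLU follows every
    layer except the last, which produces the logits.
    k: 0 mixes the inputs, k > 0 mixes the output of hidden layer k.
    """
    h1, h2 = x1, x2
    # run both inputs separately through the first k layers
    for w in weights[:k]:
        h1, h2 = relu(h1 @ w), relu(h2 @ w)
    # mix the representations and the labels with the same weight
    h = lam * h1 + (1.0 - lam) * h2
    y = lam * y1 + (1.0 - lam) * y2
    # continue the forward pass with the mixed representation
    for w in weights[k:-1]:
        h = relu(h @ w)
    logits = h @ weights[-1]
    return logits, y
```

Manifold Mixup picks the layer at random for each mini-batch, for example `k = rng.integers(0, len(weights))`, so that representations at several depths are regularized.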

 

## CutMix

 

[paper with code](https://paperswithcode.com/paper/cutmix-regularization-strategy-to-train)

 

Cuts a rectangular patch (a bounding box) out of one image and pastes it into the other image; the labels are mixed in proportion to the area of the pasted patch.

There are two variants: corresponding paste, which pastes the patch at the same position it was cut from, and random paste, which pastes it at a random position.
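A NumPy sketch of both CutMix variants, assuming `H x W x C` image arrays and one-hot labels. The box area is set from lam ~ Beta(alpha, alpha) as in the CutMix paper, and the label weight is recomputed from the pixels actually pasted. One simplification is my own: the box is always kept fully inside the image (the reference implementation samples a box centre and clips it at the border), which lets random paste reuse the same box size at a new position. The helper names `rand_box` and `cutmix` are hypothetical.

```python
import numpy as np


def rand_box(h, w, lam, rng):
    """Sample a box covering about (1 - lam) of an h x w image."""
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(h * cut_ratio), int(w * cut_ratio)
    # choose the top-left corner so the whole box fits inside the image
    y0 = int(rng.integers(0, h - cut_h + 1))
    x0 = int(rng.integers(0, w - cut_w + 1))
    return y0, x0, cut_h, cut_w


def cutmix(img1, img2, label1, label2, alpha=1.0, random_paste=False, rng=None):
    """Paste a patch of img2 into img1 (corresponding or random paste)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img1.shape[:2]
    lam = rng.beta(alpha, alpha)
    # where the patch is cut from in img2
    sy, sx, ch, cw = rand_box(h, w, lam, rng)
    if random_paste:
        # random paste: new position of the same size in img1
        dy = int(rng.integers(0, h - ch + 1))
        dx = int(rng.integers(0, w - cw + 1))
    else:
        # corresponding paste: same position as the cut-out
        dy, dx = sy, sx
    mixed = img1.copy()
    mixed[dy:dy + ch, dx:dx + cw] = img2[sy:sy + ch, sx:sx + cw]
    # recompute lam from the actual pasted area (int() rounding changes it)
    lam = 1.0 - (ch * cw) / (h * w)
    label = lam * label1 + (1.0 - lam) * label2
    return mixed, label, lam
```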

## PatchUp

 

[paper with code](https://paperswithcode.com/paper/patchup-a-regularization-technique-for)

 

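Applies CutMix to the feature maps at an intermediate layer of the network: blocks of one sample's features are replaced with (or, in the soft variant, interpolated with) those of another sample.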
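A simplified sketch of the idea on one pair of intermediate feature maps of shape `C x H x W`, loosely modelled on the paper's hard variant. Square blocks are selected in the spirit of DropBlock (each position becomes a block centre with probability `gamma`), those blocks are swapped in from the second sample's features, and the labels are mixed by the swapped fraction. Assumptions: the mask is shared across channels, the `block_size` and `gamma` values are illustrative, and the paper's exact mask sampling, soft variant, and loss are not reproduced. Function names are hypothetical.

```python
import numpy as np


def block_mask(h, w, block_size, gamma, rng):
    """Boolean h x w mask made of block_size x block_size squares."""
    # each position becomes a block centre with probability gamma
    seeds = rng.random((h, w)) < gamma
    mask = np.zeros((h, w), dtype=bool)
    half = block_size // 2
    for cy, cx in zip(*np.nonzero(seeds)):
        # expand each seed into a square, clipped at the borders
        y0, x0 = max(0, cy - half), max(0, cx - half)
        mask[y0:cy - half + block_size, x0:cx - half + block_size] = True
    return mask


def hard_patchup(feat1, feat2, y1, y2, block_size=3, gamma=0.1, rng=None):
    """Swap masked blocks of feat1 (C x H x W) with those of feat2."""
    rng = np.random.default_rng() if rng is None else rng
    _, h, w = feat1.shape
    mask = block_mask(h, w, block_size, gamma, rng)
    # the same spatial mask is applied to every channel (simplification)
    mixed = np.where(mask[None], feat2, feat1)
    frac = mask.mean()  # share of feature units taken from feat2
    y = (1.0 - frac) * y1 + frac * y2
    return mixed, y
```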

 

## ResizeMix

 

[paper with code](https://paperswithcode.com/paper/resizemix-mixing-data-with-preserved-object)

 

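Instead of cutting out a patch, shrinks one whole image and pastes it into the other, so the object in the resized image is always preserved. The labels are mixed in proportion to the pasted area.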
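A NumPy sketch of ResizeMix as described above, assuming `H x W x C` arrays and one-hot labels. The source image is shrunk by a scale rate `tau` drawn uniformly from `scale`, pasted at a random position, and the labels are mixed by the pasted area (roughly `tau ** 2`). The default range `(0.1, 0.8)` is my assumption of typical values, and the nearest-neighbour resize only keeps the example dependency-free; a real pipeline would use a proper image resize. Function names are hypothetical.

```python
import numpy as np


def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize of an H x W (x C) array."""
    h, w = img.shape[:2]
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return img[rows][:, cols]


def resizemix(img1, img2, label1, label2, scale=(0.1, 0.8), rng=None):
    """Shrink the whole of img2 and paste it into img1 at a random spot."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img1.shape[:2]
    tau = rng.uniform(*scale)  # scale rate of the pasted image
    ph, pw = max(1, int(h * tau)), max(1, int(w * tau))
    patch = resize_nearest(img2, ph, pw)
    y0 = int(rng.integers(0, h - ph + 1))
    x0 = int(rng.integers(0, w - pw + 1))
    mixed = img1.copy()
    mixed[y0:y0 + ph, x0:x0 + pw] = patch
    # label weight of img2 = pasted area fraction (about tau ** 2)
    lam = (ph * pw) / (h * w)
    label = (1.0 - lam) * label1 + lam * label2
    return mixed, label, lam
```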

## FMix

 

[paper with code](https://paperswithcode.com/paper/understanding-and-enhancing-mixed-sample-data)

 

Unlike CutMix, whose mask is always a rectangle, FMix can generate masks of arbitrary shape: it samples low-frequency noise in Fourier space and thresholds it into a binary mask.
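A sketch of FMix-style mask generation, assuming `H x W x C` images and one-hot labels. Complex Gaussian noise is sampled in Fourier space, high frequencies are attenuated by `1 / f ** decay`, the inverse FFT gives a smooth greyscale image, and the top `lam` fraction of its pixels is set to 1. This follows the idea in the FMix paper, but the code is my own simplified version; `decay=3.0` and the function names are illustrative assumptions.

```python
import numpy as np


def fmix_mask(h, w, lam, decay=3.0, rng=None):
    """Binary h x w mask with round(lam * h * w) ones in low-frequency blobs."""
    rng = np.random.default_rng() if rng is None else rng
    # frequency magnitude of every entry in a real 2-D FFT grid
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.rfftfreq(w)[None, :]
    freq = np.sqrt(fy ** 2 + fx ** 2)
    # attenuate high frequencies; clamp f so the zero frequency is finite
    scale = 1.0 / np.maximum(freq, 1.0 / max(h, w)) ** decay
    spectrum = scale * (rng.normal(size=freq.shape) + 1j * rng.normal(size=freq.shape))
    grey = np.fft.irfft2(spectrum, s=(h, w))  # smooth greyscale noise
    # binarise: the top lam fraction of pixels become 1
    n_ones = int(round(lam * h * w))
    mask = np.zeros(h * w)
    mask[np.argsort(grey, axis=None)[::-1][:n_ones]] = 1.0
    return mask.reshape(h, w)


def fmix(img1, img2, label1, label2, alpha=1.0, decay=3.0, rng=None):
    """Mix two images with an FMix-style mask; mask == 1 keeps img1."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img1.shape[:2]
    mask = fmix_mask(h, w, rng.beta(alpha, alpha), decay, rng)
    # add trailing axes so the mask broadcasts over channels
    m = mask.reshape(mask.shape + (1,) * (img1.ndim - 2))
    mixed = m * img1 + (1.0 - m) * img2
    lam = mask.mean()  # actual share of img1 pixels
    label = lam * label1 + (1.0 - lam) * label2
    return mixed, label, lam
```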

## SnapMix

Reduces background noise by using CAM (class activation maps) to weight the labels after mixing. Figure is cited from the paper.
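The CAM-weighted labels described above (the idea behind SnapMix) can be sketched as follows. It assumes non-negative class activation maps for each image's true class have already been computed by the model and upsampled to image size; the box format `(y0, x0, h, w)` and the function name are hypothetical, and the image mixing itself is omitted. Each CAM is normalised to sum to 1, image a keeps the semantic share outside its replaced box, and image b contributes the share inside the box cut from it.

```python
import numpy as np


def snapmix_label_weights(cam_a, cam_b, box_a, box_b):
    """Semantic label weights (rho_a, rho_b) for CAM-weighted mixing.

    cam_a, cam_b: non-negative H x W CAMs for each image's true class.
    box_a: (y0, x0, h, w) region of image a that gets replaced.
    box_b: (y0, x0, h, w) region of image b that is pasted into box_a.
    """
    # normalise each CAM so it sums to 1 ("semantic percentage" map)
    s_a = cam_a / cam_a.sum()
    s_b = cam_b / cam_b.sum()
    ya, xa, ha, wa = box_a
    yb, xb, hb, wb = box_b
    # image a keeps the semantics outside its replaced box
    rho_a = 1.0 - s_a[ya:ya + ha, xa:xa + wa].sum()
    # image b contributes the semantics inside the box cut from it
    rho_b = s_b[yb:yb + hb, xb:xb + wb].sum()
    return rho_a, rho_b
```

The mixed target is then `rho_a * y_a + rho_b * y_b`. Unlike CutMix, the two weights need not sum to 1, so a patch of pure background contributes almost nothing to the label.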

 

## PuzzleMix

 

[paper with code](https://paperswithcode.com/paper/puzzle-mix-exploiting-saliency-and-local-1)

 

Uses saliency information to mix the images so that the important regions of both images are kept, which reduces background noise. Figure is cited from the paper.

# Comparative testing of data augmentation techniques

## Test settings

- Dataset: Kaggle's [Cassava Leaf Disease Classification](https://www.kaggle.com/c/cassava-leaf-disease-classification)
- Model: resnet34d
- Loss: CrossEntropyLoss
- Optimizer: Adam + Lookahead
- Image size: 256
- Batch size: 64
- Epochs: 20
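With mixed labels, the CrossEntropyLoss listed above is usually applied as a weighted sum over the two source labels. A NumPy sketch, assuming integer class labels `y_a`, `y_b` and the scalar `lam` returned by the augmentation; it equals cross-entropy against the mixed one-hot target. This is an illustration, not the post's training code, and the function names are hypothetical.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the class axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def mixed_cross_entropy(logits, y_a, y_b, lam):
    """lam * CE(logits, y_a) + (1 - lam) * CE(logits, y_b), batch-averaged."""
    logp = log_softmax(logits)
    idx = np.arange(len(logits))
    ce_a = -logp[idx, y_a].mean()
    ce_b = -logp[idx, y_b].mean()
    return lam * ce_a + (1.0 - lam) * ce_b
```

In PyTorch the same pattern is commonly written as `lam * criterion(out, y_a) + (1 - lam) * criterion(out, y_b)` with `criterion = nn.CrossEntropyLoss()`.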

## Results

| Technique | 5-fold CV accuracy |
| --- | --- |
| CutMix random paste | 0.8708 |
| CutMix corresponding paste | 0.8694 |
| PatchUp | 0.8694 |
| Manifold Mixup | 0.8692 |
| ResizeMix | 0.8686 |
| CutMix corresponding paste + Mixup | 0.868 |
| FMix | 0.8675 |
| PuzzleMix | 0.8662 |
| Mixup | 0.8661 |

## Summary

Consistent with the ResizeMix paper, random paste was more accurate than corresponding paste for CutMix. ResizeMix itself, however, scored lower than CutMix, which differs from the paper's results.

The FMix paper also reports that CutMix + Mixup and FMix + Mixup are more accurate than either method alone, but I could not reproduce this: CutMix corresponding paste + Mixup (0.868) scored below CutMix corresponding paste alone (0.8694).

I felt that random paste and ResizeMix should be applied with care, because they move content to random positions and may lower accuracy on datasets where objects sit at a fixed position in every image, such as medical images. Techniques such as PuzzleMix and SnapMix are likely to be more effective for tasks in which subtle details matter.

## Future plans

I will continue to use the systems provided to Z by HP Data Science Global Ambassadors to compare and test methods from various papers in CV competitions.
