
Tencrop test images & imbalanced data processing? #10

Open
han-liu opened this issue Apr 27, 2018 · 5 comments

Comments

@han-liu

han-liu commented Apr 27, 2018

Hi zoogzog,

I'm wondering why the ten-crop technique is applied to the test images. I thought data augmentation should only be applied to the training set, in order to add diversity to the training images, while the test images should be kept unchanged so that the test results can be compared with others'.

Another question: do you think it is necessary to pre-process the imbalanced data? I noticed there is a huge imbalance between the number of samples for Hernia and the other diseases (~200 images vs. ~10,000 for, e.g., Infiltration), so different test splits would yield very different AUROC scores, at least for Hernia.

Thanks!

@zoogzog
Owner

zoogzog commented May 1, 2018

The ten-crop validation technique is based on this implementation. The authors achieved better accuracy than what was reported in the original paper by using ten-crop for validation.

The chest is not aligned the same way across the original X-ray images. The ten-crop approach attempts to address this issue.
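For reference, this is roughly what ten-crop evaluation looks like in PyTorch (a minimal sketch, not the exact code from this repo). The key point is that the ten predictions are averaged back into one prediction per image, so the size of the test set does not change:

```python
import torch
import torchvision.transforms as transforms

# Each test image yields 10 crops: 4 corners + center, plus their
# horizontal flips.
transform_test = transforms.Compose([
    transforms.Resize(256),
    transforms.TenCrop(224),
    # Stack the 10 PIL crops into a single (10, C, H, W) tensor
    transforms.Lambda(lambda crops: torch.stack(
        [transforms.ToTensor()(c) for c in crops])),
])

def predict_tencrop(model, images):
    # images: (batch, 10, C, H, W), as produced by the transform above
    bs, n_crops, c, h, w = images.size()
    with torch.no_grad():
        out = model(images.view(-1, c, h, w))   # (batch * 10, n_classes)
    # Average over the 10 crops, so each test image still gets exactly
    # one prediction and the number of test samples is unchanged.
    return out.view(bs, n_crops, -1).mean(dim=1)
```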

In the original paper the authors addressed the imbalanced-data problem for binary classification (Pneumonia) by introducing weights into the loss function. I ran similar tests for the Nodule and Mass pathologies, and indeed, using weights in the loss function increases accuracy. So I think that to get good accuracy it is necessary to balance the data somehow, either by using weights in the loss function or by training with oversampling.
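As a hedged sketch of the loss-weighting idea for the multi-label case: PyTorch's `BCEWithLogitsLoss` accepts a per-class `pos_weight`, commonly set to the ratio of negatives to positives per class. The counts below are made up for illustration, not taken from the actual dataset:

```python
import torch
import torch.nn as nn

# Illustrative per-class counts (NOT real ChestX-ray14 statistics)
pos_counts = torch.tensor([200.0, 10000.0])     # e.g. Hernia vs. Infiltration
neg_counts = torch.tensor([100000.0, 90000.0])

# pos_weight = negatives / positives: rare positives get a larger weight
criterion = nn.BCEWithLogitsLoss(pos_weight=neg_counts / pos_counts)

logits = torch.randn(4, 2)                      # raw model outputs (batch, n_classes)
targets = torch.randint(0, 2, (4, 2)).float()   # multi-hot labels
loss = criterion(logits, targets)
```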

@han-liu
Author

han-liu commented May 1, 2018

Thanks for the reply, zoogzog!

For the first question: I'm not sure how the ten-crop technique works in PyTorch, since I'm using Keras. But if the PyTorch built-in ten-crop generates ten images from each one, I don't think it should be applied to the test set, because the number of samples in the test set should never change (e.g., we would end up with 100 test images generated from the 10 images in the original test set). And if ten-crop instead generated one randomly cropped image per test image, how would you make sure the crop has the correct orientation?

For the second question: have you tried weight balancing on the 14-class problem? I actually tried adding weight balancing to the loss function of the 14-class problem, but the mean AUROC decreased. Do you know why the CheXNet paper did not use weighting on the 14-class loss function? Thanks a lot!

@zoogzog
Owner

zoogzog commented May 16, 2018

Indeed, ten-crop generates 10 images per input, and it was my concern as well that this may not be an adequate strategy for testing.

I have not tried weighting the loss function for the 14-class problem, and I also wonder why the authors did not use weighting for the 14 classes.
Instead of weighting, you could try training with oversampling.
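A hedged sketch of what oversampling could look like in PyTorch, using a `WeightedRandomSampler` so that samples carrying rare labels are drawn more often. The helper name and the multi-hot `labels` tensor are assumptions for illustration, not code from this repo:

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

# `labels` is assumed to be a multi-hot tensor of shape (n_samples, n_classes)
def make_oversampling_loader(dataset, labels, batch_size=16):
    class_counts = labels.sum(dim=0).clamp(min=1)   # positives per class
    class_weights = 1.0 / class_counts              # rarer class -> larger weight
    # Each sample's weight is the sum of the weights of its positive labels
    sample_weights = (labels * class_weights).sum(dim=1).clamp(min=1e-6)
    sampler = WeightedRandomSampler(sample_weights,
                                    num_samples=len(dataset),
                                    replacement=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```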

@Stexan

Stexan commented May 17, 2018

@zoogzog what would training with oversampling look like? Augmenting the data for the classes with few training samples?

@InamTaj

InamTaj commented Mar 26, 2020

> data augment the classes with few training data?

I think yes, @Stexan.
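A minimal sketch of what that could look like: a wrapper dataset that applies heavier random transforms only to samples carrying a rare label. The class name, `rare_class_ids`, and the transform choices are all hypothetical. On its own this does not make rare samples appear more often, so it is usually combined with oversampling like the sampler sketched above:

```python
import torchvision.transforms as transforms
from torch.utils.data import Dataset

# Extra augmentation applied only to minority-class samples (illustrative)
heavy_augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
])

class MinorityAugmented(Dataset):
    """Wraps a dataset of (PIL image, multi-hot label) pairs."""

    def __init__(self, base_dataset, rare_class_ids):
        self.base = base_dataset
        self.rare = set(rare_class_ids)

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        image, label = self.base[idx]
        # Apply the heavier augmentation only if a rare label is present
        if any(label[c] == 1 for c in self.rare):
            image = heavy_augment(image)
        return image, label
```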
