Issues of loading celeba dataset - page 512 #54

huipingcao · 2022-04-14T22:33:05Z

Hello,

Thank you for the very informative book and repository.

I am trying the workaround for loading the CelebA dataset described on page 512 of the PyTorch book:
"you can download the files from the official CelebA website manually (https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) or use our download link: https://drive.google.com/file/d/1m8-EBPgi5MRubrm6iQjafK2QMHDBMSfJ/view?usp=sharing. If you use our download link, it will download a celeba.zip file, which you need to unpack in the current directory where you are running the code."

However, I ran into the same error both in my local environment and in Google Colab:
"RuntimeError: Dataset not found or corrupted. You can use download=True to download it"

In my local environment, I put the celeba folder in the same folder as my Python code (e.g., test.py and celeba are both directly inside folder A). I tried both a relative and an absolute path. Neither worked; the error is:

/site-packages/torchvision/datasets/celeba.py", line 83, in __init__
    raise RuntimeError("Dataset not found or corrupted. You can use download=True to download it")
RuntimeError: Dataset not found or corrupted. You can use download=True to download it

In Google Colab, I set the image path to either the absolute or the relative path; neither worked. The traceback shows:

/usr/local/lib/python3.7/dist-packages/torchvision/datasets/celeba.py in __init__(self, root, split, target_type, transform, target_transform, download)
     81
     82         if not self._check_integrity():
---> 83             raise RuntimeError('Dataset not found or corrupted.' +
     84                                ' You can use download=True to download it')
     85

RuntimeError: Dataset not found or corrupted. You can use download=True to download it

I read through celeba.py, and it seems the dataset does not pass the _check_integrity() check.
Any insight on solving this issue?

Thanks!

Answered by rasbt

Apr 25, 2024

bkemmer · 2022-04-30T23:43:03Z

Hi @huipingcao, I believe it says the dataset is "corrupted" because the MD5 hashes of the files in the folder have changed and no longer match the ones torchvision expects. If you look at the source code of torchvision's celeba.py and run md5sum <filename> in your terminal, you'll see the difference.

In summary, download the files under the Anno and Eval folders from the Google Drive source, use them to replace the ones in the ./celeba folder, and it will work.
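The `md5sum` comparison can also be done from Python, which is convenient in Colab. This is a minimal sketch using only `hashlib`; `md5_of_file` and `find_md5_mismatches` are hypothetical helpers, and the expected hashes are deliberately not reproduced here. You would copy them from the `file_list` entries in the celeba.py of your installed torchvision.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, like `md5sum <filename>`."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_md5_mismatches(folder, expected):
    """Return {filename: (expected_md5, actual_md5)} for files that differ.

    Hypothetical helper: `expected` maps file names to MD5 strings that you
    copy from torchvision's celeba.py. A missing file is reported with
    actual_md5 = None.
    """
    mismatches = {}
    for name, want in expected.items():
        path = Path(folder) / name
        got = md5_of_file(path) if path.is_file() else None
        if got != want:
            mismatches[name] = (want, got)
    return mismatches
```

With torchvision installed, the expected dict could be built as `{name: md5 for _, md5, name in torchvision.datasets.CelebA.file_list if name.endswith(".txt")}`, assuming the class attribute keeps its (file_id, md5, filename) layout. An empty result means the annotation files in ./celeba match the hashes torchvision checks.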

2 replies

jbyls Jun 15, 2022

I'm having issues loading this dataset as well. I'm running my notebook in Colab and have tried running it from the directory containing the images and from the celeba folder, and I see the same error either way. I re-downloaded the files from the provided link, but I still can't get torchvision.datasets.CelebA to load them from a Google Drive directory.

Can you lay out a directory tree for running a notebook with these txt files and image files?
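One possible answer, as a sketch: assuming torchvision's CelebA joins `root` with a `celeba` subfolder and checks for the annotation files plus an `img_align_celeba/` directory (as its celeba.py does in recent releases), the tree and a small existence check look like this. `missing_celeba_entries` is a hypothetical helper, not part of torchvision, and it only checks that the files exist, not their MD5 hashes.

```python
from pathlib import Path

# Expected layout, assuming root="./" (root is the directory that CONTAINS celeba/):
#
#   ./                                   <- pass this as root
#   └── celeba/
#       ├── img_align_celeba/            <- the extracted .jpg images
#       ├── identity_CelebA.txt
#       ├── list_attr_celeba.txt
#       ├── list_bbox_celeba.txt
#       ├── list_eval_partition.txt
#       └── list_landmarks_align_celeba.txt

# Annotation files assumed to be checked by torchvision's CelebA.
ANNOTATION_FILES = [
    "list_attr_celeba.txt",
    "identity_CelebA.txt",
    "list_bbox_celeba.txt",
    "list_landmarks_align_celeba.txt",
    "list_eval_partition.txt",
]


def missing_celeba_entries(root):
    """Return the expected CelebA paths that do not exist under `root`."""
    base = Path(root) / "celeba"  # torchvision appends this folder name itself
    missing = []
    if not (base / "img_align_celeba").is_dir():
        missing.append(str(base / "img_align_celeba") + "/")
    for name in ANNOTATION_FILES:
        if not (base / name).is_file():
            missing.append(str(base / name))
    return missing


if __name__ == "__main__":
    problems = missing_celeba_entries("./")
    if problems:
        print("Missing:", *problems, sep="\n  ")
    else:
        print("Layout looks fine; if loading still fails, compare MD5 hashes.")
```

If every entry is reported missing even though the files are there, you are probably passing the celeba folder itself as root; pass its parent instead. In Colab, that means pointing root at the Drive directory that contains celeba/.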

sahlebrahim Apr 21, 2024

It still doesn't work; it says "Dataset not found or corrupted".

sahlebrahim · 2024-04-21T08:31:32Z

Hi @huipingcao, did you find a solution?

rasbt · 2024-04-25T17:27:51Z · Maintainer

Sorry everyone for the hassle here. It's a bit frustrating that it doesn't work out of the box via torchvision anymore.

This was also shared as an issue here: pytorch/vision#1920

What I did was download the files from here: https://drive.google.com/drive/folders/0B7EVK8r0v71pWEZsZE9oNnFzTm8?resourcekey=0-5BR16BdXnb8hVj6CNHKzLg

For simplicity, you can also use my link here, where I have already prepared the directory structure: https://drive.google.com/file/d/1m8-EBPgi5MRubrm6iQjafK2QMHDBMSfJ/view?usp=share_link

Download that zip file and place it in the celeba folder. Then unzip img_align_celeba.zip, and it should work.

matteopilotto · 2024-07-27T13:57:42Z

Hey folks,
For those still having trouble downloading the CelebA dataset, you can use the code snippet below.
I tested it myself because I was also having trouble, and as of July 27, 2024, it gets the job done.

Download the data from the URL provided by @rasbt

import gdown

url = "https://drive.google.com/uc?id=1m8-EBPgi5MRubrm6iQjafK2QMHDBMSfJ"
output = "./celeba.zip" 

gdown.download(url, output, quiet=False)

Unzip the data

import zipfile

with zipfile.ZipFile("./celeba.zip", "r") as zip_ref:
    zip_ref.extractall("./")

with zipfile.ZipFile("./celeba/img_align_celeba.zip", "r") as zip_ref:
    zip_ref.extractall("./celeba")
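As far as torchvision's celeba.py shows, its integrity check only verifies that `img_align_celeba/` exists, not how many images it holds, so an interrupted extraction could pass the check and fail later. A quick count against the 202,599 images in the full CelebA release, sketched with a hypothetical `count_jpgs` helper:

```python
from pathlib import Path

# Number of aligned face images in the full CelebA release.
EXPECTED_IMAGE_COUNT = 202_599


def count_jpgs(folder):
    """Count .jpg files directly inside `folder` (0 if it does not exist)."""
    folder = Path(folder)
    if not folder.is_dir():
        return 0
    return sum(1 for p in folder.iterdir() if p.suffix.lower() == ".jpg")


if __name__ == "__main__":
    n = count_jpgs("./celeba/img_align_celeba")
    print(f"{n} images found (expected {EXPECTED_IMAGE_COUNT})")
```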

Load the data

import torchvision 

image_path = './'  # root directory that contains the celeba/ folder
celeba_train_dataset = torchvision.datasets.CelebA(image_path, split='train', target_type='attr', download=False)
celeba_valid_dataset = torchvision.datasets.CelebA(image_path, split='valid', target_type='attr', download=False)
celeba_test_dataset = torchvision.datasets.CelebA(image_path, split='test', target_type='attr', download=False)

print('Train set:', len(celeba_train_dataset))
print('Validation set:', len(celeba_valid_dataset))
print('Test set:', len(celeba_test_dataset))
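If loading succeeds, the three printed sizes should match the official CelebA partition: 162,770 training, 19,867 validation, and 19,962 test images. The same counts can be read directly from celeba/list_eval_partition.txt without torchvision, which helps separate a bad download from a loading problem. A minimal sketch, assuming each line has the form `<image name> <partition>` with 0 = train, 1 = valid, 2 = test; `count_partitions` is a hypothetical helper:

```python
from collections import Counter
from pathlib import Path

# Partition codes assumed to be used in list_eval_partition.txt.
SPLIT_NAMES = {"0": "train", "1": "valid", "2": "test"}

# Official CelebA split sizes, for comparison.
EXPECTED_SIZES = {"train": 162_770, "valid": 19_867, "test": 19_962}


def count_partitions(partition_file):
    """Count images per split in a CelebA list_eval_partition.txt file."""
    counts = Counter()
    with open(partition_file, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            counts[SPLIT_NAMES.get(parts[1], "unknown")] += 1
    return dict(counts)


if __name__ == "__main__":
    path = Path("celeba") / "list_eval_partition.txt"
    if path.is_file():
        counts = count_partitions(path)
        print(counts, "(matches)" if counts == EXPECTED_SIZES else "(differs from official sizes)")
    else:
        print(f"{path} not found")
```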

Hope this is helpful and happy coding 🤟!

Replies: 4 comments 2 replies
