Issues loading the CelebA dataset - page 512 #54
-
Hello,

Thank you for the very informative book and repository. I am trying the workaround for loading the CelebA dataset described on page 512 of the PyTorch book. However, I ran into the same error in both my local environment and Google Colab.

In my local environment, I put the `celeba` folder in the same folder as my Python code (e.g., `test.py` and `celeba` are both directly inside folder `A`). I tried both a relative path and an absolute path, and neither worked. The error points to `/site-packages/torchvision/datasets/celeba.py", line 83, in __init__`.

On Google Colab, I set the image path to the absolute path and to the relative path, and neither worked. It shows `RuntimeError: Dataset not found or corrupted. You can use download=True to download it`.

I tried reading `celeba.py`, and it seems that the dataset does not pass the `check_integrity` function.

Thanks!
-
Hi @huipingcao, I believe the reason it says "Dataset not found or corrupted" is that the MD5 hashes of the committed files in the folder have changed. If you take a look at the source code of the celeba link and run (in your terminal):

In summary, just download from the GDrive source the files under the folders
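To check whether your local files match what torchvision expects, you can compute their MD5 hashes and compare them by hand with the hashes hard-coded in torchvision's `celeba.py`. Below is a minimal sketch using only the Python standard library; the helper names `file_md5` and `report_md5` are hypothetical, and the `./celeba` location is an assumption you should adjust to your setup.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use low even for large archives
    such as img_align_celeba.zip.
    """
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def report_md5(folder):
    """Print and return the MD5 of every regular file directly inside `folder`."""
    results = {}
    for p in sorted(Path(folder).iterdir()):
        if p.is_file():
            results[p.name] = file_md5(p)
            print(f"{results[p.name]}  {p.name}")
    return results


# Hypothetical location: only runs if the folder exists.
if Path("./celeba").is_dir():
    report_md5("./celeba")
```

If a printed hash differs from the one listed in `celeba.py` for the same filename, torchvision's integrity check will reject that file.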
-
Hi @huipingcao, did you find a solution?
-
Hey folks,

```python
import zipfile

import gdown  # pip install gdown
import torchvision

# Download the prepared CelebA archive from Google Drive
url = "https://drive.google.com/uc?id=1m8-EBPgi5MRubrm6iQjafK2QMHDBMSfJ"
output = "./celeba.zip"
gdown.download(url, output, quiet=False)

# Extract the archive into the current directory (this yields ./celeba)
with zipfile.ZipFile("./celeba.zip", "r") as zip_ref:
    zip_ref.extractall("./")

# Extract the aligned face images inside ./celeba
with zipfile.ZipFile("./celeba/img_align_celeba.zip", "r") as zip_ref:
    zip_ref.extractall("./celeba")

# root is the folder that contains celeba/, not celeba/ itself
image_path = './'
celeba_train_dataset = torchvision.datasets.CelebA(image_path, split='train', target_type='attr', download=False)
celeba_valid_dataset = torchvision.datasets.CelebA(image_path, split='valid', target_type='attr', download=False)
celeba_test_dataset = torchvision.datasets.CelebA(image_path, split='test', target_type='attr', download=False)

print('Train set:', len(celeba_train_dataset))
print('Validation set:', len(celeba_valid_dataset))
print('Test set:', len(celeba_test_dataset))
```

Hope this is helpful and happy coding 🤟!
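If `CelebA(...)` still raises "Dataset not found or corrupted", a common cause is the root path: torchvision looks for the files in a `celeba` subfolder of the root you pass, so passing `./celeba` as the root makes it search `./celeba/celeba`. The sketch below checks that the expected pieces exist. The helper `missing_celeba_parts` is hypothetical, and the file list is based on torchvision's `celeba.py`; verify it against your installed version. It only checks presence, not MD5 hashes.

```python
from pathlib import Path

# Files torchvision's CelebA class expects under <root>/celeba/
# (taken from torchvision's celeba.py; may differ between versions).
EXPECTED_FILES = [
    "list_attr_celeba.txt",
    "identity_CelebA.txt",
    "list_bbox_celeba.txt",
    "list_landmarks_align_celeba.txt",
    "list_eval_partition.txt",
]
# Folder produced by unzipping img_align_celeba.zip
EXPECTED_DIR = "img_align_celeba"


def missing_celeba_parts(root):
    """Return the expected CelebA paths that are missing under root/celeba."""
    base = Path(root) / "celeba"
    missing = [str(base / name) for name in EXPECTED_FILES
               if not (base / name).is_file()]
    if not (base / EXPECTED_DIR).is_dir():
        missing.append(str(base / EXPECTED_DIR))
    return missing
```

For example, `missing_celeba_parts('./')` returns an empty list when the layout is ready for `torchvision.datasets.CelebA('./', ..., download=False)`.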
-
Sorry everyone for the hassle here. It's a bit frustrating that it doesn't work out of the box via torchvision anymore.
This was also reported as an issue here: pytorch/vision#1920
What I did was download the files from here: https://drive.google.com/drive/folders/0B7EVK8r0v71pWEZsZE9oNnFzTm8?resourcekey=0-5BR16BdXnb8hVj6CNHKzLg
For simplicity, you can also use my link here, where I have already prepared the directory structure: https://drive.google.com/file/d/1m8-EBPgi5MRubrm6iQjafK2QMHDBMSfJ/view?usp=share_link
Download that zip file and place it in the `celeba` folder. Then unzip `img_align_celeba.zip`. And it should work: