"Bad zip file, please report" when downloading dataset using Kaggle API #516

Open
hlysine opened this issue Nov 12, 2023 · 0 comments

hlysine commented Nov 12, 2023

I was downloading a dataset automatically using the Kaggle API in a Streamlit application. When the download finished, the Kaggle API threw an error and failed to unzip the data.

The dataset is https://www.kaggle.com/datasets/nih-chest-xrays/sample/data
I was using kaggle-1.5.16 in a Linux container on Hugging Face.

  • Downloading the same dataset using the Kaggle API on my Windows machine works without issue
  • Downloading the same dataset from the Kaggle website works without issue
  • Downloading other datasets using the same setup on Hugging Face works without issue
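
Since the same archive downloads fine on Windows, one way to narrow this down would be to compare the file that ends up on disk in both environments. Below is a minimal sketch using only the Python standard library; `describe_archive` is a hypothetical helper name, not part of the Kaggle API. If the size or hash differs between the two machines, the file on Hugging Face is probably truncated or corrupted rather than the unzip step being at fault.

```python
import hashlib
import os
import zipfile


def describe_archive(path, chunk_size=1024 * 1024):
    """Return size, SHA-256 and a quick zip sanity check for the file at path.

    Hypothetical diagnostic helper; it does not call the Kaggle API.
    """
    digest = hashlib.sha256()
    # Read in chunks so a multi-gigabyte archive is never held in memory.
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "size_bytes": os.path.getsize(path),
        "sha256": digest.hexdigest(),
        # True only if a zip end-of-central-directory record is found.
        "looks_like_zip": zipfile.is_zipfile(path),
    }
```

Running `describe_archive("sample.zip")` on both machines and comparing the two dictionaries should show whether the downloaded bytes are identical.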

The only relevant log I can find is this:

Downloading sample.zip to /home/user/app
Downloading sample.zip to /home/user/app
... resuming from 655360 bytes (4505704260 bytes left) ...
Downloading sample.zip to /home/user/app
... resuming from 2097152 bytes (4504262468 bytes left) ...
  0%|          | 0.00/4.20G [00:00<?, ?B/s]
  0%|          | 2.00M/4.20G [00:00<05:14, 14.3MB/s]
  0%|          | 640k/4.20G [00:00<?, ?B/s]
  0%|          | 2.00M/4.20G [00:00<?, ?B/s]
  0%|          | 4.00M/4.20G [00:00<05:56, 12.6MB/s]
  0%|          | 2.62M/4.20G [00:00<04:09, 18.0MB/s]
  0%|          | 4.00M/4.20G [00:00<06:22, 11.8MB/s]
  0%|          | 4.62M/4.20G [00:00<04:59, 15.0MB/s]
  0%|          | 7.00M/4.20G [00:00<04:42, 15.9MB/s]

...

('Bad zip file, please report on www.github.com/kaggle/kaggle-api', BadZipFile('File is not a zip file'))

100%|██████████| 4.20G/4.20G [00:24<00:00, 181MB/s]
('Bad zip file, please report on www.github.com/kaggle/kaggle-api', BadZipFile('Bad magic number for central directory'))

 98%|█████████▊| 4.13G/4.20G [00:24<00:00, 201MB/s]
 99%|█████████▉| 4.16G/4.20G [00:24<00:00, 87.7MB/s]
100%|█████████▉| 4.19G/4.20G [00:25<00:00, 116MB/s] 
100%|██████████| 4.20G/4.20G [00:25<00:00, 176MB/s]
('Bad zip file, please report on www.github.com/kaggle/kaggle-api', BadZipFile('Bad magic number for file header'))