Galleries with broken thumbnail urls are generating .webp files with html content. #349
Forgot to add: I'm on the latest commit, f30ff59.
Found it, I think: line 156 in f30ff59.
If I'm reading it right, this explains how the invalid .webp URL gets parsed. In this case it's a broken thumbnail URL, but if nhentai ever used a different extension for a thumbnail than for the actual page image, that could break the downloader. The alternative would be to follow each URL on the gallery page and extract the image URLs one at a time, but that sounds expensive. Checking for a 404 status code when downloading and failing with an error might be a good way to go. I'd rather have a failure on rare edge cases than an archive with missing images.
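A minimal sketch of the suggested 404 check, assuming the downloader has the status code, `Content-Type` header and body of each response at hand, however it obtains them. `validate_image_response` and `InvalidImageError` are hypothetical names, not functions in the nhentai codebase.

```python
class InvalidImageError(Exception):
    """Raised when a download did not return an actual image (hypothetical)."""


def validate_image_response(url: str, status_code: int,
                            content_type: str, body: bytes) -> bytes:
    """Return `body` if the response looks like an image, otherwise raise.

    Failing loudly here means an affected gallery errors out instead of
    silently producing an archive with HTML error pages saved as images.
    """
    if status_code != 200:
        raise InvalidImageError(f"{url}: HTTP {status_code}")
    # Error pages are normally served as text/html; images as image/*.
    mime = content_type.split(";", 1)[0].strip().lower()
    if not mime.startswith("image/"):
        raise InvalidImageError(f"{url}: unexpected Content-Type {content_type!r}")
    if not body:
        raise InvalidImageError(f"{url}: empty response body")
    return body


# Example: a 404 HTML page is rejected instead of being written to 1.webp.
try:
    validate_image_response("https://example.com/galleries/1/1.webp", 404,
                            "text/html; charset=utf-8", b"<html>404</html>")
except InvalidImageError as exc:
    print(exc)  # https://example.com/galleries/1/1.webp: HTTP 404
```

Checking the `Content-Type` as well as the status covers servers that return an error page with a 200 status.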
I'll check it out.
Need more sample doujinshi.
After some investigation, I found that:
We need to determine whether this is an isolated case or the norm.
Here are all of the gallery codes with bad thumbnail image URLs; I scraped everything released recently to check: 538005
Thanks for taking a look.
I noticed some more galleries with issues: 538053. It looks like this will remain an issue until it's fixed on the nhentai side. Here is a quick and dirty workaround if anyone needs to get this working:
This should split the two extensions and use the first one. It probably introduces new edge cases, but it works for now.
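A minimal sketch of the split-and-use-the-first-extension idea, assuming the thumbnail URL is available as a string. `extension_from_thumbnail` is a hypothetical helper illustrating the approach, not the snippet from the comment, and, as the comment warns, it would pick the wrong extension for any gallery where the first extension is the misleading one.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def extension_from_thumbnail(thumb_url: str) -> str:
    """Guess the page image extension from a gallery thumbnail URL.

    Normal thumbnails end in a single extension (``1t.webp``); the broken
    ones carry two (``1t.jpg.webp``). Taking the *first* extension instead
    of the last one is the quick-and-dirty workaround from this thread.
    """
    path = urlsplit(thumb_url).path              # drop scheme, host, query
    suffixes = PurePosixPath(path).suffixes      # e.g. [".jpg", ".webp"]
    if not suffixes:
        raise ValueError(f"no extension in {thumb_url!r}")
    return suffixes[0].lstrip(".")


print(extension_from_thumbnail("https://example.com/galleries/3122455/1t.jpg.webp"))  # jpg
print(extension_from_thumbnail("https://example.com/galleries/3122455/2t.webp"))      # webp
```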
Not sure if my problem is related; if not, please ignore this and I will open a new issue. Since nhentai's downtime earlier this week, I can't download certain doujinshi. I hadn't updated the script until earlier today, when I tried that to fix it, but it keeps failing in the same way:
[11:56:00] doujinshi_parser: Fetching doujinshi information of id 538003
It hangs there for some time and then just dies. I use the favorites method, but even when downloading just that one, I get the same result.
Not the same issue as this one. nhentai started using webp images, which the parser didn't support until f30ff59 was committed a couple of days ago. If you git clone and install from source, it should work. I'm not sure whether this fix has been pushed out to any other install methods.
I'm using nhentaiGUI, so I just edited the couple of lines in the files I have. It works now; thanks for the info and the fix. Solved on my part.
I have the same problem. It seems to happen with any doujin uploaded recently, e.g. 538703.
How did you fix it? I have the same issue.
Find the nhentai files in your Python folder; mine, for example, is "AppData\Local\Programs\Python\Python312\Lib\site-packages\nhentai", and yours should be about the same unless you installed it differently. In f30ff59, look for the differences (using the line numbers on the left, or just by searching for the lines), edit them in your copy, save, and test whether it works. If it doesn't, replacing the whole files might work for you.
I save everything as .cbz, which works fine apart from roughly every tenth doujin having a corrupted first page. I think the webp fails to download correctly, and img2pdf.py can't handle the first page not working and just dies. I'm not sure whether this is fixable by adjusting the script or is a problem on nhentai's end. My advice is to use the .cbz option for the ones that fail as PDFs until it's fixed, unless you find a way to fix it yourself.
Previously, these broken thumbnails with double extensions (e.g. 3122455/1t.jpg.webp) were invalid and didn't load on the gallery page. Now they do load, which means nhentai fixed the issue by renaming the actual thumbnail image file to match the broken URL in the HTML. This probably indicates that these thumbnail filenames will not be changed back to the normal convention on the nhentai website, so the current parsing method will fail to download the affected images on this handful of releases. If you don't mind updating two lines of code, here is a workaround: maltbeverage@ea52cff I think, to resolve the issue:
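Because the double-extension thumbnails now resolve, a single guess at the image extension can still be wrong. One hedged option, at the cost of an extra request on affected pages, is to try each extension found in the thumbnail name in turn. The `<page>t.<ext>` naming is assumed from the 3122455/1t.jpg.webp example; `candidate_image_names` and `first_available` are hypothetical helpers, and `exists` stands in for whatever check (such as a HEAD request) a downloader would use.

```python
import re
from typing import Callable, Iterable, List, Optional

# Assumed thumbnail naming: "<page>t" followed by one or more extensions,
# e.g. "1t.webp" (normal) or "1t.jpg.webp" (broken, double extension).
_THUMB_RE = re.compile(r"^(?P<page>\d+)t(?P<exts>(?:\.[A-Za-z0-9]+)+)$")


def candidate_image_names(thumb_name: str) -> List[str]:
    """List full-size image filenames to try, first extension first."""
    m = _THUMB_RE.match(thumb_name)
    if m is None:
        raise ValueError(f"unrecognised thumbnail name: {thumb_name!r}")
    page = m.group("page")
    names: List[str] = []
    for ext in m.group("exts").lstrip(".").split("."):  # ["jpg", "webp"]
        name = f"{page}.{ext}"
        if name not in names:  # skip duplicates such as "1t.webp.webp"
            names.append(name)
    return names


def first_available(names: Iterable[str],
                    exists: Callable[[str], bool]) -> Optional[str]:
    """Return the first name for which `exists` reports True, else None."""
    for name in names:
        if exists(name):
            return name
    return None


print(candidate_image_names("1t.jpg.webp"))  # ['1.jpg', '1.webp']
print(first_available(candidate_image_names("1t.jpg.webp"),
                      lambda n: n == "1.jpg"))  # 1.jpg
```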
Yesterday I had a batch with broken thumbnails, so I looked into it too. It seems that there's now a mix between
e.g. example gallery: /g/538696/
Change lines 158-160 in parser.py
to:
I agree that it's annoying that no error is thrown here.
After webp images started showing up, I noticed a few galleries were pulling in broken webp images. On closer inspection, the downloaded files contain HTML showing a 404 error.
Example: 538028. The first thumbnail references an invalid URL:
This looks like an issue with nhentai. If I remove the .webp extension from the thumbnail URL
the thumbnail image loads.
The broken thumbnail links to page 1 of the doujin, which does have a working image:
So this broken thumbnail might be messing up the parsing somehow. When it comes across these broken thumbnails, I think the downloader attempts to download
which does not exist; the actual file is
This is saved successfully as a .webp file, but its contents are the HTML of the 404 error page.
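Since the bad files are HTML saved under a .webp name, they can be detected after the fact by checking each file's signature instead of trusting its extension. A minimal sketch, assuming the downloaded pages sit in one folder; `sniff_image_format` and `find_invalid_images` are hypothetical helpers, and only the WebP, JPEG, PNG and GIF signatures are checked.

```python
from pathlib import Path
from typing import List, Optional

IMAGE_SUFFIXES = {".webp", ".jpg", ".jpeg", ".png", ".gif"}


def sniff_image_format(head: bytes) -> Optional[str]:
    """Identify common image formats from their leading magic bytes."""
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "webp"
    if head[:3] == b"\xff\xd8\xff":
        return "jpg"
    if head[:8] == b"\x89PNG\r\n\x1a\n":
        return "png"
    if head[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    return None  # e.g. an HTML 404 page saved under an image filename


def find_invalid_images(folder: Path) -> List[Path]:
    """Return image-named files in `folder` whose contents are not images."""
    bad = []
    for path in sorted(folder.iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        with path.open("rb") as fh:
            # 12 bytes covers the longest check (RIFF....WEBP).
            if sniff_image_format(fh.read(12)) is None:
                bad.append(path)
    return bad
```

Running a check like this before packing a CBZ or PDF would flag pages to re-download rather than letting the converter fail on them.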
I suspect a parsing logic issue, if the thumbnail URL is somehow used to determine the file extension of the downloaded image.
This only affects around 5 galleries at the moment.