zimdump dump misses HTML files when a file name conflicts with a directory name #414

Open
nickhuang99 opened this issue Jun 14, 2024 · 2 comments

@nickhuang99

Let's use the real Wikipedia page for "C++" as an example:
"https://en.wikipedia.org/wiki/C++" is an HTML page, and it has sub-pages under the directory "C++", such as:
https://en.wikipedia.org/wiki/C++/CLI
This situation cannot be represented in the dumped static HTML files, because "C++" cannot be an HTML file and a directory at the same time.
How zimdump should generate a redirect is not straightforward either, especially when "C++" is URL-escaped as "C%2B%2B". The redirect logic then has to account for the URL-safe encoding when checking whether the filesystem directory "C++" actually exists.
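To illustrate the encoding mismatch, here is a minimal Python sketch using the standard `urllib.parse` module. The helper names `fs_name_to_url` and `url_to_fs_name` are hypothetical and are not part of zimdump; they only show that the name on disk and the name inside a link differ for "C++".

```python
from urllib.parse import quote, unquote

def fs_name_to_url(fs_path):
    """Percent-encode a filesystem-style path for use in a link (hypothetical helper)."""
    # Keep '/' as the path separator; '+' is not a safe character, so it becomes %2B.
    return quote(fs_path, safe="/")

def url_to_fs_name(url_path):
    """Decode a percent-encoded link back to the name expected on disk (hypothetical helper)."""
    return unquote(url_path)

print(fs_name_to_url("C++/CLI"))
print(url_to_fs_name("C%2B%2B"))
```

Any existence check against the filesystem would need the decoded form, while the link written into the HTML needs the encoded form.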
For a real test case, download the Kiwix "computer" ZIM file from:
https://download.kiwix.org/zim/wikipedia/wikipedia_en_computer_maxi_2024-05.zim

When you run zimdump on it, you will see that the "C++" HTML page is missing, because "C++" is a directory that holds the "CLI" page on the filesystem.
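The conflict can be detected ahead of time from the list of entry paths alone. The following is a minimal Python sketch, not zimdump's actual logic; `find_conflicts` is a hypothetical helper, and the sample paths are only illustrative.

```python
def find_conflicts(paths):
    """Return entry paths that would also have to exist as a directory on disk."""
    entries = set(paths)
    conflicts = set()
    for path in entries:
        parts = path.split("/")
        # Every proper prefix of an entry path must be a directory when dumped.
        for i in range(1, len(parts)):
            prefix = "/".join(parts[:i])
            if prefix in entries:
                conflicts.add(prefix)
    return sorted(conflicts)

print(find_conflicts(["C++", "C++/CLI", "Python"]))
```

Here "C++" is reported because it is both an entry in its own right and the parent directory of "C++/CLI".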

@nickhuang99
Author

After submitting, I realized this is a duplicate of the older issue #190.
Can someone mark it as a duplicate, or should I close it? Perhaps it could serve as another test case in the future?

@nickhuang99
Author

escapeSlash.diff.zip
I have a simple solution for this issue: escape every '/' in the file path so that all articles and pictures reside at the same directory level, which avoids these directory/filename conflicts. This may produce file names longer than 255 characters when the original directory nesting is too deep. However, my tests show that, at least for "https://download.kiwix.org/zim/wikipedia/wikipedia_en_computer_maxi_2024-05.zim", there is no longer a single exception.

Please find the patch attached. Could a developer take a look and see whether it can be applied?
Thank you.
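For readers who do not open the attachment, the idea of flattening paths can be sketched in Python as follows. This is not the code in escapeSlash.diff; the escape sequences ("%2F" for '/', "%25" for '%'), the function names, and the 255-byte limit constant are assumptions chosen for illustration.

```python
MAX_NAME_BYTES = 255  # assumed per-name limit; the real limit depends on the filesystem

def flatten_path(path):
    """Turn a nested entry path into a single file name (hypothetical scheme)."""
    # Escape '%' first so the mapping stays reversible, then escape '/'.
    return path.replace("%", "%25").replace("/", "%2F")

def unflatten_path(name):
    """Reverse flatten_path."""
    return name.replace("%2F", "/").replace("%25", "%")

def fits_in_filename(name):
    """Check the flattened name against the assumed length limit, in UTF-8 bytes."""
    return len(name.encode("utf-8")) <= MAX_NAME_BYTES

flat = flatten_path("C++/CLI")
print(flat, fits_in_filename(flat))

deep = "/".join(["d" * 50] * 6)
print(fits_in_filename(flatten_path(deep)))
```

With this scheme "C++" and "C++/CLI" become two sibling files, so the conflict disappears, while the deeply nested sample path shows how the length limit mentioned above can be exceeded.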
