Let's use the real Wikipedia page "C++" as an example:
"https://en.wikipedia.org/wiki/C++" is an HTML page, and it also has sub-pages under the path "C++", such as https://en.wikipedia.org/wiki/C++/CLI
This cannot be represented when dumping to static HTML files, because "C++" cannot be both an HTML file and a directory at the same time.
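A minimal Python sketch of the conflict, assuming a naive one-file-per-entry layout in which each '/' in a ZIM path becomes a directory level (`write_article` is a hypothetical helper, not zimdump's code). Whichever of the two entries is written second fails:

```python
import os
import tempfile


def write_article(root: str, url_path: str, html: str) -> bool:
    """Hypothetical naive dump: map a ZIM path to root/<segments...>.

    Returns False when the OS refuses the write, e.g. because a
    directory (or file) with the same name already exists.
    """
    target = os.path.join(root, *url_path.split("/"))
    try:
        # Parent directories are created for every '/' in the path.
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "w", encoding="utf-8") as f:
            f.write(html)
        return True
    except OSError:
        return False


# Order 1: "C++/CLI" first turns "C++" into a directory,
# so the "C++" article itself can no longer be written as a file.
with tempfile.TemporaryDirectory() as root:
    cli_ok = write_article(root, "C++/CLI", "<html>C++/CLI</html>")
    cpp_ok = write_article(root, "C++", "<html>C++</html>")

# Order 2: "C++" first is a regular file, so "C++/" cannot become a directory.
with tempfile.TemporaryDirectory() as root:
    cpp_first = write_article(root, "C++", "<html>C++</html>")
    cli_second = write_article(root, "C++/CLI", "<html>C++/CLI</html>")

print(cli_ok, cpp_ok)          # True False
print(cpp_first, cli_second)   # True False
```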
How zimdump should generate redirects is not straightforward either, especially when "C++" is URL-escaped as "C%2B%2B". The redirect URL then has to be safely URL-encoded, while zimdump still has to check whether the filesystem directory "C++" (the decoded name) actually exists.
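A sketch of this encoding mismatch using Python's `urllib.parse` (the helpers `redirect_href` and `target_is_directory` are hypothetical, not part of zimdump): the href has to carry the percent-encoded form, but any filesystem check has to use the decoded name.

```python
import os
import tempfile
from urllib.parse import quote, unquote


def redirect_href(target_path: str) -> str:
    """Percent-encode a ZIM path for an href, keeping '/' as a separator."""
    return quote(target_path, safe="/")


def target_is_directory(root: str, href: str) -> bool:
    """Decode the href back to the on-disk name before asking the filesystem."""
    decoded = unquote(href)  # unquote leaves '+' alone, unlike unquote_plus
    return os.path.isdir(os.path.join(root, *decoded.split("/")))


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "C++"))  # "C++" exists as a directory on disk
    href = redirect_href("C++")
    raw_check = os.path.isdir(os.path.join(root, href))  # looks for "C%2B%2B"
    decoded_check = target_is_directory(root, href)      # looks for "C++"

print(href)           # C%2B%2B
print(raw_check)      # False
print(decoded_check)  # True
```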
To reproduce this with a real test case, download the Kiwix "computer" ZIM file from: https://download.kiwix.org/zim/wikipedia/wikipedia_en_computer_maxi_2024-05.zim
When you zimdump it, you will see that the "C++" HTML page is missing, because "C++" was created as a directory on the filesystem to hold the "CLI" page.
escapeSlash.diff.zip
I have a simple solution: escape every '/' in an entry's path when building its filename, so that all articles and pictures reside at the same directory level and these directory/filename conflicts cannot occur. This may produce filenames longer than 255 characters when the original path is very deep. However, my tests show that, at least for "https://download.kiwix.org/zim/wikipedia/wikipedia_en_computer_maxi_2024-05.zim", not a single exception occurs any more.
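A minimal sketch of the flattening idea, assuming '/' is escaped as "%2F" and '%' is escaped first as "%25" so that the mapping stays reversible. This is only an illustration, not the contents of the attached patch; `flat_filename`, `unflatten`, `fits` and `NAME_MAX` are hypothetical names, and the 255-byte limit is the usual per-component limit on filesystems such as ext4.

```python
import os
import tempfile

NAME_MAX = 255  # assumed per-component limit in bytes (typical for ext4 and similar)


def flat_filename(zim_path: str) -> str:
    """Hypothetical flattening: escape '%' first, then '/', so the result is reversible."""
    return zim_path.replace("%", "%25").replace("/", "%2F")


def unflatten(name: str) -> str:
    """Inverse of flat_filename: every '%' in `name` starts a %2F or %25 token."""
    return name.replace("%2F", "/").replace("%25", "%")


def fits(name: str) -> bool:
    """True if the flattened name fits in a single path component."""
    return len(name.encode("utf-8")) <= NAME_MAX


# Both entries now land side by side in one directory, so neither is lost.
with tempfile.TemporaryDirectory() as root:
    for path in ["C++", "C++/CLI"]:
        with open(os.path.join(root, flat_filename(path)), "w", encoding="utf-8") as f:
            f.write(path)
    written = sorted(os.listdir(root))

print(written)  # ['C++', 'C++%2FCLI']
```

Because every segment ends up in a single path component, the whole escaped path counts against the per-component limit, which is the long-filename risk mentioned above; `fits` is one way to detect such entries before writing them.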
The patch is attached. Could a developer take a look and merge it if it is OK?
Thank you.