
[Feature Request] KemonoParty: Preventing duplicates with revisions #6096

Open
tezrilet opened this issue Aug 27, 2024 · 9 comments

Comments

@tezrilet

I'm currently downloading all post revisions and organizing them with the following directory structure:

{username}/{service}/[{id}] ({date:%Y-%m-%d}) {title[:100]}

However, both the post's date and its title can change between revisions. I can send example URLs privately for both situations, if needed. This creates a duplicate folder and could eat up a lot of space, since all the content gets redownloaded. I have also tried using {published[:10]} instead of date, but it can be null in later revisions, so the folder gets duplicated with "None" in its name. That still wouldn't address title changes, though.
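
To make the collision concrete, here is a rough Python approximation of the format string above (not gallery-dl's own formatter; plain str.format cannot slice inside a field, so the title is cut beforehand), applied to two hypothetical revisions of the same post. Same id, different date and title, and therefore two directory names:

```python
from datetime import datetime


def directory_for(post: dict) -> str:
    """Approximate {username}/{service}/[{id}] ({date:%Y-%m-%d}) {title[:100]}."""
    # str.format has no slicing, so truncate the title before formatting.
    fields = {**post, "title": post["title"][:100]}
    return "{username}/{service}/[{id}] ({date:%Y-%m-%d}) {title}".format(**fields)


# Hypothetical metadata for two revisions of one post (the id is unchanged).
rev_old = {"username": "artist", "service": "patreon", "id": "12345",
           "date": datetime(2024, 1, 5), "title": "WIP sketch"}
rev_new = {**rev_old, "date": datetime(2024, 2, 10), "title": "Finished piece"}

paths = {directory_for(rev_old), directory_for(rev_new)}
print(len(paths))  # 2: one post, two directories
```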

If there currently isn't a way to solve this (aside from obviously not using date/title), could we get some extra options to use with the format strings, such as {earliest_revision_date}, {latest_revision_date}, {earliest_revision_title}, and {latest_revision_title}?

@a84r7a3rga76fg

Remove the title from the file name. Download the post's unique files only. Save the metadata of the post from Kemono and use it to sort the files with symbolic or hard links to not waste any storage space. Replace creator_id and post_id in the URL with the correct ID of the creator and post. Trying to sort files from Kemono without wasting space will only lead to frustration.

https://kemono.su/api/v1/service/user/creator_id/post/post_id

"archive-format": "{subcategory}_{user}_{id}_{hash}",
"archive": "~/gallery-dl/archives/kemono/{subcategory} kemono {user}.sqlite",
"directory": ["{subcategory} kemono {user}", "{date!s:.10} {id}"],
"filename": "{hash}.{extension}"

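A minimal sketch of the hard-link sorting step. For simplicity it assumes the hash-named files have been gathered in one flat store directory, and that the saved post metadata has been reduced to a hypothetical dict with id, title and a list of file hashes/extensions; the function name and folder pattern are illustrative. Hard links only work within a single filesystem.

```python
import os
from pathlib import Path


def link_post(store: Path, sorted_root: Path, post: dict) -> list[Path]:
    """Hard-link hash-named files from `store` into a readable per-post folder.

    `post` is a hypothetical, simplified metadata dict:
    {"id": str, "title": str, "files": [{"hash": str, "extension": str}]}
    """
    # "/" in a title would create a sub-directory, so replace it.
    safe_title = post["title"][:100].replace("/", "_")
    target_dir = sorted_root / f"[{post['id']}] {safe_title}"
    target_dir.mkdir(parents=True, exist_ok=True)

    created = []
    for f in post["files"]:
        name = f"{f['hash']}.{f['extension']}"
        src, dst = store / name, target_dir / name
        if not dst.exists():
            os.link(src, dst)  # same data blocks on disk, no extra space used
        created.append(dst)
    return created
```

Because links can be recreated at any time from the metadata, renaming or re-sorting later never requires downloading anything again.
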
@tezrilet
Author

Thanks, but

(aside from obviously not using date/title)

Yes, I'm already saving unique files using their hash. However, your suggestion still uses date, so it'll still create duplicates. I appreciate the suggestion, though! It's just that it'd be nice to navigate things with a file browser and search while using meaningful paths with a title. I don't mind if it has to make an extra request to get the earliest revision date/title, since I'm already grabbing them all anyway.

@Hrxn
Contributor

Hrxn commented Aug 30, 2024

The suggestion above does not use date in the archive-format, though?

@tezrilet
Author

I was referring to the directory option ("directory": ["{subcategory} kemono {user}", "{date!s:.10} {id}"],), but the problem still occurs because the date changes between revisions. I tried it, and while it does prevent duplicating the files, I still end up with multiple folders. Ideally, I want to use a fixed value for the date and title, such as the earliest ones available from the first revision. I can provide a list of URLs privately if needed.

tezrilet changed the title [Kemono] Preventing duplicates with revisions to [Feature Request] KemonoParty: Preventing duplicates with revisions on Sep 3, 2024
@tezrilet
Author

@mikf Bumping since an admin announced that Kemono is shutting down on November 22nd. Since this issue never got a label, is it considered a won't do/out of scope?

@mikf
Owner

mikf commented Oct 29, 2024

You might as well consider this "won't fix" then, as there is a good chance the next release will be after 2024.11.22.

If there currently isn't a way to solve this (aside from obviously not using date/title), could we get some extra options to use with the format strings, such as {earliest_revision_date}, {latest_revision_date}, {earliest_revision_title}, and {latest_revision_title}?

Each revision has 4 metadata fields:

  • revision_id
  • revision_index
  • revision_count
  • revision_hash

The earliest revision entry has a revision_index value of 1; the latest has revision_index == revision_count and a revision_id of 0.
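
Given those fields, the values asked for in the request could be derived once every revision's metadata is available, for instance in a post-processing script. In the sketch below, the earliest_revision_* and latest_revision_* names are the ones proposed in this issue, not existing gallery-dl keys, and each revision is assumed to be a dict carrying revision_index, revision_count, date and title (the sample values are made up):

```python
def revision_bounds(revisions: list[dict]) -> dict:
    """Pick the earliest (revision_index == 1) and latest
    (revision_index == revision_count) entries and expose their date/title.
    Raises StopIteration if either entry is missing from the list."""
    earliest = next(r for r in revisions if r["revision_index"] == 1)
    latest = next(r for r in revisions
                  if r["revision_index"] == r["revision_count"])
    return {
        "earliest_revision_date": earliest["date"],
        "earliest_revision_title": earliest["title"],
        "latest_revision_date": latest["date"],
        "latest_revision_title": latest["title"],
    }


# Hypothetical metadata for a post with three revisions, in arbitrary order.
revs = [
    {"revision_id": 0, "revision_index": 3, "revision_count": 3,
     "date": "2024-03-01", "title": "Final"},
    {"revision_id": 111, "revision_index": 1, "revision_count": 3,
     "date": "2024-01-01", "title": "Draft"},
    {"revision_id": 222, "revision_index": 2, "revision_count": 3,
     "date": "2024-02-01", "title": "Draft v2"},
]
bounds = revision_bounds(revs)
print(bounds["earliest_revision_title"])  # Draft
```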

Using conditional file/directory names, you could do something like

        "directory": {
            "revision_index == 1"             : ["{username}", "{service}", "[{id}]", "earliest revision: {title}"],
            "revision_index == revision_count": ["{username}", "{service}", "[{id}]", "latest revision: {title}"],
            "":                                 ["{username}", "{service}", "[{id}]", "{revision_id}: {title}"]
        }
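
To illustrate how a conditional "directory" object like this picks a path, here is a simplified Python emulation. It assumes the non-empty keys are tested in order as Python expressions against the metadata and that the "" key acts as the fallback; gallery-dl's real implementation and formatter do more than this, and str.format stands in for its format strings. Using eval is only reasonable here because the expressions come from your own config.

```python
def pick_directory(conditions: dict[str, list[str]], kwdict: dict) -> list[str]:
    """Return the formatted path segments for the first matching condition."""
    for expr, segments in conditions.items():
        # Non-empty keys are expressions evaluated with the metadata as names.
        if expr and eval(expr, {"__builtins__": {}}, dict(kwdict)):
            return [s.format(**kwdict) for s in segments]
    fallback = conditions.get("")
    if fallback is None:
        raise LookupError('no condition matched and no "" fallback given')
    return [s.format(**kwdict) for s in fallback]


conditions = {
    "revision_index == 1": ["{id}", "earliest revision: {title}"],
    "revision_index == revision_count": ["{id}", "latest revision: {title}"],
    "": ["{id}", "{revision_id}: {title}"],
}
middle = {"id": "9", "title": "T", "revision_index": 2,
          "revision_count": 3, "revision_id": 7}
print(pick_directory(conditions, middle))  # ['9', '7: T']
```

Each revision still ends up in a directory of its own, so this alone does not stop the same file from being saved more than once.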

@tezrilet
Author

Thanks for the suggestion, though that still creates duplicate files. I think what I may end up having to do is to only use the post ID as the folder name, then write a script to rename them properly after downloading.
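
The rename-afterwards approach could look roughly like this. The sketch assumes folders named by the bare post ID directly under one root, plus a caller-supplied mapping from post ID to the (date, title) to use, e.g. taken from the earliest revision's saved metadata; the function name and target pattern are illustrative only.

```python
from pathlib import Path


def rename_post_dirs(root: Path, names: dict[str, tuple[str, str]]) -> list[Path]:
    """Rename "<id>" folders to "[<id>] (<date>) <title>".

    `names` maps post ID -> (date, title); choosing those values
    (earliest or latest revision) is left to the caller."""
    renamed = []
    # Materialize the listing first so renaming does not disturb iteration.
    for d in sorted(p for p in root.iterdir() if p.is_dir()):
        if d.name not in names:
            continue  # already renamed, or an ID without metadata
        date, title = names[d.name]
        safe_title = title[:100].replace("/", "_")
        target = d.with_name(f"[{d.name}] ({date}) {safe_title}")
        if not target.exists():  # never overwrite an existing folder
            d.rename(target)
            renamed.append(target)
    return renamed
```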

@mikf
Copy link
Owner

mikf commented Oct 30, 2024

though that still creates duplicate files

It won't if you use an archive with {hash} as archive-format, as suggested by #6096 (comment)
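
Why this works: the suggested archive-format key is built only from fields that stay the same across revisions (subcategory, user, post id, file hash), so an identical file in a later revision maps to a key that is already recorded and gets skipped, no matter which directory that revision would use. A toy model with sqlite3, using a hypothetical one-column table rather than gallery-dl's actual archive schema:

```python
import sqlite3


class HashArchive:
    """Toy download archive; the table layout is illustrative only."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS archive (entry TEXT PRIMARY KEY)")

    def check_and_add(self, kwdict: dict,
                      fmt: str = "{subcategory}_{user}_{id}_{hash}") -> bool:
        """Record the file's key; return True if it was new, False if seen before."""
        key = fmt.format(**kwdict)
        cur = self.db.execute(
            "INSERT OR IGNORE INTO archive (entry) VALUES (?)", (key,))
        self.db.commit()
        return cur.rowcount == 1  # 0 rows changed means the key already existed


archive = HashArchive()
draft = {"subcategory": "user", "user": "42", "id": "7",
         "hash": "abc", "title": "Draft"}
print(archive.check_and_add(draft))                         # True: download
print(archive.check_and_add({**draft, "title": "Final"}))   # False: skip
```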

@tezrilet
Author

tezrilet commented Nov 3, 2024

I might try that instead, now that I think about it. The majority of content shouldn't have that many revisions, so fixing the few duplicate folders might be easier.

Regarding the comment I just left, #6415 (comment), would this still be a viable feature to add?

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants