
[minor] Add option for custom Internet Archive object ID #316

Closed
wants to merge 1 commit into from


phaseloop

The Internet Archive object ID (URL) is computed from the video metadata (title, ID, etc.). When downloading non-YouTube videos with invalid metadata, multiple videos can end up with the same URL, which causes an upload conflict. This option adds a switch to set the ID manually.

@brandongalbraith
Collaborator

Hey @phaseloop! Thanks for submitting this. Is there any way we could make it more deterministic? tubeup is frequently used in point-and-shoot mode, and someone would rarely check whether a non-standard item already exists before performing an upload. If we can handle the logic gracefully in tubeup to generate object IDs, we should, IMHO.

@phaseloop
Author

The current tubeup implementation handles this pretty well: it uses the video name and ID, which is deterministic. My PR only addresses the case where you download videos from obscure sites that have broken metadata. For example, there is a TV station that uses "MOVIE1" as the title for every video. So after the first upload to the Internet Archive you can't upload anything else, because the object "movie1" already exists.

@vxbinaca
Collaborator

vxbinaca commented Oct 5, 2023

No, you need to talk to yt-dlp to get them to fix the extractors. The extractors aren't grabbing the metadata. Adding this to Tubeup would allow people to bulk upload duplicate videos. Internet Archive is already pissed off at the number of uploads, and this would make it worse.

Just be satisfied with the options we bake in when it comes to downloads and uploads.

@vxbinaca
Collaborator

vxbinaca commented Oct 5, 2023

Is the item ID mangled because the site's extractor isn't grabbing metadata? Go edit the extractor so it grabs the metadata we need.

vxbinaca closed this Oct 5, 2023
@vxbinaca
Collaborator

vxbinaca commented Oct 5, 2023

Internet Archive has a way of searching for things on that site. If you allow metadata or the video to be edited by the user before upload, or in this case edit the way things are organized there, you will introduce chaos and break the way things are done.

If a site extractor in yt-dlp doesn't do metadata properly, go submit a PR to them and then we get the downstream benefit without the code cruft of having to hand edit uploads through flags.

Thank you @phaseloop, but if a site is broken (and I notice you didn't mention the site), talk to Puka at yt-dlp and get it fixed there. We are a middleman who eases mirroring, and the last few PRs have seemingly served to complicate and frustrate rips.

Please feel free to submit PRs in the future to fix bugs that are actually with Tubeup.

@phaseloop
Author

@vxbinaca, this is not about fixing yt-dlp; it is about metadata that is not physically there. Imagine a TV station naming each of its files "test1".

@vxbinaca
Collaborator

vxbinaca commented Oct 5, 2023

> @vxbinaca, this is not about fixing yt-dlp; it is about metadata that is not physically there. Imagine a TV station naming each of its files "test1".

Tubeup does not download the video or gather metadata; it is merely a middleman that semi-automates the process. yt-dlp downloads the video and metadata, and Tubeup makes an item based on that metadata. If the metadata that Tubeup requires is lacking, then it is on yt-dlp to fix that extractor to get the metadata we require. This is called a division of labor.

Submit a PR to yt-dlp to fix that site's extractor so it scrapes the metadata you require.

The 'solution' you offer still won't fix the problem, requires other users to lengthen their commands, adds code complexity to Tubeup that has to be maintained, and, most of all, opens Tubeup to abuse by allowing item identifiers to be changed. Archive.org has scripts that automatically move Tubeup uploads into a collection; allowing open-ended item names breaks this and gets me yelled at.

Your PR is ill-conceived and an incorrect 'fix' to a 'problem' that isn't Tubeup's problem. Sorry, don't take it personally.

Talk to pukkandan about fixing the yt-dlp site extractor you need.

3 participants