[youtube] fix: playlist #150

insaneracist · 2020-11-10T08:40:13Z

Before submitting a pull request make sure you have:

At least skimmed through adding new extractor tutorial and youtube-dl coding conventions sections
Searched the bugtracker for similar pull requests
Checked the code with flake8

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

I am the original author of this code and I am willing to release it under Unlicense

What is the purpose of your pull request?

Bug fix
Improvement
New extractor
New feature

Fixes #148
Quick hack that needs testing

someziggyman · 2020-11-10T08:59:14Z

Ok, so, the good news is that it seems to be working with all playlist types.
Regular playlist ID:
PLszW2az_oxFd7dFeCb1FFhk7c_eEer5n1
Mix playlist ID:
RDKR9wGi7gVLQ
Search playlist:
https://www.youtube.com/results?search_query=linkin+park+numb
And channel playlist:
https://www.youtube.com/user/TheLinuxFoundation/playlists

There's also a new fix offered here:
#151
Will test it now. Hard to tell which one is best if it also works.

GitHildur

This works for me 👍

pukkandan · 2020-11-10T13:06:18Z

I can confirm that this works at least for normal playlists and channels

Edit: Never mind, I see that it has already been reviewed :)

youtube_dlc/extractor/youtube.py

insaneracist · 2020-11-10T14:25:47Z

strange, this commit isn't showing up here. insaneracist@b2a462a

edit: that superfluous commit woke it up.

blackjack4494 · 2020-11-10T21:03:42Z

[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 87
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 88
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 89
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 90
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 91
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 92
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 93
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 94
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 95
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 96
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 97
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 98
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 99
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 100
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 101
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 102
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading page 103

it's never ending 🤣

blackjack4494 · 2020-11-10T22:30:24Z

converting this to draft for now.
As it turns out #151 works better. I experienced some issues here.
That does not mean this PR is obsolete.

SoneeJohn · 2020-11-10T22:46:40Z

it's never ending 🤣

That's because those are mix playlists; their IDs start with a prefix of RD, UL or PU.

someziggyman · 2020-11-10T22:50:42Z

it's never ending 🤣

it does end actually. Ran 3 tests and results are: 384, 402, 415

SoneeJohn

You might not need this. Playlists with certain prefixes (known as mixes) can sometimes contain a lot of pages. My suggestion would be to check whether the playlist is a mix, fetch just the first page in that case, and add an argument that sets the maximum number of fetches for a mix playlist.

See

yt-dlc/youtube_dlc/extractor/youtube.py

Lines 2847 to 2877 in 29e9c94

    
    def _extract_mix(self, playlist_id):
        # The mixes are generated from a single video
        # the id of the playlist is just 'RD' + video_id
        ids = []
        yt_initial = None
        last_id = playlist_id[-11:]
        for n in itertools.count(1):
            url = 'https://www.youtube.com/watch?v=%s&list=%s' % (last_id, playlist_id)
            webpage = self._download_webpage(
                url, playlist_id, 'Downloading page {0} of Youtube mix'.format(n))
            new_ids = orderedSet(re.findall(
                r'''(?xs)data-video-username=".*?".*?
                           href="/watch\?v=([0-9A-Za-z_-]{11})&amp;[^"]*?list=%s''' % re.escape(playlist_id),
                webpage))
            # if no ids in html of page, try using embedded json
            if (len(new_ids) == 0):
                yt_initial = self._get_yt_initial_data(playlist_id, webpage)
                if yt_initial:
                    new_ids = self._extract_mix_ids_from_yt_initial(yt_initial)
            # Fetch new pages until all the videos are repeated, it seems that
            # there are always 51 unique videos.
            new_ids = [_id for _id in new_ids if _id not in ids]
            if not new_ids:
                break
            ids.extend(new_ids)
            last_id = ids[-1]
        url_results = self._ids_to_results(ids)

yt-dlc/youtube_dlc/extractor/youtube.py

Lines 3051 to 3059 in 29e9c94

    
        if playlist_id.startswith(('RD', 'UL', 'PU')):
            if not playlist_id.startswith(self._YTM_PLAYLIST_PREFIX):
                # Mixes require a custom extraction process,
                # Youtube Music playlists act like normal playlists (with randomized order)
                return self._extract_mix(playlist_id)
        has_videos, playlist = self._extract_playlist(playlist_id)
        if has_videos or not video_id:
            return playlist
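
The suggestion above (cap the number of fetches for a mix, and stop once a page brings nothing new) could be sketched roughly as follows. This is only an illustration, not code from this PR: `collect_mix_ids`, `fetch_page` and `max_pages` are hypothetical names, and `fetch_page` stands in for whatever downloads and parses one page of the mix.

```python
from typing import Callable, Iterable, List


def collect_mix_ids(
    fetch_page: Callable[[str], Iterable[str]],
    first_video_id: str,
    max_pages: int = 1,
) -> List[str]:
    """Collect unique video ids from a mix, fetching at most max_pages pages.

    fetch_page (hypothetical) takes the last seen video id and returns the
    video ids listed on the page that starts from that video.
    """
    ids: List[str] = []
    seen = set()
    last_id = first_video_id
    for _ in range(max_pages):
        # Keep only ids that have not been seen on an earlier page.
        new_ids = []
        for video_id in fetch_page(last_id):
            if video_id not in seen:
                seen.add(video_id)
                new_ids.append(video_id)
        # A page with no unseen videos means the mix has started repeating.
        if not new_ids:
            break
        ids.extend(new_ids)
        last_id = ids[-1]
    return ids
```

With the default `max_pages=1` only the first page is fetched; a larger value lets the caller trade completeness against the number of requests.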

blackjack4494 · 2020-11-10T22:56:30Z

Just don't give up on this yet.
If implemented like in #151 you will get proper downloading

[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading webpage
[download] Downloading playlist: RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA
[youtube:playlist] RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading continuation page #1
[youtube:playlist] playlist RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA: Downloading 186 videos
[download] Downloading video 1 of 186

As I am super tired, will merge #151 now so that there is at least a working version out there. Will take a look tomorrow again.

insaneracist · 2020-11-11T05:49:17Z

@blackjack4494, thanks, i was about to give up. the problem was not sending enough client information, it kept returning the initial piece of the playlist (but only for some types).
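
A rough sketch of what "sending enough client information" might look like, for readers following along. It assumes the watch or playlist page embeds a JSON object under a key named `INNERTUBE_CONTEXT` (the name is taken from the commit titles in this PR); the helper names and the request body layout are assumptions for illustration, not the code that was merged.

```python
import json
import re
from typing import Optional


def extract_client_context(webpage: str) -> Optional[dict]:
    """Return the JSON object stored under "INNERTUBE_CONTEXT", if any."""
    match = re.search(r'"INNERTUBE_CONTEXT"\s*:\s*', webpage)
    if not match:
        return None
    try:
        # raw_decode parses one JSON value starting at the given offset
        # and ignores whatever follows it in the page.
        context, _ = json.JSONDecoder().raw_decode(webpage, match.end())
    except ValueError:
        return None
    return context if isinstance(context, dict) else None


def build_continuation_body(context: dict, continuation: str) -> bytes:
    """Serialize a request body carrying the whole client context.

    The field names 'context' and 'continuation' are assumed here.
    """
    payload = {'context': context, 'continuation': continuation}
    return json.dumps(payload).encode('utf-8')
```

The point of passing the whole object through, rather than picking out one or two fields, is that the caller does not have to guess which parts of the client information the server relies on.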

insaneracist · 2020-11-11T05:59:44Z

@SoneeJohn, the playlists starting with RDCLAK5uy_ are special-cased, the reason is that they are from Youtube Music and have a playlist url.
e.g. https://www.youtube.com/playlist?list=RDCLAK5uy_m_h-nx7OCFaq9AlyXv78lG0AuloqW_NUA
the otherwise dynamically generated RD mixes can't be accessed that way, and are fetched via video urls (they should be hitting a different function, YoutubePlaylistIE._extract_mix)
e.g. fails: https://www.youtube.com/playlist?list=RDG8sGmSEehi4
works: https://www.youtube.com/watch?v=G8sGmSEehi4&list=RDG8sGmSEehi4
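
The distinction described above can be sketched as a small helper that picks the page URL for a given playlist ID. This is a minimal illustration, assuming the prefixes quoted in this thread; `playlist_page_url` and the two constants are hypothetical names, not functions from the extractor.

```python
MIX_PREFIXES = ('RD', 'UL', 'PU')
# Prefix of the Youtube Music playlists mentioned above.
YTM_PLAYLIST_PREFIX = 'RDCLAK5uy_'


def playlist_page_url(playlist_id: str) -> str:
    """Return the URL a playlist ID would be fetched from."""
    is_mix = playlist_id.startswith(MIX_PREFIXES)
    is_ytm = playlist_id.startswith(YTM_PLAYLIST_PREFIX)
    if is_mix and not is_ytm:
        # Dynamically generated mixes have no playlist page; they are
        # reached through the video the mix was generated from, whose
        # id is the last 11 characters of the playlist id.
        video_id = playlist_id[-11:]
        return 'https://www.youtube.com/watch?v=%s&list=%s' % (video_id, playlist_id)
    return 'https://www.youtube.com/playlist?list=%s' % playlist_id
```

For example, `playlist_page_url('RDG8sGmSEehi4')` yields the working watch URL shown above, while an ID starting with `RDCLAK5uy_` yields a regular playlist URL.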

blackjack4494 · 2020-11-16T21:58:37Z

so what's the state on this one @insaneracist ?
Haven't had time yet to look into it but it seems this should handle the missing title and other metadata?

[youtube] fix: playlist

fc988a1

someziggyman mentioned this pull request Nov 10, 2020

RFC: youtube: Polymer UI and JSON endpoints for playlists #151

Merged

9 tasks

GitHildur reviewed Nov 10, 2020

[youtube] playlist title, desc

0137a78

SoneeJohn reviewed Nov 10, 2020

[youtube] use api key and client version from page

b2a462a

[youtube] poking github

965a404

blackjack4494 marked this pull request as draft November 10, 2020 22:28

[youtube] stop loading pages if videos are already seen

29e9c94

insaneracist force-pushed the youtube-playlist branch from a779979 to 29e9c94 Compare November 10, 2020 22:40

SoneeJohn suggested changes Nov 10, 2020

[youtube] post entire client context to api endpoint

2fd8290

insaneracist force-pushed the youtube-playlist branch from cf38793 to 2fd8290 Compare November 11, 2020 05:44

insaneracist added 4 commits November 10, 2020 22:49

[youtube] INNERTUBE_CONTEXT regex adjustment

63afc79

[youtube] playlist uploader info

50aaf1e

[youtube] playlist view count

e27517e

[youtube] playlist updated date

76269f0

coletdjnz mentioned this pull request Nov 15, 2020

'uploader' information is null in YouTube playlist extractor #173

Closed

5 tasks
