Common factor among those that crash is apostrophes in the channel name!
Traceback (most recent call last):
  File "/home/will/local/breda/src/dredger/ingest/tests/test_youtube.py", line 72, in test_one
    youtube.get_video_data("https://www.youtube.com/watch?v=987wzJ2NHBE")
  File "/home/will/local/breda/src/dredger/ingest/youtube.py", line 46, in get_video_data
    metadata = extruct.extract(response.text, base_url="https://youtube.com")
  File "/home/will/.virtualenvs/breda/lib/python3.8/site-packages/extruct/_extruct.py", line 108, in extract
    output[syntax] = list(extract(document, base_url=base_url))
  File "/home/will/.virtualenvs/breda/lib/python3.8/site-packages/extruct/jsonld.py", line 25, in extract_items
    return [
  File "/home/will/.virtualenvs/breda/lib/python3.8/site-packages/extruct/jsonld.py", line 25, in <listcomp>
    return [
  File "/home/will/.virtualenvs/breda/lib/python3.8/site-packages/extruct/jsonld.py", line 38, in _extract_items
    data = jstyleson.loads(HTML_OR_JS_COMMENTLINE.sub('', script),strict=False)
  File "/home/will/.virtualenvs/breda/lib/python3.8/site-packages/jstyleson.py", line 123, in loads
    return json.loads(dispose(text), **kwargs)
  File "/usr/lib/python3.8/json/__init__.py", line 370, in loads
    return cls(**kw).decode(s)
  File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.8/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid \escape: line 1 column 211 (char 210)
Haven't had a chance today to dig into much beyond triaging the above.
I haven't been able to replicate the issue. The links you listed under "Crash" point to videos that have been removed, which may be why you are getting this error. I suggest checking that the video links are still live before passing the pages to extract.
Here is the code that I used:
import extruct
import requests
from w3lib.html import get_base_url
As @wjdp suggested, the cause is the apostrophe in the channel name. json.loads() raises an error when the input contains hex escapes like "\x27" (the apostrophe), because "\x" is not a valid escape sequence in JSON. I created pull request #195, which replaces these hex codes with the characters they represent before the text is passed to json.loads().
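To illustrate the failure and one possible workaround, here is a minimal sketch using only the standard library. It is not the code from #195: the helper name `replace_hex_escapes` and the sample payload are hypothetical, and instead of substituting the raw character it rewrites each `\xNN` as the JSON-legal `\u00NN`, which avoids producing an unescaped quote or backslash inside a string.

```python
import json
import re

# A \xNN escape preceded by an even number of backslashes (possibly zero),
# so an already-escaped backslash followed by "x27" is left untouched.
_HEX_ESCAPE = re.compile(r'(?<!\\)((?:\\\\)*)\\x([0-9A-Fa-f]{2})')


def replace_hex_escapes(text):
    # Hypothetical helper: turn JavaScript-style hex escapes into
    # the equivalent four-digit unicode escapes that JSON accepts.
    return _HEX_ESCAPE.sub(lambda m: m.group(1) + '\\u00' + m.group(2), text)


# Made-up payload imitating a channel name with an apostrophe.
raw = r'{"name": "Will\x27s Channel"}'

try:
    json.loads(raw)
except json.JSONDecodeError as exc:
    print("before:", exc.msg)

data = json.loads(replace_hex_escapes(raw))
print("after:", data["name"])
```

Parsing the untouched payload fails with the same "Invalid \escape" error as in the traceback, while the rewritten payload loads and yields the channel name with a plain apostrophe.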
I have some code to pull metadata from YouTube.
I have noticed some recent crashes, but only on some videos.
No crash: https://www.youtube.com/watch?v=ZY48KUAZKhM https://www.youtube.com/watch?v=ZlVI7YJGHq0
Crash: https://www.youtube.com/watch?v=987wzJ2NHBE https://www.youtube.com/watch?v=0-EF60neguk