Frequent-access error when crawling Xiaohongshu #447

Open
Machoman6 opened this issue Oct 2, 2024 · 4 comments

Comments

@Machoman6

Traceback (most recent call last):
  File "D:\pythonProject.venv\MediaCrawler-main\Lib\site-packages\tenacity\_asyncio.py", line 50, in __call__
    result = await fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\clpq\MediaCrawler-main\media_platform\xhs\client.py", line 99, in request
    raise DataFetchError(data.get("msg", None))
media_platform.xhs.exception.DataFetchError: 访问频次异常,请勿频繁操作或重启试试
(English: "Abnormal access frequency. Please do not operate so frequently, or try restarting.")

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\clpq\MediaCrawler-main\main.py", line 55, in <module>
    asyncio.get_event_loop().run_until_complete(main())
  File "C:\Users\zxnb\AppData\Local\Programs\Python\Python312\Lib\asyncio\base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "D:\clpq\MediaCrawler-main\main.py", line 45, in main
    await crawler.start()
  File "D:\clpq\MediaCrawler-main\media_platform\xhs\core.py", line 78, in start
    await self.search()
  File "D:\clpq\MediaCrawler-main\media_platform\xhs\core.py", line 138, in search
    await self.batch_get_note_comments(note_id_list)
  File "D:\clpq\MediaCrawler-main\media_platform\xhs\core.py", line 252, in batch_get_note_comments
    await asyncio.gather(*task_list)
  File "D:\clpq\MediaCrawler-main\media_platform\xhs\core.py", line 258, in get_comments
    await self.xhs_client.get_note_all_comments(
  File "D:\clpq\MediaCrawler-main\media_platform\xhs\client.py", line 288, in get_note_all_comments
    comments_res = await self.get_note_comments(note_id, comments_cursor)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\clpq\MediaCrawler-main\media_platform\xhs\client.py", line 249, in get_note_comments
    return await self.get(uri, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\clpq\MediaCrawler-main\media_platform\xhs\client.py", line 116, in get
    return await self.request(method="GET", url=f"{self.host}{final_uri}", headers=headers)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\pythonProject.venv\MediaCrawler-main\Lib\site-packages\tenacity\_asyncio.py", line 88, in async_wrapped
    return await fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\pythonProject.venv\MediaCrawler-main\Lib\site-packages\tenacity\_asyncio.py", line 47, in __call__
    do = self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\pythonProject.venv\MediaCrawler-main\Lib\site-packages\tenacity\__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x24f2e98f8c0 state=finished raised DataFetchError>]
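
The traceback shows the error surfacing from `asyncio.gather(*task_list)` in `batch_get_note_comments`, so several comment requests were probably in flight at once when the server answered with the access-frequency message and the retries ran out. One thing that might help is bounding concurrency and pausing after each request. The following is only a minimal sketch using the standard library: `throttled_gather`, `fetch_comments`, and all the delay values are hypothetical names and numbers, not part of MediaCrawler, and nothing here is confirmed to stay under Xiaohongshu's actual limit.

```python
import asyncio
import random


async def throttled_gather(items, worker, max_concurrency=1,
                           min_delay=1.0, max_delay=3.0):
    """Run worker(item) for every item, at most max_concurrency at a time,
    sleeping a random interval after each call to spread requests out."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(item):
        async with semaphore:
            result = await worker(item)
            # Sleep while still holding the semaphore so the next request
            # cannot start until the pause is over.
            await asyncio.sleep(random.uniform(min_delay, max_delay))
            return result

    # asyncio.gather returns results in the same order as its inputs.
    return await asyncio.gather(*(run_one(item) for item in items))


async def fetch_comments(note_id):
    # Hypothetical stand-in for the real per-note comment request.
    await asyncio.sleep(0)
    return f"comments for {note_id}"


if __name__ == "__main__":
    notes = ["note_a", "note_b"]  # placeholder note IDs
    print(asyncio.run(
        throttled_gather(notes, fetch_comments, min_delay=0.1, max_delay=0.2)
    ))
```

With `max_concurrency=1` the requests run strictly one after another; raising it trades speed for a higher chance of hitting the limit again.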

@luyixiao31

How many items had you crawled when this error appeared?

@xiaou61

xiaou61 commented Oct 5, 2024

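I ran into this too, while crawling Xiaohongshu comments, after roughly 2,000 or more of them. There is no fix for this; the only option is to switch IPs.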

@xukaizhao

I had only crawled thirty-something items before it stopped working.

@97wgl

97wgl commented Oct 12, 2024

I tested it; it looks like you get restricted after about 20 search keywords.
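
If the restriction really is tied to the number of search keywords, as the test above suggests, one possible workaround is to split the keyword list into batches smaller than 20 and wait between batches. This is a rough sketch only: `crawl_in_batches`, `crawl_keyword`, the batch size of 10, and the 600-second cooldown are all hypothetical, and whether any cooldown actually resets the limit is untested.

```python
import time


def chunked(seq, size):
    """Yield consecutive slices of seq, each holding at most size items."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def crawl_in_batches(keywords, crawl_keyword, batch_size=10,
                     cooldown_seconds=600, sleep=time.sleep):
    """Call crawl_keyword on every keyword, pausing between batches.

    crawl_keyword is a hypothetical callable standing in for one search run.
    sleep is injectable so the pause can be replaced, for example in tests.
    """
    results = []
    batches = list(chunked(keywords, batch_size))
    for index, batch in enumerate(batches):
        for keyword in batch:
            results.append(crawl_keyword(keyword))
        # No need to wait after the final batch.
        if index < len(batches) - 1:
            sleep(cooldown_seconds)
    return results


if __name__ == "__main__":
    demo_keywords = [f"keyword_{i}" for i in range(5)]  # placeholder keywords
    print(crawl_in_batches(demo_keywords, str.upper, batch_size=2,
                           cooldown_seconds=0))
```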
