Potential bots #353
Is this merged? |
The user-agents marked with ✅ have been added, the others need adding 👍🏻 |
This is the UserAgent of the Google-Weblight bot:
|
There's also:
Should these 2 existing rules be replaced:
with a simple "Google-Ads" detection? |
Yeah, go for it 👍 |
Probably no way to detect, but these 2 visit my entirely Danish site every day... The first twice a day from the US and the second once a day from China. These are the complete user-agent headers, and all of it seems to be removed via excludes.
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36 |
Yep, pretty annoying bots like this. Nothing this package can do about that 🤔 |
I found this list if anyone's interested in going through it ;-P https://user-agents.net/bots I don't have enough experience with regex to do it myself, sadly... As my original post showed (I hadn't noticed that the bot I mentioned would already be caught by the "bot" in the regex). |
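For anyone following along, here is a rough Python sketch of why a generic "bot" match catches many crawlers but not the two browser-like user agents quoted above. It is not this package's code: the single pattern, the `looks_like_bot` helper, and the `ExampleBot` user agent are all made up for illustration, and the real list is far longer.

```python
import re

# Assumption: one generic, case-insensitive pattern standing in for the
# package's much longer crawler list.
GENERIC_BOT = re.compile(r"bot", re.IGNORECASE)


def looks_like_bot(user_agent: str) -> bool:
    """Return True if the user agent contains the generic 'bot' token."""
    return GENERIC_BOT.search(user_agent) is not None


# The two user agents quoted earlier in this thread (indistinguishable from
# ordinary browsers, so nothing here can match them).
BROWSER_LIKE = [
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/83.0.4103.97 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36",
]

# Hypothetical crawler user agent, invented for this example only.
HYPOTHETICAL_BOT = "Mozilla/5.0 (compatible; ExampleBot/1.0; +https://example.com/bot)"

if __name__ == "__main__":
    for ua in BROWSER_LIKE + [HYPOTHETICAL_BOT]:
        print(looks_like_bot(ua), ua)
```

Any user agent that self-identifies with a name ending in "Bot" gets caught by the generic token, while a crawler that sends a plain Chrome string would need some other signal (IP range, request rate, and so on). |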
What about the new generation of AI / machine learning crawlers/bots like GPTbot? I don't see them listed here, yet the demo detection site recognizes it? |
Can you give me some example user agents? |
I found this repo after running into some issues caused by crawlers, but noticed that the following are not in the list. Since you asked for some examples, here are some of the ones that crashed our sites :/
|
@JayBizzle I found a blog post about GPTbot here: https://blog.fefe.de/?ts=99ac21ad (in German). In the post itself he is
Following https://openai.com/gptbot leads to an overview with further user-agents. |