Potential bots #353

Open
19 of 64 tasks
JayBizzle opened this issue Jan 21, 2020 · 12 comments

@JayBizzle
Owner

JayBizzle commented Jan 21, 2020

  • Filestack
  • Google-Ads-Overview Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36
  • Google-Ads-Overview Mozilla/5.0 (Linux; U; Android 6.0.1; generic) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Version/4.0 Mobile Safari/537.36
  • Google-Ads-Overview Mozilla/5.0 (Linux; U; Android 2.3.4; generic) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Version/4.0 Mobile Safari/537.36
  • Google-Ads-Overview Mozilla/5.0 (Linux; U; Android 2.3.4; generic) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Mobile Safari/537.36
  • Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/602.1 (KHTML, like Gecko) splash Version/9.0 Safari/602.1
  • adreview/1.0
  • Mozilla/5.0 (compatible; RyowlEngine/1.0; +https://ryowl.org)
  • Mozilla/5.0 (compatible; RyowlEngine/1.0; +https://ryowl.com)
  • Google-speakr
  • Google-speakr,gzip(gfe)
  • FeedViewer/1.0 (+http://www.feedviewer.net/webmasters; license agreement: http://www.feedviewer.net/license)
  • acebookexternalhit/1.0 (+http://www.facebook.com/externalhit_uatext.php)
  • WhoAPI/1.0 (whoapi.com)
  • Mozilla/5.0 (compatible; BackupLand/1.0; https://go.backupland.com/; Domain check for viruses;)
  • Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:66.0) WhatCMS/1.0
  • Google-Ads-Overview Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36
  • Google-Ads-Overview Mozilla/5.0 (Linux; U; Android 6.0.1; generic) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Version/4.0 Mobile Safari/537.36
  • Google-Ads-Overview Mozilla/5.0 (Linux; U; Android 2.3.4; generic) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Version/4.0 Mobile Safari/537.36
  • Google-Ads-Overview Mozilla/5.0 (Linux; U; Android 2.3.4; generic) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Mobile Safari/537.36
  • Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) DownloaderChrome/62.0.3202.75 Safari/537.36
  • iGooglePortal
  • Mozilla/5.0+(compatible; Cula/2.0; https://cula.io/)
  • Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us; rv:1.9.2.3) Gecko/20100401 YFF35 Firefox/3.6.3
  • Owlin - http://www.owlin.com
  • Mozilla/5.0 (compatible; +centuryb.o.t9[at]gmail.com)
  • Bublup (+https://www.bublup.com/bublup.html)
  • Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36 | Hexometer.com - HexAct Inc.
  • Mozilla/5.0/Firefox/42.0 - nbertaupete95(at)gmail.com
  • OpenGraphCheck/2.1 (+https://opengraphcheck.com)
  • donwload_html/2.0 (Linux) [email protected]
  • LinuxGetURL/2.0 [email protected] (Linux)
  • Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Google-AMPHTML)
  • Google-AMPHTML
  • inactive-blog-skipper/1.0 ([email protected])
  • AWS Network Health / Contact [email protected] with your website URL to stop
  • AWS Network Health / Contact [email protected] with your website URL to stop
  • Corax - [email protected]
  • draw.io
  • MindsMediaProxy/3.0 (+http://www.minds.com/)
  • Mozilla/5.0 (w3dt header analysis for httprecon tools; http://w3dt.net/tools/httprecon)
  • Google-Test
  • Mozilla/5.0 (compatible; Google-Test;)
  • Mozilla/5.0 (compatible; RSiteAuditor)
  • Mozilla/5.0 (compatible; WPSec/1.3; +https://wpsec.com)
  • Mozilla/5.0 (compatible; Go-KI; +https://www.gosign.de/)
  • Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Google-AMPHTML)
  • Google-AMPHTML
  • Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome-prerendercloud/66.0.3359.139 Safari/537.36
  • DIGMATO.com web tester
  • Mozilla/5.0 (X11; Linux x86_64; Rigor) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36
  • Mozilla/5.0 Windows NT 10.0; Win64; x64 AppleWebKit/537.36 KHTML, like Gecko Chrome/65.0.3286.0 Safari/537.36 Rigor
  • Mozilla/5.0 (X11; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0 (Research project: Visit PrivacyScore.org for details)
  • veu/1.0 (+http://www.veu.cat)
  • Google-Cloud-ML-Vision
  • FirmoGraph (+https://firmograph.io)
  • Mozilla/5.0 (compatible; 2GDPR/1.2; https://2gdpr.com)
  • CityGridMedia/1.0 (compatible; http://url-validation.citygrid.com/)
  • Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.11 (KHTML, like Gecko)(compatible; http://url-validation.citygrid.com/) Chrome/23.0.1271.95 Safari/537.11
  • https://gdnplus.com:Gather Analyze Provide.
  • northcutt.com SEO tools
  • Burf.co
  • Mozilla/5.0 (compatible; WPSec/1.3; +https://wpsec.com)
  • gensun.org
@Abhirup-99

Is this merged?

@JayBizzle
Owner Author

> Is this merged?

The user-agents marked with ✅ have been added, the others need adding 👍🏻

@newHagen

newHagen commented Jul 1, 2020

This is the UserAgent of the Google-Weblight bot:

  • Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 5 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko; googleweblight) Chrome/38.0.1025.166 Mobile Safari/535.19
    Should be detectable by "googleweblight"
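
A minimal sketch of the substring check suggested above, in Python for illustration only. The pattern name `WEBLIGHT` and the helper `is_weblight` are hypothetical and are not taken from the package's rule list; the match is assumed to be case-insensitive.

```python
import re

# Hypothetical rule for illustration; not copied from the package.
WEBLIGHT = re.compile(r"googleweblight", re.IGNORECASE)

# The Google-Weblight user agent quoted in the comment above.
ua = (
    "Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 5 Build/JOP40D) "
    "AppleWebKit/535.19 (KHTML, like Gecko; googleweblight) "
    "Chrome/38.0.1025.166 Mobile Safari/535.19"
)


def is_weblight(user_agent: str) -> bool:
    """Return True when the user agent contains the 'googleweblight' token."""
    return WEBLIGHT.search(user_agent) is not None
```

With this sketch, `is_weblight(ua)` is true for the string above and false for an ordinary Chrome user agent that lacks the token.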

@clementmas
Contributor

There's also:

Mozilla/5.0 AppleWebKit/537.36 Chrome/114.0.5735.179 Safari/537.36 Google-Ads-Conversions

Should these 2 existing rules be replaced:

  • Google-Ads-Creatives-Assistant
  • Google-Ads-Overview

with a simple "Google-Ads" detection?

@JayBizzle
Owner Author

> There's also:
>
> Mozilla/5.0 AppleWebKit/537.36 Chrome/114.0.5735.179 Safari/537.36 Google-Ads-Conversions
>
> Should these 2 existing rules be replaced:
>
>   • Google-Ads-Creatives-Assistant
>   • Google-Ads-Overview
>
> with a simple "Google-Ads" detection?

Yeah, go for it 👍
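
A rough sketch of what the consolidation could look like, in Python for illustration only. The names `GOOGLE_ADS` and `matches_google_ads` are hypothetical, and the single pattern is an assumption about how a simple "Google-Ads" rule might behave, not the package's actual rule.

```python
import re

# Hypothetical consolidated rule replacing the two more specific ones.
GOOGLE_ADS = re.compile(r"Google-Ads", re.IGNORECASE)

# User agents and rule names taken from this thread.
samples = [
    "Google-Ads-Overview Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36",
    "Mozilla/5.0 AppleWebKit/537.36 Chrome/114.0.5735.179 Safari/537.36 "
    "Google-Ads-Conversions",
    "Google-Ads-Creatives-Assistant",
]


def matches_google_ads(user_agent: str) -> bool:
    """Return True when the user agent contains any Google-Ads-* token."""
    return GOOGLE_ADS.search(user_agent) is not None


# True for every sample above, including Google-Ads-Conversions.
all_matched = all(matches_google_ads(ua) for ua in samples)
```

The trade-off is that a shorter rule also matches any other user agent containing "Google-Ads", which is the intent here but worth keeping in mind.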

@SoranDK

SoranDK commented Jul 24, 2023

Probably no way to detect but these 2 visit my entirely Danish site every day... The first twice a day from the US and the second once a day from China. These are all the useragent headers and all of it seems to be removed via excludes.

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36

@JayBizzle
Owner Author

JayBizzle commented Jul 26, 2023

> Probably no way to detect but these 2 visit my entirely Danish site every day... The first twice a day from the US and the second once a day from China. These are all the useragent headers and all of it seems to be removed via excludes.
>
> Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36
>
> Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36

Yep, pretty annoying bots like this. Nothing this package can do about that 🤔
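
To illustrate why nothing can be matched here, the sketch below assumes, as the quoted comment suggests, that known browser tokens are stripped ("excludes") before any bot pattern is tested. It is written in Python for illustration only; the `EXCLUSIONS` pattern and the `residue` helper are hypothetical and far smaller than any real exclusion list.

```python
import re

# Hypothetical exclusion list: ordinary browser tokens that carry no bot signal.
EXCLUSIONS = re.compile(
    r"Mozilla/\d+\.\d+"
    r"|AppleWebKit/[\d.]+"
    r"|Chrome/[\d.]+"
    r"|Safari/[\d.]+"
    r"|\(KHTML, like Gecko\)"
    r"|\(X11; Linux x86_64\)"
    r"|\(Windows NT [\d.]+; Win64; x64\)"
)


def residue(user_agent: str) -> str:
    """Strip excluded browser tokens and return whatever text is left."""
    return EXCLUSIONS.sub("", user_agent).strip()


# The two user agents reported above.
linux_ua = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"
)
windows_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36"
)
# A user agent from earlier in the thread that does carry a distinctive token.
ads_ua = (
    "Mozilla/5.0 AppleWebKit/537.36 Chrome/114.0.5735.179 Safari/537.36 "
    "Google-Ads-Conversions"
)
```

Under these assumptions, `residue(linux_ua)` and `residue(windows_ua)` are both empty, so no pattern has anything to match, whereas `residue(ads_ua)` still leaves `Google-Ads-Conversions`.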

@SoranDK

SoranDK commented Aug 17, 2023

I found this list if anyone's interested in going through it ;-P https://user-agents.net/bots

Sadly, I don't have enough experience with regex to do it myself, as my original post showed (I hadn't noticed that the bot I mentioned would already be caught by the "bot" in the regex).

@tsawitzki

What about the new generation of AI / machine learning crawlers and bots like GPTbot? I don't see them listed here, yet the demo detection site recognizes them.

@JayBizzle
Owner Author

> What about the new generation of AI / machine learning crawlers and bots like GPTbot? I don't see them listed here, yet the demo detection site recognizes them.

Can you give me some example user agents?

@MIJmker

MIJmker commented Jul 2, 2024

I found this repo after running into some issues caused by crawlers, but saw that the following are not in the list. Since you asked for examples, here are some of the ones that crashed our sites :/

@k00ni

k00ni commented Dec 6, 2024

> > What about the new generation of AI / machine learning crawlers and bots like GPTbot? I don't see them listed here, yet the demo detection site recognizes them.
>
> Can you give me some example user agents?

@JayBizzle I found a blog post about GPTbot here: https://blog.fefe.de/?ts=99ac21ad (in German). In the post itself he is basically talking about the bot and its 11k hits on his blog in one day. He posted the following user-agent:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)

Following https://openai.com/gptbot leads to an overview with further user-agents.
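
One plausible explanation for the demo site recognizing GPTBot without a dedicated entry, assuming the generic "bot" pattern mentioned earlier in this thread is matched case-insensitively. The sketch is in Python for illustration only, and `GENERIC_BOT` is a hypothetical stand-in for that rule.

```python
import re

# Hypothetical generic rule: any occurrence of "bot", ignoring case.
GENERIC_BOT = re.compile(r"bot", re.IGNORECASE)

# The GPTBot user agent quoted above.
gptbot_ua = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "GPTBot/1.2; +https://openai.com/gptbot)"
)

match = GENERIC_BOT.search(gptbot_ua)
# The first hit is the "Bot" inside "GPTBot".
matched_text = match.group(0) if match else None
```

If that assumption holds, a separate GPTBot rule would be redundant for detection, though it could still be useful for naming the crawler.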
