Handle DataForSEO bots (#157)
DataForSEO is a crawler that is responsible for >50% of bot requests on
a website I manage (>1.3M requests from a single IP address in the past
few months), so handling it with `legitbot` seems like a good idea.

The bot specs are available here: https://dataforseo.com/dataforseo-bot

Let me know if any changes are needed.
gabrieljablonski authored Sep 19, 2024
1 parent fd9ea8a commit 13d53f6
Showing 5 changed files with 80 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -55,6 +55,7 @@ end
- [Applebot](https://support.apple.com/en-us/119829)
- [Baidu spider](http://help.baidu.com/question?prod_en=master&class=498&id=1000973)
- [Bingbot](https://blogs.bing.com/webmaster/2012/08/31/how-to-verify-that-bingbot-is-bingbot/)
- [DataForSEO](https://dataforseo.com/dataforseo-bot)
- [DuckDuckGo bot](https://duckduckgo.com/duckduckbot)
- [Google crawlers](https://support.google.com/webmasters/answer/1061943)
- [IAS](https://integralads.com/ias-privacy-data-management/policies/site-indexing-policy/)
1 change: 1 addition & 0 deletions lib/legitbot.rb
@@ -8,6 +8,7 @@
require_relative 'legitbot/apple'
require_relative 'legitbot/baidu'
require_relative 'legitbot/bing'
require_relative 'legitbot/dataforseo'
require_relative 'legitbot/duckduckgo'
require_relative 'legitbot/facebook'
require_relative 'legitbot/google'
10 changes: 10 additions & 0 deletions lib/legitbot/dataforseo.rb
@@ -0,0 +1,10 @@
# frozen_string_literal: true

module Legitbot # :nodoc:
  # https://dataforseo.com/dataforseo-bot
  class DataForSEO < BotMatch
    domains 'dataforseo.com.'
  end

  rule Legitbot::DataForSEO, %w[DataForSeoBot]
end
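
For readers unfamiliar with domain-based bot verification, the `domains 'dataforseo.com.'` declaration above can be read as a forward-confirmed reverse DNS check: the IP's PTR name must fall under the domain, and that name must resolve back to the same IP. The sketch below illustrates the idea with Ruby's standard library only. It is an assumption about the general technique, not legitbot's actual implementation; `fcrdns_match?` and `stub_resolver` are hypothetical helpers, and the stub records mirror the ones added to the test DNS mock in this commit.

```ruby
require 'resolv'

# Hypothetical helper (not part of legitbot): forward-confirmed reverse DNS.
# `resolver` must respond to #getnames(ip) and #getaddresses(name),
# which Resolv::DNS does.
def fcrdns_match?(ip, domain, resolver = Resolv::DNS.new)
  suffix = ".#{domain.chomp('.')}"
  resolver.getnames(ip).map(&:to_s).any? do |name|
    host = name.chomp('.')
    # The PTR name must be a subdomain of the expected domain...
    next false unless host.end_with?(suffix)

    # ...and must resolve back to the IP we started from.
    resolver.getaddresses(host).map(&:to_s).include?(ip)
  end
end

# Hypothetical in-memory resolver so the example runs without network access.
def stub_resolver(ptr:, a:)
  resolver = Object.new
  resolver.define_singleton_method(:getnames) { |ip| ptr.fetch(ip, []) }
  resolver.define_singleton_method(:getaddresses) { |name| a.fetch(name, []) }
  resolver
end

dns = stub_resolver(
  ptr: { '136.243.228.176' => %w[crawling-gateway-136-243-228-176.dataforseo.com] },
  a: { 'crawling-gateway-136-243-228-176.dataforseo.com' => %w[136.243.228.176] }
)

puts fcrdns_match?('136.243.228.176', 'dataforseo.com.', dns) # prints "true"
puts fcrdns_match?('149.210.164.47', 'dataforseo.com.', dns)  # prints "false"
```

Omitting the third argument would query real DNS through `Resolv::DNS.new`. Checking the suffix with a leading dot keeps look-alike hosts such as `evildataforseo.com` from matching.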
60 changes: 60 additions & 0 deletions test/dataforseo_test.rb
@@ -0,0 +1,60 @@
# frozen_string_literal: true

require_relative 'test_helper'

class DataForSEOTest < Minitest::Test
  include Minitest::Hooks
  include DnsServerMock

  def test_malicious_ip
    ip = '149.210.164.47'
    match = Legitbot::DataForSEO.new ip

    refute_predicate match, :valid?
  end

  def test_valid_ip
    ip = '136.243.228.176'
    match = Legitbot::DataForSEO.new ip

    assert_predicate match, :valid?
  end

  def test_malicious_ua
    bot = Legitbot.bot(
      'Mozilla/5.0 (compatible; DataForSeoBot; +https://dataforseo.com/dataforseo-bot)',
      '149.210.164.47'
    )

    assert bot
    refute_predicate bot, :valid?
  end

  def test_valid_ua
    bot = Legitbot.bot(
      'Mozilla/5.0 (compatible; DataForSeoBot; +https://dataforseo.com/dataforseo-bot)',
      '136.243.228.176'
    )

    assert bot
    assert_predicate bot, :valid?
  end

  def test_valid_name
    bot = Legitbot.bot(
      'Mozilla/5.0 (compatible; DataForSeoBot; +https://dataforseo.com/dataforseo-bot)',
      '136.243.228.176'
    )

    assert_equal :dataforseo, bot.detected_as
  end

  def test_fake_name
    bot = Legitbot.bot(
      'Mozilla/5.0 (compatible; DataForSeoBot; +https://dataforseo.com/dataforseo-bot)',
      '81.1.172.108'
    )

    assert_equal :dataforseo, bot.detected_as
  end
end
8 changes: 8 additions & 0 deletions test/lib/dns_server_mock.rb
@@ -44,6 +44,14 @@
    ptr: %w[17-58-98-60.applebot.apple.com]
  },

  # DataForSEO
  'crawling-gateway-136-243-228-176.dataforseo.com' => {
    a: %w[136.243.228.176]
  },
  '136.243.228.176' => {
    ptr: %w[crawling-gateway-136-243-228-176.dataforseo.com]
  },

  # Google
  'crawl-66-249-64-141.googlebot.com' => {
    a: %w[66.249.64.141]