crawl-test-site.robot
*** Settings ***
Documentation     Initiate test site crawl and verify
Library           Process
Library           RequestsLibrary
Library           XML

*** Variables ***
# The credentials keep their double quotes on purpose: Check Crawler Status passes
# them to Evaluate, which needs them to be Python string literals.
${user}=       "heritrix"
${pass}=       "heritrix"
${url}=        https://heritrix:8443/engine/job/frequent
&{headers}=    content-type=application/xml    accept=application/xml

# TODO add tests that check these against expected values:
# http://localhost:9090/fc?fl=urlkey,original,mimetype,statuscode,digest,length&matchType=prefix&urlkey=uk,org,webarchive,crawl-test-site)/
# http://localhost:9090/fc?fl=urlkey,original,mimetype,statuscode,digest,length&matchType=prefix&urlkey=com,matkelly,acid)/
# Similar tests for urlkey=screenshot: etc. (but maybe don't expect the digest/length to match!)

*** Test Cases ***
Wait For Crawler To Be Ready
    Wait Until Keyword Succeeds    3 min    20 sec    Check Crawler Status    EMPTY

Launch crawl-test-site Crawl
    Launch Crawl    http://crawl-test-site.webarchive.org.uk

Launch acid-test Crawl
    Launch Crawl    http://acid.matkelly.com/

Wait For Crawl To Begin
    Wait Until Keyword Succeeds    1 min    10 sec    Check Crawler Status    RUNNING

Wait For Crawl To Finish
    Wait Until Keyword Succeeds    5 min    20 sec    Check Crawler Status    EMPTY
    Log    Crawl has completed.

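
Verify crawl-test-site Captures Are Indexed
    # Hedged sketch for the TODO in the Variables section; not part of the original suite.
    # Assumptions: the CDX server is reachable from the test runner at localhost:9090
    # (as in the TODO URLs), it answers with plain text, and each capture line includes
    # the original URL. Indexing may lag behind the crawl, so this check may need to be
    # wrapped in Wait Until Keyword Succeeds.
    ${cdx_url}=    Set Variable    http://localhost:9090/fc?fl=urlkey,original,mimetype,statuscode,digest,length&matchType=prefix&urlkey=uk,org,webarchive,crawl-test-site)/
    ${resp}=    GET    ${cdx_url}    expected_status=200
    Log    ${resp.text}
    Should Contain    ${resp.text}    crawl-test-site.webarchive.org.uk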
*** Keywords ***
Check Crawler Status
    [Arguments]    ${expected}
    Log    Checking crawler status...
    ${user_pass}=    Evaluate    (${user}, ${pass},)
    Create Digest Session    h3session    ${url}    auth=${user_pass}    headers=${headers}    verify=${false}    disable_warnings=1
    ${resp}=    Get On Session    h3session    ${url}    headers=${headers}    verify=${false}
    # Log the response, to help debugging:
    Log    ${resp.text}
    ${status}=    Get Element Text    ${resp.text}    crawlControllerState
    Log    ${status}
    Should Match    ${status}    ${expected}

Launch Crawl
    # Note: the ${url} argument shadows the suite-level ${url} inside this keyword.
    [Arguments]    ${url}
    Log    Launching crawl of ${url}...
    ${result}=    Run Process    submit -k kafka:9092 -S -R fc.tocrawl ${url}    shell=yes
    Should Not Contain    ${result.stderr}    Traceback
    Log    ${result.stdout}
    Log    ${result.stderr}
    Should Be Equal As Integers    ${result.rc}    0