Using Selenium for loading dynamic content #1092
-
Hello all, I've been trying to get SC to use the Selenium RemoteDriver HTTP protocol but I can't seem to make it work. The configuration I'm using right now is as follows:
Articles / documents I've referred to -
What I've managed to do so far:
1) SC + Browserless on local topology (Followed the first blog post to the "T")
With some additions to the The config is as follows:
NOTE: Both Initiate the crawl:
Which gives me the following output:
From the above output, it's clear that the topology is working as expected. Time to use the latest version.
2) SC + Browserless on local topology (Upgrade SC version)
With SC v2.9.0, the Selenium library is upgraded from The following error is encountered:
Looking at the error logs and reading up on Selenium 4, it seems as though the
Am I correct in the above assumption? Or is there some step I'm missing that would make SC work with Selenium 4?
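Since Selenium 4 clients speak only the W3C WebDriver protocol, one way to narrow the problem down is to check, independently of SC, whether the remote browser accepts a W3C "New Session" request. The following is a minimal sketch using only the JDK (Java 11+), not anything taken from SC itself. The class name `W3cNewSession`, the endpoint `http://localhost:3000/webdriver`, and the Chrome arguments are assumptions for illustration; substitute whatever address your remote browser actually listens on.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class W3cNewSession {

    // Builds the JSON body of a W3C "New Session" command.
    // Capabilities live under capabilities.alwaysMatch, and vendor-specific
    // ones carry a prefix such as "goog:" for Chrome.
    static String payload(String browserName) {
        return "{\"capabilities\":{\"alwaysMatch\":{"
                + "\"browserName\":\"" + browserName + "\","
                + "\"goog:chromeOptions\":{\"args\":[\"--headless\",\"--disable-gpu\"]}"
                + "}}}";
    }

    // Prepares (but does not send) the POST <endpoint>/session request.
    static HttpRequest newSessionRequest(String endpoint, String browserName) {
        return HttpRequest.newBuilder(URI.create(endpoint + "/session"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload(browserName)))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical address: replace with your own remote browser endpoint.
        HttpRequest request = newSessionRequest("http://localhost:3000/webdriver", "chrome");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(payload("chrome"));
    }
}
```

To actually send it, pass the request to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. If the endpoint replies with a session id, it understands W3C requests and the problem more likely lies in how the capabilities are built on the SC side.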
Replies: 6 comments 16 replies
-
Thanks for the explanation, @dhaneshsabane. The tutorial in the blog is indeed based on an old version.
-
To add tests, we could have them spin up a container, just as we do for OpenSearch, for example.
-
I have just created a branch, seleniumTest. It is not working yet, but it fixes some of the dependencies.
-
Thanks for the starting point, @jnioche. I was heading down a similar path with my changes. To test this, can you share the steps to build the SC core jar locally so that I can use it in my selenium-tutorial project? I'm more of a Python dev, so I'm having to figure out the Java tooling to make it work.
-
Run mvn clean install, and remember to change the version of sc-core in your project; I expect it will be 2.10-SNAPSHOT.
are obtained from crawler-default.yml. For some reason the values are not overridden by the user config. I will remove them from the default. What you will need to have in selenium.capabilities is:
There is also a line to remove from RemoteDriverProtocol.java. So quite a few changes are needed in the code, which will be done as part of the PR adding the test. I might release a new patch version soon.
-
Meanwhile, I guess the guava dependency used in storm-hdfs needs an upgrade too -
I created a simple crawler with the archetype, built Julien's branch locally, and upgraded to the latest Selenium.
Next, I started the headless Chrome Docker container and configured my crawler-conf.yaml
To work around the netty issue, I upgraded the shade plugin to 3.5.0 (in the archetype) and added relocations: