Replies: 1 comment 9 replies
-
My tests with Selenium Grid revealed that StormCrawler (SC) cannot handle that correctly at the moment: it blocks on the single instance URL used in the related bolt. There is an open issue for that. For now, it might be faster to specify multiple standalone Selenium instances.
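To illustrate the idea of spreading work over several standalone instances instead of one grid URL, here is a minimal sketch in plain Python. It is not StormCrawler code: the addresses and the `assign_round_robin` helper are hypothetical and only show how requests could be distributed evenly across a list of endpoints.

```python
from itertools import cycle

# Hypothetical addresses of standalone Selenium instances (not from this thread).
SELENIUM_ADDRESSES = [
    "http://selenium-0:4444",
    "http://selenium-1:4444",
    "http://selenium-2:4444",
]


def assign_round_robin(urls, addresses):
    """Pair each URL with one address, cycling through the addresses in order."""
    return list(zip(urls, cycle(addresses)))


if __name__ == "__main__":
    pages = [f"https://example.com/page/{i}" for i in range(5)]
    for url, address in assign_round_robin(pages, SELENIUM_ADDRESSES):
        print(f"{url} -> {address}")
```

With a list of independent endpoints, one slow or blocked browser only holds up the requests assigned to it, rather than everything queued behind a single URL.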
-
Hi Julien,
We are trying to speed up our crawling. Currently our crawl lags at around 1.5 URLs per second, and I want to increase it to 9 URLs per second.
I have deployed StormCrawler on Kubernetes, using a node pool of 4 nodes with 3.92 cores and 29.72 GB of memory each.
I am running 8 supervisor replicas with 1 worker in each supervisor.
We are using Selenium Grid for crawling and running 100 Chrome replicas on Kubernetes.
We are using StormCrawler 2.7 and Apache Storm 2.4.
I need help achieving the above-mentioned speed.
Can you point me in the right direction? I will upload all the config documents required for you to analyse.
crawler-conf.yaml
storm.yaml
es-conf.yaml
topology.yaml
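As a rough sanity check on the numbers in this post, the relation throughput = concurrency / time per page (Little's law) can be sketched as below. The big assumption is that all 100 Chrome replicas are kept busy the whole time, which may well not hold in practice; the function names are made up for illustration.

```python
import math


def seconds_per_page(concurrency: int, urls_per_second: float) -> float:
    """Average time per page implied by a concurrency level and a throughput."""
    if urls_per_second <= 0:
        raise ValueError("urls_per_second must be positive")
    return concurrency / urls_per_second


def browsers_needed(urls_per_second: float, page_seconds: float) -> int:
    """Browsers required to sustain a throughput at a given time per page."""
    return math.ceil(urls_per_second * page_seconds)


if __name__ == "__main__":
    browsers = 100  # Chrome replicas mentioned in the post
    print("implied s/page at 1.5 URLs/s:", seconds_per_page(browsers, 1.5))
    print("max s/page for 9 URLs/s:", seconds_per_page(browsers, 9.0))
```

Under that assumption, 1.5 URLs per second across 100 browsers would mean roughly 67 seconds per page, while 9 URLs per second needs about 11 seconds per page or less. A per-page time far above what a real page load takes would suggest the browsers are mostly idle, so the bottleneck is likely upstream of them rather than in the number of replicas.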