2.1 Architecture and Flow
- Crawling to find onion URLs and expand the onions' initial seed set
  1.1. The crawler reads a handful of starting URLs (regular web) from a text file to start the crawl from, e.g. a darknet wiki, Reddit, some Google search results, etc.
  1.2. The crawler crawls until it hits `MAX_CRAWL` crawls or `MAX_DEPTH`, which is how deep to crawl; both are configurable numbers that prevent the crawler from running forever (a minimal sketch of this loop follows the item).
  1.3. When the crawl task has finished, the crawler writes its results to a text file of onion URLs.
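A minimal sketch of the seed-crawl loop described in 1.1-1.3, assuming Python with `requests`. The file names `seed_urls.txt` and `found_onions.txt`, the regexes, and the concrete limit values are illustrative assumptions; only the `MAX_CRAWL` and `MAX_DEPTH` limits come from the description above.

```python
# Hypothetical sketch of the seed crawl: BFS from regular-web seeds,
# bounded by MAX_CRAWL and MAX_DEPTH, collecting any .onion URLs it sees.
import re
from collections import deque

import requests

MAX_CRAWL = 500   # total pages to fetch before stopping (configurable)
MAX_DEPTH = 3     # how many links away from a seed we are allowed to go

ONION_RE = re.compile(r"https?://[a-z2-7]{16,56}\.onion\S*", re.IGNORECASE)
LINK_RE = re.compile(r'href="(https?://[^"]+)"')

def seed_crawl(seed_file: str, out_file: str) -> None:
    with open(seed_file) as f:
        queue = deque((url.strip(), 0) for url in f if url.strip())

    seen, onions, crawled = set(), set(), 0
    while queue and crawled < MAX_CRAWL:
        url, depth = queue.popleft()
        if url in seen or depth > MAX_DEPTH:
            continue
        seen.add(url)
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue
        crawled += 1
        onions.update(ONION_RE.findall(html))   # collect onion URLs
        for link in LINK_RE.findall(html):      # keep expanding the frontier
            queue.append((link, depth + 1))

    # Write the discovered onion URLs for the later full scans.
    with open(out_file, "w") as f:
        f.write("\n".join(sorted(onions)))

if __name__ == "__main__":
    seed_crawl("seed_urls.txt", "found_onions.txt")
```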
- Performing full scans to build the darknet's network graph and to further expand the onions' initial seed set
  2.1. The crawler reads a list of tens of thousands of onion URLs from a text file and inserts them into its own crawling queue.
  2.2. The crawler crawls until its crawling queue is empty; every time it encounters a new URL it puts it in the queue. A full scan of about 20-25k onions takes on average 11 hours.
  2.3. In parallel with the crawling, the crawler inserts records of its results into a graph DB, where the vertices are the onions and the edges are links between them (see the sketch below this item); more about this in the Components section.
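A sketch of the full-scan flow in 2.1-2.3: a queue-driven crawl over onion URLs that records `(source)-[:LINKS_TO]->(target)` edges in a graph database. The text only says "graph db"; Neo4j (via the official `neo4j` Python driver), a local Tor SOCKS proxy on port 9050, and the file/credential names are assumptions made for illustration.

```python
# Hypothetical queue-based full scan writing link edges to a graph DB.
import re
from collections import deque

import requests
from neo4j import GraphDatabase

TOR_PROXIES = {"http": "socks5h://127.0.0.1:9050",
               "https": "socks5h://127.0.0.1:9050"}   # assumed Tor proxy
ONION_RE = re.compile(r"https?://[a-z2-7]{16,56}\.onion", re.IGNORECASE)

def full_scan(onion_file: str) -> None:
    driver = GraphDatabase.driver("bolt://localhost:7687",
                                  auth=("neo4j", "password"))  # assumed credentials
    with open(onion_file) as f:
        queue = deque(url.strip() for url in f if url.strip())

    seen = set()
    with driver.session() as session:
        while queue:                      # crawl until the queue is empty
            url = queue.popleft()
            if url in seen:
                continue
            seen.add(url)
            try:
                html = requests.get(url, proxies=TOR_PROXIES, timeout=30).text
            except requests.RequestException:
                continue                  # onion is down or unreachable
            for target in set(ONION_RE.findall(html)):
                if target not in seen:
                    queue.append(target)  # newly discovered onions join the queue
                session.run(
                    "MERGE (a:Onion {url: $src}) "
                    "MERGE (b:Onion {url: $dst}) "
                    "MERGE (a)-[:LINKS_TO]->(b)",
                    src=url, dst=target)
    driver.close()
```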
- Performing high-frequency scans to estimate the life span of a set of known available onions
  3.1. The crawler randomly selects 1000 onions out of those that were found active in the last full scan (see the sampling sketch below this item).
  3.2. The crawler crawls until its crawling queue is empty; every time it encounters a new URL it puts it in the queue. A high-frequency scan that starts with 1000 active onions takes on average 13 minutes.
  3.3. In parallel with the crawling, the crawler inserts records of its results into the graph DB.
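A sketch of how the high-frequency scan could pick its 1000 starting onions. The `active` property and the query shape are assumptions; the description only says that 1000 onions are sampled at random from those found active in the last full scan.

```python
# Hypothetical sampling step feeding the high-frequency scan.
import random

from neo4j import GraphDatabase

def sample_active_onions(k: int = 1000) -> list[str]:
    driver = GraphDatabase.driver("bolt://localhost:7687",
                                  auth=("neo4j", "password"))  # assumed credentials
    with driver.session() as session:
        result = session.run(
            "MATCH (o:Onion) WHERE o.active = true RETURN o.url AS url")
        active = [record["url"] for record in result]
    driver.close()
    # Randomly select k onions (or all of them if fewer are available).
    return random.sample(active, min(k, len(active)))
```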
- Statistics scripts analyse the results of both types of onion crawls, querying the graph DB and exporting some results (an example query is sketched below); more on this in the Statistics section.
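An illustrative query in the spirit of those statistics scripts: out-degree per onion, exported to CSV. The node label, relationship type, and output path are assumptions carried over from the earlier sketches, not the project's actual scripts.

```python
# Hypothetical statistics export: out-degree of every onion in the graph.
import csv

from neo4j import GraphDatabase

def export_out_degrees(path: str = "out_degrees.csv") -> None:
    driver = GraphDatabase.driver("bolt://localhost:7687",
                                  auth=("neo4j", "password"))  # assumed credentials
    with driver.session() as session:
        result = session.run(
            "MATCH (o:Onion) "
            "OPTIONAL MATCH (o)-[r:LINKS_TO]->() "
            "RETURN o.url AS url, count(r) AS out_degree "
            "ORDER BY out_degree DESC")
        rows = [(record["url"], record["out_degree"]) for record in result]
    driver.close()
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "out_degree"])
        writer.writerows(rows)
```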