SERP-based keyword clustering

A keyword clustering tool (...) compares the top 10 search results returned for one keyword with the top 10 search results returned for another keyword and counts the URLs that appear in both. It repeats this comparison for every pair of keywords. If the number of matching URLs for a pair meets the selected grouping level, the keywords are grouped together.

As a result, all keywords within a group are related to each other by sharing the same matching URLs. (Adapted from https://en.wikipedia.org/wiki/Keyword_clustering#Hard)
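The comparison described above can be illustrated with a minimal in-memory sketch. It is not the code in this repository, which runs its queries against Neo4j; the names `shared_urls`, `cluster_keywords` and `min_shared`, and the sample data, are hypothetical. It assumes a keyword joins a group only if it shares at least `min_shared` URLs with every keyword already in that group.

```python
def shared_urls(serps, a, b):
    """Count the URLs that appear in the results of both keywords."""
    return len(serps[a] & serps[b])


def cluster_keywords(serps, min_shared=3):
    """Greedy grouping (an assumption, not the repository's algorithm).

    A keyword is added to the first group in which it shares at least
    `min_shared` URLs with every member; otherwise it starts a new group.
    """
    groups = []
    for kw in serps:
        for group in groups:
            if all(shared_urls(serps, kw, other) >= min_shared for other in group):
                group.append(kw)
                break
        else:
            # No existing group accepted the keyword.
            groups.append([kw])
    return groups


# Hypothetical data: keyword -> set of top result URLs.
serps = {
    "buy running shoes": {"a.com", "b.com", "c.com", "d.com"},
    "running shoes online": {"a.com", "b.com", "c.com", "e.com"},
    "best trail shoes": {"a.com", "f.com", "g.com", "h.com"},
}

groups = cluster_keywords(serps, min_shared=3)
print(groups)
```

With this data the first two keywords share three URLs and end up in one group, while the third shares only one URL with the first keyword and starts its own group.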

Input data format

A CSV file in which each row pairs a keyword with one URL from its search results:

keyword1, url1

keyword1, url2

keyword2, url3

keyword2, url2 ...
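As a rough sketch of how rows in this format could be read into memory, the snippet below builds a mapping from each keyword to its set of URLs. It assumes comma-separated rows with no header line; the function name `load_serps` is hypothetical and not part of this repository.

```python
import csv
import io
from collections import defaultdict


def load_serps(lines):
    """Read `keyword, url` rows into a dict of keyword -> set of URLs."""
    serps = defaultdict(set)
    # skipinitialspace drops the blank that follows each comma.
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        keyword, url = row[0].strip(), row[1].strip()
        serps[keyword].add(url)
    return dict(serps)


# In-memory stand-in for the input file (assumed: no header row).
sample = io.StringIO(
    "keyword1, url1\n"
    "keyword1, url2\n"
    "\n"
    "keyword2, url3\n"
    "keyword2, url2\n"
)
serps = load_serps(sample)
print(serps)
```

To read a real file, pass an open file object instead, opened with `newline=""` as the `csv` module recommends.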

How to run

Run "sh install_neo4j.sh".

Run "sh create_database.sh".

To cluster a single search query, run the following command, where 3 is the grouping level and 4 is the minimum number of matching URLs:

python keyword_cluster.py "search query" -k 3 -u 4

To cluster all keywords, run "python find_all_groups.py -k 3 -u 4". This creates a CSV file, output.csv, formatted as:

keyword1 supporting_keyword1,supporting_keyword2,...

keyword2 supporting_keyword3,supporting_keyword4,...
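A minimal sketch for reading one line of this output follows. The separator between the main keyword and its supporting keywords is not specified above; the sketch assumes a tab, since keywords can contain spaces, so adjust it if output.csv uses something else. The function name `parse_group_line` is hypothetical.

```python
def parse_group_line(line, sep="\t"):
    """Split one output line into (keyword, [supporting keywords]).

    Assumes `sep` (a tab by default) separates the main keyword from a
    comma-separated list of supporting keywords.
    """
    head, _, rest = line.rstrip("\n").partition(sep)
    supporting = [kw.strip() for kw in rest.split(",") if kw.strip()]
    return head.strip(), supporting


# Hypothetical line in the assumed tab-separated layout.
example = "keyword1\tsupporting_keyword1,supporting_keyword2\n"
parsed = parse_group_line(example)
print(parsed)
```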
