This repository contains the code for "Communication Efficient Distributed Hypergraph Clustering" (SIGIR 2021). It has been tested on Ubuntu 16.04.7 LTS and macOS Big Sur 11.1.
$ pip install numpy scipy pandas h5py networkx
$ julia
julia> using Pkg
julia> Pkg.add("Laplacians")
julia> Pkg.add("LinearAlgebra")
julia> Pkg.add("MAT")
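Replace ~/.julia/packages/Laplacians/K6Pgk/src/solverInterface.jl with the Laplacians/solverInterface.jl file from the GitHub repository. The K6Pgk directory name may differ on your machine, depending on the installed Laplacians version.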
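Because the directory name under ~/.julia/packages/Laplacians can vary, the replacement step can be scripted. The following is a minimal sketch, not part of the original code: the helper names `find_solver_interface` and `replace_solver_interface` are hypothetical, and it assumes the default Julia depot at ~/.julia. It keeps a `.bak` copy of each file it overwrites.

```python
import shutil
import stat
from pathlib import Path
from typing import List


def find_solver_interface(julia_packages: Path) -> List[Path]:
    """Return every installed Laplacians/<dir>/src/solverInterface.jl."""
    return sorted(julia_packages.glob("Laplacians/*/src/solverInterface.jl"))


def replace_solver_interface(patched: Path, julia_packages: Path) -> List[Path]:
    """Back up each installed solverInterface.jl, then overwrite it with `patched`."""
    replaced = []
    for target in find_solver_interface(julia_packages):
        backup = target.with_name(target.name + ".bak")
        if not backup.exists():
            shutil.copyfile(target, backup)  # keep the original once
        # Installed package files may be read-only, so make the target writable first.
        target.chmod(target.stat().st_mode | stat.S_IWUSR)
        shutil.copyfile(patched, target)
        replaced.append(target)
    return replaced


# Example call (assumes it is run from the repository root):
# replace_solver_interface(Path("Laplacians/solverInterface.jl"),
#                          Path.home() / ".julia" / "packages")
```

If the returned list is empty, Laplacians was probably installed into a different depot, and the path has to be adjusted by hand.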
Use the precompiled executable at ./HyperReplica/HyperReplica/bin/Release/HyperReplica.exe, or compile it yourself with Visual Studio. This executable performs the conductance calculation.
Mono is used to run the compiled .exe file for the conductance calculation.
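To illustrate how a .exe is launched through Mono, here is a minimal sketch that builds and runs a `mono <exe>` command from Python. It is an assumption about usage, not the repository's actual invocation: `build_mono_command` and `run_with_mono` are hypothetical helpers, and the arguments HyperReplica.exe expects are not documented here, so `extra_args` is left empty by default.

```python
import shutil
import subprocess
from pathlib import Path

# Path taken from this README, relative to the repository root.
EXE = Path("HyperReplica/HyperReplica/bin/Release/HyperReplica.exe")


def build_mono_command(exe, extra_args=()):
    """Return the argument list for running a .NET executable under Mono."""
    return ["mono", str(exe)] + [str(a) for a in extra_args]


def run_with_mono(exe=EXE, extra_args=()):
    """Run `exe` under Mono and return the completed process (stdout captured)."""
    if shutil.which("mono") is None:
        raise RuntimeError("mono was not found on PATH; install Mono first")
    if not Path(exe).is_file():
        raise FileNotFoundError(str(exe))
    return subprocess.run(
        build_mono_command(exe, extra_args),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
```

Checking for `mono` up front gives a clearer error than the failure that would otherwise surface in the middle of a clustering run.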
For Linux:
As described in https://www.mono-project.com/download/stable/#download-lin
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 3FA7E0328081BFF6A14DA29AA6A19B38D3D831EF
sudo apt install apt-transport-https ca-certificates
echo "deb https://download.mono-project.com/repo/ubuntu stable-xenial main" | sudo tee /etc/apt/sources.list.d/mono-official-stable.list
sudo apt update
sudo apt install mono-devel
sudo apt install mono-runtime
For Mac:
https://www.mono-project.com/download/stable/#download-mac
The datasets come from two sources:
Cornell, and Hypergraph Clustering Based on PageRank in KDD'21
cd code
python pagerank.py --dataset highschool --num_sites 3 --c 5
You can change the dataset (--dataset) and the number of sites (--num_sites), and tune the parameter c (--c).
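When tuning c, it can be convenient to generate the commands for several settings at once. The sketch below only builds the command lines, using the three flags shown in the command above; the helper name `build_commands` and the swept values are illustrative assumptions, not recommended settings.

```python
import itertools
import sys


def build_commands(datasets, num_sites_values, c_values, script="pagerank.py"):
    """Build one `python pagerank.py ...` argument list per parameter combination."""
    commands = []
    for dataset, num_sites, c in itertools.product(datasets, num_sites_values, c_values):
        commands.append([
            sys.executable, script,
            "--dataset", str(dataset),
            "--num_sites", str(num_sites),
            "--c", str(c),
        ])
    return commands


# Illustrative sweep: one dataset, one site count, three values of c.
for cmd in build_commands(["highschool"], [3], [1, 5, 10]):
    print(" ".join(cmd[1:]))
```

Each returned list can be passed to `subprocess.run(cmd, cwd="code", check=True)` to execute the runs one after another.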
The program performs some disk I/O. If it is interrupted mid-run, some of the generated files may be incomplete. In that case, remove the ./data/DATASET/tmp folder and re-run the program.
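The cleanup step can be sketched as a small helper. This is an illustration under assumptions: `clear_tmp` is a hypothetical name, and `data_root` must point at the `data` directory that contains the dataset folders, since its location relative to your working directory depends on where you run the program.

```python
import shutil
from pathlib import Path


def clear_tmp(dataset, data_root="data"):
    """Delete <data_root>/<dataset>/tmp if it exists; return True if it was removed."""
    tmp_dir = Path(data_root) / dataset / "tmp"
    if tmp_dir.is_dir():
        shutil.rmtree(tmp_dir)  # removes the folder and any partial files in it
        return True
    return False


# Example: clear_tmp("highschool") before re-running an interrupted job.
```

The function is safe to call when no tmp folder exists, so it can be run unconditionally before each restart.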
The code for the conductance calculation (./HyperReplica) is from Hypergraph_clustering_based_on_PageRank.