PDSC

About

PDSC is a C++ stream clustering benchmark for educational purposes.

Build PDSC

Prerequisites

GCC 11 or later
CMake 3.17 or later

Checkout Source Code

git clone https://github.com/intellistream/PDSC
cd PDSC

Build

mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

Run Benchmark

Download datasets from Releases:

wget https://github.com/intellistream/PDSC/releases/download/dataset/covtype.csv

Run the benchmark:

./pdsc /path/to/{dataset}.csv [-n num_points]

Datasets

Dataset	Number of Points	Dimensions	Number of Clusters
CoverType	581012	54	7
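
As a rough illustration of how such a dataset could be fed to a clustering algorithm, the sketch below parses comma-separated rows into feature vectors plus a ground-truth label. It is not the repository's loader: the `LabeledPoint` type and the `parse_row` and `read_points` functions are hypothetical names, and the code assumes a headerless CSV in which every field is numeric and the last field is the class label. The `max_points` cap is likewise only an assumption about how a point limit might be applied.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type for illustration; not the repository's Point class.
struct LabeledPoint {
    std::vector<double> features;
    int label = 0;
};

// Parse one comma-separated row.
// Assumption: all fields are numeric and the last field is the class label.
LabeledPoint parse_row(const std::string& line) {
    std::vector<double> values;
    std::stringstream fields(line);
    std::string field;
    while (std::getline(fields, field, ',')) {
        values.push_back(std::stod(field));  // throws on non-numeric input
    }
    if (values.size() < 2) {
        throw std::invalid_argument("row needs at least one feature and a label");
    }
    LabeledPoint p;
    p.label = static_cast<int>(values.back());
    values.pop_back();
    p.features = std::move(values);
    return p;
}

// Read rows until the stream ends or max_points rows have been read.
// max_points == 0 means "no limit". Blank lines are skipped.
std::vector<LabeledPoint> read_points(std::istream& in, std::size_t max_points = 0) {
    std::vector<LabeledPoint> points;
    std::string line;
    while ((max_points == 0 || points.size() < max_points) && std::getline(in, line)) {
        if (line.empty()) continue;
        points.push_back(parse_row(line));
    }
    return points;
}
```

Under these assumptions, passing a `std::ifstream` opened on `covtype.csv` to `read_points` would yield one `LabeledPoint` per row, each with 54 features if the label is indeed the final column.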

Baseline Performance

Algorithm	Execution Time	Purity (correctness check)	Number of Clusters (correctness check)
BIRCH	1081 ms	0.0817814	57
CluStream	1001 ms	0.717314	7
EDMStream	181 ms	0.893519	14
DStream	778 ms	0.770046	101
DenStream	1045 ms	0.374782	15
SL-KMeans	3082507 ms	0.171415	7

Baseline performance was measured on the full CoverType dataset (581012 points) on the following machine:

CPU: AMD Ryzen 7 5700U
RAM: 24GB DDR4 3200MHz

Reference

[SIGMOD 2023] Xin Wang, Zhengru Wang, Zhenyu Wu, Shuhao Zhang, Xuanhua Shi, and Li Lu. Data Stream Clustering: An In-depth Empirical Study. SIGMOD, 2023.

@inproceedings{wang2023sesame,
	title        = {Data Stream Clustering: An In-depth Empirical Study},
	author       = {Xin Wang and Zhengru Wang and Zhenyu Wu and Shuhao Zhang and Xuanhua Shi and Li Lu},
	year         = 2023,
	booktitle    = {Proceedings of the 2023 International Conference on Management of Data (SIGMOD)},
	location     = {Seattle, WA, USA},
	publisher    = {Association for Computing Machinery},
	address      = {New York, NY, USA},
	series       = {SIGMOD '23},
	abbr         = {SIGMOD},
	bibtex_show  = {true},
	selected     = {true},
	pdf          = {papers/Sesame.pdf},
	code         = {https://github.com/intellistream/Sesame},
	doi          = {10.1145/3589307},
	url          = {https://doi.org/10.1145/3589307}
}
