Community Detection doesn't work on large dataset #1840

Closed
chintan-ushur opened this issue Feb 22, 2023 · 1 comment · Fixed by #2381

Comments

chintan-ushur commented Feb 22, 2023

I have an embedding array of 1,000,000 embeddings, each of dimension 386.

When I pass this array to community_detection, it runs for days and then fails.
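For a rough sense of scale, here is a back-of-the-envelope sketch. It assumes the clustering has to score every embedding against every other one, which has not been confirmed from the library source here. The helper name `pairwise_cost` and the 4-byte (float32) score size are illustrative assumptions.

```python
from typing import Tuple


def pairwise_cost(n_embeddings: int, bytes_per_score: int = 4) -> Tuple[int, int]:
    """Return (number of ordered pairs, bytes needed for a dense score matrix).

    Assumes one similarity score per ordered pair, stored as float32 by default.
    """
    n_pairs = n_embeddings * n_embeddings
    return n_pairs, n_pairs * bytes_per_score


# Sizes taken from the report: the full array and a tenth of it.
full_pairs, full_bytes = pairwise_cost(1_000_000)
tenth_pairs, tenth_bytes = pairwise_cost(100_000)

print(f"full array:  {full_pairs:.0e} pairs, {full_bytes / 1e12:.1f} TB if dense")
print(f"tenth array: {tenth_pairs:.0e} pairs, {tenth_bytes / 1e9:.1f} GB if dense")
print(f"work ratio:  {full_pairs // tenth_pairs}x")
```

Under that assumption, a 10x larger input means roughly 100x more pairwise comparisons, so runtime would not be expected to scale linearly with array size.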

VM Configuration:
40 GB GPU RAM
84 GB System RAM

Peak Utilisation:
GPU Memory - 17%
GPU - 11%

CPU Memory - 12%

Other observations:
Say the variable holding the embedding array is named embeds.

If I reduce the array to a tenth of its size, i.e. 100,000 embeddings instead of 1,000,000 (embeds[:100000]), execution completes successfully within 2 minutes.
But if I pass 100,000 embeddings from the middle of the array instead of the first 100,000 (embeds[200000:300000]), it takes far longer.
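To make the slice comparison above reproducible, a minimal timing harness could look like the sketch below. The helper `time_on_slices` and the stand-in workload `dummy_clustering` are hypothetical names introduced for illustration, and the small integer list only stands in for the real embeddings so the sketch runs anywhere.

```python
import time


def time_on_slices(fn, data, ranges):
    """Run fn on data[start:stop] for each (start, stop) pair.

    Returns a dict mapping each (start, stop) to elapsed wall-clock seconds.
    """
    timings = {}
    for start, stop in ranges:
        chunk = data[start:stop]
        t0 = time.perf_counter()
        fn(chunk)
        timings[(start, stop)] = time.perf_counter() - t0
    return timings


def dummy_clustering(chunk):
    # Stand-in workload (assumption): replace with the real clustering call.
    return sum(chunk)


embeds = list(range(1000))  # placeholder data, not real embeddings
result = time_on_slices(dummy_clustering, embeds, [(0, 100), (200, 300)])
for (start, stop), seconds in result.items():
    print(f"embeds[{start}:{stop}] took {seconds:.6f} s")
```

With the real data, `fn` would be the clustering call (for example `sentence_transformers.util.community_detection`) and the ranges would be `(0, 100000)` and `(200000, 300000)`. Comparing the two timings would show whether the slowdown depends on the content of the slice rather than its size.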

@tomaarsen
Collaborator

Hello!

#2381 should improve the efficiency of calling community_detection on GPU. It will be included in the next release.

  • Tom Aarsen
