distanceChanged is always True? #2

Open
tombenj opened this issue Feb 5, 2020 · 3 comments

tombenj (Contributor) commented Feb 5, 2020

I'm running the latest commit, and distanceChanged in the line below from Kmeans.cs never becomes false, because totalDistance never exactly equals lastDistance; there is always a subtle difference. Am I wrong, or is it intended that it keeps iterating even after tiny improvements such as 3.0523720386810282E-06?

// check stopping criteria
if (totalDistance == lastDistance)
{
    distanceChanged = false;
} 

Example of output:

Saving intermediate table to file...
Saving to file OCHSRiverClusters_temp.txt...
Current average distance: 15.79374446697906 Improvement: 1.0757505357616992E-06, 6.811243935533895E-06%
Saving intermediate table to file...
Saving to file OCHSRiverClusters_temp.txt...
Current average distance: 15.793745057814318 Improvement: -5.908352580519249E-07, -3.740944776176036E-06%
Saving intermediate table to file...
Saving to file OCHSRiverClusters_temp.txt...
Current average distance: 15.793746261186945 Improvement: -1.2033726264348843E-06, -7.61929879189438E-06%
Saving intermediate table to file...
Saving to file OCHSRiverClusters_temp.txt...
Current average distance: 15.793746167227063 Improvement: 9.395988165294966E-08, 5.949182702025269E-07%
Saving intermediate table to file...
Saving to file OCHSRiverClusters_temp.txt...
Current average distance: 15.79374500671032 Improvement: 1.1605167422601426E-06, 7.3479510809271176E-06%
Saving intermediate table to file...
Saving to file OCHSRiverClusters_temp.txt...
Current average distance: 15.793743478318987 Improvement: 1.528391333494028E-06, 9.677193935075934E-06%
Saving intermediate table to file...
Saving to file OCHSRiverClusters_temp.txt...
Current average distance: 15.793740306476968 Improvement: 3.1718420192561325E-06, 2.008290196364726E-05%
Saving intermediate table to file...
Saving to file OCHSRiverClusters_temp.txt...
Current average distance: 15.793738522229306 Improvement: 1.7842476616181102E-06, 1.1297182467284728E-05%
Saving intermediate table to file...
Saving to file OCHSRiverClusters_temp.txt...
Current average distance: 15.793739002160862 Improvement: -4.799315558301487E-07, -3.0387457439218224E-06%
Saving intermediate table to file...
Saving to file OCHSRiverClusters_temp.txt...
Current average distance: 15.793735319061417 Improvement: 3.68309944498435E-06, 2.331999690019515E-05%
Saving intermediate table to file...
Saving to file OCHSRiverClusters_temp.txt...
Current average distance: 15.793732525404456 Improvement: 2.7936569608755235E-06, 1.7688386588776694E-05%
Saving intermediate table to file...
Saving to file OCHSRiverClusters_temp.txt...
Current average distance: 15.793729282770302 Improvement: 3.2426341540769954E-06, 2.053114518396626E-05%
Saving intermediate table to file...
Saving to file OCHSRiverClusters_temp.txt...
Current average distance: 15.793726230398264 Improvement: 3.0523720386810282E-06, 1.9326480682479996E-05%
Saving intermediate table to file...
Saving to file OCHSRiverClusters_temp.txt..
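
One way to make the check above tolerant of floating-point noise is to compare the relative change against a small threshold instead of testing for exact equality. This is only a sketch: the `ConvergenceCheck` class, the `DistanceChanged` helper, and the `1e-6` default tolerance are hypothetical and not part of Kmeans.cs.

```csharp
using System;

static class ConvergenceCheck
{
    // Hypothetical helper (not from Kmeans.cs): returns true while the relative
    // change in total distance is still larger than the given tolerance.
    public static bool DistanceChanged(double totalDistance, double lastDistance,
                                       double relativeTolerance = 1e-6)
    {
        // Treat an uninitialised "previous" value as a change so the loop continues.
        if (double.IsInfinity(lastDistance) || double.IsNaN(lastDistance))
        {
            return true;
        }

        // Guard against division by zero when the previous distance is 0.
        double scale = Math.Max(Math.Abs(lastDistance), double.Epsilon);
        return Math.Abs(lastDistance - totalDistance) / scale > relativeTolerance;
    }
}
```

With this arbitrary tolerance, the last step in the log above (a relative improvement of roughly 1.9E-07) would count as "no change". Because the logged improvements are sometimes negative, the helper uses the absolute difference rather than the signed one.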

ozzi7 (Owner) commented Feb 5, 2020

Hello @tombenj. I am aware of that issue: there is no guarantee of convergence. Because of that, and because k-means in C# is very slow, I decided to simply save the table to a file on every iteration and let the user exit. On the next run the file is loaded (if it has been properly renamed), and k-means for that table is skipped and assumed to be finished. What should be implemented instead is a proper stopping criterion, for example: 1) the point-to-cluster assignment vector doesn't change, or 2) a maximum number of iterations is reached. Feel free to implement it and open a pull request.
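
The two stopping criteria mentioned above could be combined roughly as follows. This is a minimal, self-contained sketch and not the code in Kmeans.cs: the `KMeansSketch` class, its method names, the jagged-array layout, and the squared Euclidean distance are all assumptions made for illustration (the project may use a different distance measure).

```csharp
using System;

static class KMeansSketch
{
    static double SquaredDistance(double[] a, double[] b)
    {
        double sum = 0;
        for (int d = 0; d < a.Length; d++)
        {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Reassigns every point to its nearest centroid.
    // Returns true if at least one assignment changed.
    static bool AssignPoints(double[][] points, double[][] centroids, int[] assignment)
    {
        bool changed = false;
        for (int i = 0; i < points.Length; i++)
        {
            int best = 0;
            double bestDistance = SquaredDistance(points[i], centroids[0]);
            for (int c = 1; c < centroids.Length; c++)
            {
                double distance = SquaredDistance(points[i], centroids[c]);
                if (distance < bestDistance)
                {
                    best = c;
                    bestDistance = distance;
                }
            }
            if (assignment[i] != best)
            {
                assignment[i] = best;
                changed = true;
            }
        }
        return changed;
    }

    // Moves each centroid to the mean of its assigned points.
    // Centroids of empty clusters are left where they are.
    static void UpdateCentroids(double[][] points, double[][] centroids, int[] assignment)
    {
        int dims = centroids[0].Length;
        var sums = new double[centroids.Length][];
        var counts = new int[centroids.Length];
        for (int c = 0; c < centroids.Length; c++) sums[c] = new double[dims];

        for (int i = 0; i < points.Length; i++)
        {
            counts[assignment[i]]++;
            for (int d = 0; d < dims; d++) sums[assignment[i]][d] += points[i][d];
        }

        for (int c = 0; c < centroids.Length; c++)
        {
            if (counts[c] == 0) continue;
            for (int d = 0; d < dims; d++) centroids[c][d] = sums[c][d] / counts[c];
        }
    }

    // Runs k-means in place and returns the number of iterations performed.
    public static int Run(double[][] points, double[][] centroids, int[] assignment, int maxIterations)
    {
        // Start from "unassigned" so the first pass always counts as a change.
        for (int i = 0; i < assignment.Length; i++) assignment[i] = -1;

        int iteration = 0;
        while (iteration < maxIterations)            // criterion 2: iteration cap
        {
            iteration++;
            if (!AssignPoints(points, centroids, assignment))
            {
                break;                               // criterion 1: assignments unchanged
            }
            UpdateCentroids(points, centroids, assignment);
        }
        return iteration;
    }
}
```

Comparing integer assignments avoids the floating-point equality problem entirely, and the iteration cap bounds the running time even if the assignments keep oscillating.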

@IamHuskar

Hello, where is the C++ version?


Strugur commented Feb 3, 2022

Hi, a suggestion for your project: you can store the histograms in a 1D array. For riverHistograms this saves around 1 GB and speeds up the computation in k-means.
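
A flat layout like the one suggested above might look like the following sketch. The `FlatHistograms` class, the `float` element type, and the row-major indexing are assumptions for illustration; the actual type and shape of riverHistograms in the project are not shown in this thread.

```csharp
using System;

// Hypothetical wrapper: stores `count` histograms of `bins` entries each
// in one contiguous array instead of one small array per histogram.
sealed class FlatHistograms
{
    private readonly float[] data;

    public int Count { get; }
    public int Bins { get; }

    public FlatHistograms(int count, int bins)
    {
        if (count < 0 || bins <= 0) throw new ArgumentOutOfRangeException();
        Count = count;
        Bins = bins;
        // checked() throws instead of silently overflowing for very large tables.
        data = new float[checked(count * bins)];
    }

    // Row-major indexing: histogram h starts at offset h * Bins.
    public float this[int histogram, int bin]
    {
        get => data[histogram * Bins + bin];
        set => data[histogram * Bins + bin] = value;
    }
}
```

A single contiguous array avoids the per-array object overhead of a jagged `float[][]` and keeps neighbouring histograms close together in memory, which is the kind of saving the suggestion refers to.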
