-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Implement K Means #3159
[ENH] Implement K Means #3159
Conversation
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
Please tag your PR title with one of: [ENH | BUG | DOC | TST | BLD | PERF | TYP | CLN | CHORE]. See https://docs.trychroma.com/contributing#contributing-code-and-ideas |
|
||
struct KMeansAlgorithmInput<'referred_data> { | ||
indices: Vec<u32>, | ||
embeddings: &'referred_data [f32], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - might be nice to have a wrapper over flat embedding / embedding dimension
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to look at this wholistically since there are a few other places in other PRs also that have flat embeddings. Cut out a task in tracker for this.
rust/index/src/spann/utils.rs
Outdated
let mut centers = vec![vec![0.0; self.input.embedding_dimension]; self.input.k]; | ||
for center in centers.iter_mut() { | ||
let random_center = rand::thread_rng().gen_range(self.input.first..batch_end); | ||
center.copy_from_slice( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have to copy? can we use index like we do for input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we can for center init but then it needs some custom code that can't be reused as it is done currently. In the interest of time, I'll add a TODO and revisit later if performance becomes a concern
rust/index/src/spann/utils.rs
Outdated
previous_counts: &[usize], | ||
) -> (i32, f32) { | ||
let point_idx = self.input.indices[idx]; | ||
let dim = self.input.embedding_dimension; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we had helper structs for the embedding / dimension this would be made cleaner
02094cc
to
9744dbf
Compare
7d09055
to
d173a6c
Compare
9744dbf
to
d04db2f
Compare
4bcb63e
to
13b81e0
Compare
f722f02
to
71771cf
Compare
13b81e0
to
d3bf83a
Compare
71771cf
to
97dcbde
Compare
d3bf83a
to
1497e1b
Compare
dc505fe
to
fc6a644
Compare
1497e1b
to
92154e3
Compare
fc6a644
to
0df312a
Compare
92154e3
to
e131daa
Compare
e131daa
to
4a761d6
Compare
4a761d6
to
d72dd6f
Compare
Description of changes
Summarize the changes made by this PR.
Test plan
pytest
for python,yarn test
for js,cargo test
for rustDocumentation Changes
None