Issues with groupby columns #67

Open
michaelbornholdt opened this issue Oct 15, 2021 · 4 comments

Comments

@michaelbornholdt
Collaborator

Soo,

For prec_recall and Hitk we now have the case that the input groupby columns determine which columns the similarity df is sorted by. This has an important impact on your results. If you, for example, sort by something that is not unique, i.e. not unique in the input df, then you will get internal connections in the sub-dataframe that you are grouping.

Let's say you have, for example, a df with samples and different dosages. If you then have groupby_columns = Metadata_broad_sample, you will sort into subgroups that have several connections within each other (all the different doses). Your precision will then show the weird effects that @FloHu described in #62, for example. Similarly, hitk will give weird results because you are now looking at internal connections and not only the nearest neighbors of one sample.
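To make the dose example concrete, here is a minimal sketch in plain Python (a stand-in for the real dataframe, not the library's code). The compound IDs, the `Metadata_dose` column name and the `internal_pairs` helper are assumptions made up for illustration; only `Metadata_broad_sample` comes from the example above.

```python
from collections import Counter

# Hypothetical profile metadata: 2 compounds x 3 doses, one row per profile.
# "Metadata_dose" and the BRD-* IDs are assumed names, used only for illustration.
rows = [
    {"Metadata_broad_sample": sample, "Metadata_dose": dose}
    for sample in ("BRD-A", "BRD-B")
    for dose in (0.1, 1.0, 10.0)
]


def internal_pairs(rows, groupby_columns):
    """Count the row pairs that land in the same group, per group key."""
    sizes = Counter(tuple(row[col] for col in groupby_columns) for row in rows)
    return {key: n * (n - 1) // 2 for key, n in sizes.items()}


# Grouping by compound only: every group holds all three doses.
by_sample = internal_pairs(rows, ["Metadata_broad_sample"])

# Grouping by compound and dose: every group holds exactly one row.
by_sample_and_dose = internal_pairs(
    rows, ["Metadata_broad_sample", "Metadata_dose"]
)
```

With the compound-only grouping, each group contains three within-group pairs (the "internal connections"); with compound plus dose, each group is a single row and there are none.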

Either we keep it all this way and make users aware of this, or we find some workaround here? Maybe the solution is to not allow anything other than unique groupby_cols?
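If we went with the "only unique columns" option, the check could look roughly like this. It is only a sketch under assumptions: `check_unique_groups` is a hypothetical helper, not an existing function in the package, and it works on a list of dicts rather than the real df.

```python
from collections import Counter


def check_unique_groups(rows, groupby_columns):
    """Raise ValueError if any combination of groupby values occurs more than once.

    Hypothetical helper for illustration only.
    """
    counts = Counter(tuple(row[col] for col in groupby_columns) for row in rows)
    duplicated = sorted(key for key, n in counts.items() if n > 1)
    if duplicated:
        raise ValueError(
            f"groupby_columns {groupby_columns} are not unique; "
            f"repeated keys: {duplicated}"
        )


# Assumed toy metadata: one compound at two doses ("Metadata_dose" is an assumed name).
example_rows = [
    {"Metadata_broad_sample": "BRD-A", "Metadata_dose": 0.1},
    {"Metadata_broad_sample": "BRD-A", "Metadata_dose": 1.0},
]

# Passes silently: compound plus dose identifies each row.
check_unique_groups(example_rows, ["Metadata_broad_sample", "Metadata_dose"])

# Would raise ValueError: the compound alone is repeated.
# check_unique_groups(example_rows, ["Metadata_broad_sample"])
```

The alternative would be to only warn instead of raising, which keeps the current behaviour available for users who know what they are doing.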

@michaelbornholdt
Collaborator Author

This is very hard to explain to someone who is not deep in the matter...

@gwaybio
Member

gwaybio commented Oct 20, 2021

This seems to me like an important problem for us to solve.

@michaelbornholdt
Collaborator Author

Will be solved after 11/12, after my thesis.

@michaelbornholdt
Collaborator Author

@shntnu The text above gives an example.
This should be discussed in the context of the architecture overhaul of Cyto eval.
