Issues with groupby columns #67

michaelbornholdt · 2021-10-15T19:00:22Z

So,

For `prec_recall` and `hitk` we now have the situation that the input `groupby_columns` determine how the similarity df is grouped. This has an important impact on the results. If you group by something that is not unique, i.e. not unique in the input df, then you will get internal connections within each sub-dataframe you are grouping.

Let's say, for example, you have a df with samples at different dosages. If you then set `groupby_columns = Metadata_broad_sample`, each subgroup will contain several profiles that are connected to each other (all the different doses of the same sample). Your precision will then show the weird effects that @FloHu described in #62, for example. Similarly, `hitk` will give weird results, because you are now looking at internal connections and not only at the nearest neighbors of a single sample.
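A minimal sketch of the effect, assuming a long-format similarity table with one row per ordered pair of distinct profiles and `_pair_a`/`_pair_b` column suffixes (modelled loosely on cytominer-eval; the column names, the toy data and the `internal_connections` helper are hypothetical). It counts, across groups, the rows whose `pair_b` profile is itself one of the group's query (`pair_a`) profiles. Similarity values are omitted because only group membership matters here.

```python
import pandas as pd

# Hypothetical toy profiles: two compounds, each measured at three doses.
profiles = pd.DataFrame(
    {
        "Metadata_broad_sample": ["cpd_A"] * 3 + ["cpd_B"] * 3,
        "Metadata_dose": [0.1, 1.0, 10.0] * 2,
        "Metadata_profile_id": [f"p{i}" for i in range(6)],
    }
)

# Long-format "similarity df": every ordered pair of distinct profiles.
# The _pair_a / _pair_b suffixes are an assumption, not a confirmed API.
sim_df = profiles.merge(profiles, how="cross", suffixes=("_pair_a", "_pair_b"))
sim_df = sim_df[
    sim_df["Metadata_profile_id_pair_a"] != sim_df["Metadata_profile_id_pair_b"]
].reset_index(drop=True)


def internal_connections(sim_df: pd.DataFrame, groupby_columns: list) -> int:
    """Count rows whose pair_b profile is also a query (pair_a) in the same group."""
    a_cols = [f"{col}_pair_a" for col in groupby_columns]
    total = 0
    for _, group in sim_df.groupby(a_cols):
        queries = set(group["Metadata_profile_id_pair_a"])
        total += int(group["Metadata_profile_id_pair_b"].isin(queries).sum())
    return total


print(internal_connections(sim_df, ["Metadata_profile_id"]))    # unique key: 0
print(internal_connections(sim_df, ["Metadata_broad_sample"]))  # non-unique key: 12
```

Grouping by a unique profile ID gives six groups of five neighbors each and no internal rows. Grouping by `Metadata_broad_sample` gives two groups of 15 rows, and 6 rows in each group connect doses of the same compound to each other, so any top-k or precision computed per group is partly measuring those within-sample links.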

Should we keep it this way and make users aware of it, or should we find a workaround? Maybe the solution is to allow only `groupby_columns` that are unique?
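One way to enforce the "unique groupby columns only" option would be a guard that runs before any metric is computed. This is a sketch only: `check_unique_groupby` is a hypothetical helper, not an existing cytominer-eval function, and it assumes `groupby_columns` name metadata columns of the input profiles.

```python
from typing import Sequence

import pandas as pd


def check_unique_groupby(profiles: pd.DataFrame, groupby_columns: Sequence[str]) -> None:
    """Raise ValueError if groupby_columns do not identify each profile uniquely."""
    # keep=False marks every row whose key occurs more than once.
    duplicated = profiles.duplicated(subset=list(groupby_columns), keep=False)
    if duplicated.any():
        raise ValueError(
            f"groupby_columns {list(groupby_columns)} are not unique: "
            f"{int(duplicated.sum())} profiles share a key with another profile, "
            "so each group would contain internal connections."
        )


# Hypothetical toy profiles: two compounds, each at three doses.
profiles = pd.DataFrame(
    {
        "Metadata_broad_sample": ["cpd_A"] * 3 + ["cpd_B"] * 3,
        "Metadata_dose": [0.1, 1.0, 10.0] * 2,
    }
)

check_unique_groupby(profiles, ["Metadata_broad_sample", "Metadata_dose"])  # passes
try:
    check_unique_groupby(profiles, ["Metadata_broad_sample"])
except ValueError as err:
    print(err)
```

If the other option (keep non-unique grouping and make users aware of it) is preferred, the same check could emit a warning instead of raising.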

michaelbornholdt · 2021-10-15T19:01:19Z

This is very hard to explain to someone who is not deep in the matter...

gwaybio · 2021-10-20T13:34:49Z

This to me seems like an important problem for us to solve

michaelbornholdt · 2021-10-20T13:41:50Z

Will be solved after 11/12, after my thesis.

michaelbornholdt · 2021-11-22T19:04:40Z

@shntnu The text above gives an example.
This should be discussed in the context of the architecture overhaul of Cyto eval.
