Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intake lists #52

Merged
merged 27 commits into from
May 6, 2021
Merged

Intake lists #52

merged 27 commits into from
May 6, 2021

Conversation

michaelbornholdt
Copy link
Collaborator

Enrichment and Precision recall shall intake a list of variables. This way the similarity matrix is only computed once!

@michaelbornholdt michaelbornholdt changed the title Intake lists Draft: Intake lists Apr 26, 2021
@codecov-commenter
Copy link

codecov-commenter commented Apr 27, 2021

Codecov Report

Merging #52 (5fb4d24) into master (220b296) will decrease coverage by 0.09%.
The diff coverage is 97.29%.

❗ Current head 5fb4d24 differs from pull request most recent head 04be210. Consider uploading reports for the commit 04be210 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master      #52      +/-   ##
==========================================
- Coverage   98.36%   98.26%   -0.10%     
==========================================
  Files          24       24              
  Lines         855      865      +10     
==========================================
+ Hits          841      850       +9     
- Misses         14       15       +1     
Flag Coverage Δ
unittests 98.26% <97.29%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
cytominer_eval/evaluate.py 100.00% <ø> (ø)
cytominer_eval/tests/test_evaluate.py 100.00% <ø> (ø)
cytominer_eval/operations/enrichment.py 95.45% <93.33%> (-4.55%) ⬇️
cytominer_eval/operations/precision_recall.py 100.00% <100.00%> (ø)
...iner_eval/tests/test_operations/test_enrichment.py 100.00% <100.00%> (ø)
...val/tests/test_operations/test_precision_recall.py 100.00% <100.00%> (ø)
cytominer_eval/transform/util.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 220b296...04be210. Read the comment docs.

@michaelbornholdt
Copy link
Collaborator Author

@gwaygenomics This last Commit should fix every thing. So this is ready for review :)

@michaelbornholdt michaelbornholdt changed the title Draft: Intake lists Intake lists Apr 27, 2021
@michaelbornholdt
Copy link
Collaborator Author

Fixes #51

@michaelbornholdt
Copy link
Collaborator Author

Ah, I still need to run the Demo notebook

@michaelbornholdt
Copy link
Collaborator Author

@gwaygenomics I reran the demo, All up to do date now

@gwaybio gwaybio self-requested a review April 28, 2021 13:47
Copy link
Member

@gwaybio gwaybio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work. One main discussion point: Do we want to force the user to input a list, or should we allow single elements as well?

A supplementary point is to please add back the Cell Painting demo notebook (I think you accidentally deleted it)

cytominer_eval/evaluate.py Outdated Show resolved Hide resolved
cytominer_eval/evaluate.py Show resolved Hide resolved
grit_control_perts: List[str] = ["None"],
grit_replicate_summary_method: str = "mean",
mp_value_params: dict = {},
enrichment_percentile: float = 0.5,
enrichment_percentile: List[float] = [0.99, 0.98],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about accepting both single values and lists - do you agree?

Also, nice to see that you're suggesting a more reasonable default after your experimentation!

@@ -129,7 +130,7 @@ def evaluate(
metric_result = precision_recall(
similarity_melted_df=similarity_melted_df,
replicate_groups=replicate_groups,
k=precision_recall_k,
k_list=precision_recall_k,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will need to revisit this decision pending discussion on accepting both ints and list of ints.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

) -> dict:
similarity_melted_df: pd.DataFrame,
replicate_groups: List[str],
percentile: List[float],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just making sure that you're intentionally removing the default of 0.9.

Also, this will need to change depending on how we decide users should interact (single int or list of ints)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -28,48 +30,54 @@ def enrichment(
replicate_groups : List
a list of metadata column names in the original profile dataframe to use as
replicate columns.
percentile : float
enrichment_percentile : List of floats
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment about might need to change.

also, shouldn't this just be percentile instead of enrichment_percentile? If not, then u need to change the function argument name

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enrichment_percentile is still not the name of the function argument (line 19 above) do you see what I mean?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thinks it fine how it is now. Do you think I should change the name of the variable in enrichment.py to enrichment_percentile instead of just percentile?

cytominer_eval/operations/enrichment.py Show resolved Hide resolved
similarity_melted_df: pd.DataFrame,
replicate_groups: List[str],
k: int,
similarity_melted_df: pd.DataFrame, replicate_groups: List[str], k_list: List[int],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment about int vs. list[int] decisions.

Also, I LOVE the fact that you're making this enhancement all in one shot, and not just to the enrichment metric. So useful to have all these changes in one PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

assert result_df.shape == (7, 4)
assert result_df.percentile[0] == 1.0
assert result.shape == (7, 4)
assert result.enrichment_percentile[0] == 1.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a check to the second element in this Series?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what you mean

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert that result.enrichment_percentile[1] equals what you'd calculate by hand.

@michaelbornholdt
Copy link
Collaborator Author

@gwaygenomics should be done with all your comments now

@gwaybio gwaybio self-requested a review April 30, 2021 19:24
Copy link
Member

@gwaybio gwaybio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only minor comments, but they should be addressed before merge.

cytominer_eval/evaluate.py Outdated Show resolved Hide resolved
cytominer_eval/operations/enrichment.py Outdated Show resolved Hide resolved
@@ -28,48 +30,54 @@ def enrichment(
replicate_groups : List
a list of metadata column names in the original profile dataframe to use as
replicate columns.
percentile : float
enrichment_percentile : List of floats
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enrichment_percentile is still not the name of the function argument (line 19 above) do you see what I mean?

@@ -160,7 +156,7 @@ def test_evaluate_precision_recall():
replicate_groups=["Metadata_broad_sample"],
operation="precision_recall",
similarity_metric="pearson",
precision_recall_k=k,
precision_recall_k=[k],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you try one of these without a list? (i mean one of either lines 144 or 163)

It should work either way

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a quick comment on line 151 noting the difference between evaluate on line 152 and evaluate on line 137?

For future changes to this code, it will be important to note that we're also testing the list/int capability of this argument

assert result_df.shape == (7, 4)
assert result_df.percentile[0] == 1.0
assert result.shape == (7, 4)
assert result.enrichment_percentile[0] == 1.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert that result.enrichment_percentile[1] equals what you'd calculate by hand.

@@ -42,11 +42,11 @@ def test_precision_recall():
result = precision_recall(
similarity_melted_df=similarity_melted_df,
replicate_groups=replicate_groups,
k=10,
k=[5, 10],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also keep a test for k as an int?

This test is good, but maybe add a test asserting that the outputs are the same for k=10 and k=[10]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@michaelbornholdt
Copy link
Collaborator Author

@gwaygenomics ready for you to look at again :)

Copy link
Member

@gwaybio gwaybio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor comments, I'll merge after they are addressed!

cytominer_eval/evaluate.py Outdated Show resolved Hide resolved
@@ -160,7 +156,7 @@ def test_evaluate_precision_recall():
replicate_groups=["Metadata_broad_sample"],
operation="precision_recall",
similarity_metric="pearson",
precision_recall_k=k,
precision_recall_k=[k],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a quick comment on line 151 noting the difference between evaluate on line 152 and evaluate on line 137?

For future changes to this code, it will be important to note that we're also testing the list/int capability of this argument

@michaelbornholdt
Copy link
Collaborator Author

@gwaygenomics back to you again :)

@michaelbornholdt
Copy link
Collaborator Author

What is this change request, how do I get rid of it?

@gwaybio
Copy link
Member

gwaybio commented May 6, 2021

No worries, I get rid of it when I finally approve. Looking now

@gwaybio
Copy link
Member

gwaybio commented May 6, 2021

Wonderfully done - merging now

@gwaybio gwaybio merged commit 7f94b11 into cytomining:master May 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants