Case Spreadsheet Download #722

drio18 · 2023-06-13T21:50:29Z

Here, we add the ability to download a spreadsheet for Case items from a search.

The existing spreadsheet download code for VariantSamples was heavily refactored into more modular, tested code that is shared by the two spreadsheets and can be reused by any future spreadsheets.

Additional features include:

- Update to snovault to propagate request _stats to subrequests (PR)
- New module (item_models.py) containing classes for selected items to obtain select properties. This could be more broadly useful and may deserve its own repo so that it can be imported outside the portal. Thoughts?
- Addition of case spreadsheet download button (courtesy of Bianca).
- Minor refactoring of search/compound_search.py to share constants and expand filter set search to a request with only global flags.
- More workbook inserts to thoroughly test spreadsheet downloads.

Also, refactor spreadsheet flow of information and add spreadsheet streaming.

Also, add VariantSampleSpreadsheet headers and note embedding.

Also, comment out columns that could never be retrieved for VariantSamples from search.

willronchetti

I have no obvious objection to this, just some relatively small comments that may add up in total. Kudos to you for really going above and beyond with tests though, those look great.

I would say though some of this can probably be moved to snovault for easier bootstrapping for such things in smaht-portal. Your call if you want to take this on now or leave it for another time.

willronchetti · 2023-06-21T13:31:20Z

src/encoded/batch_download.py

+CASE_SPREADSHEET_ENDPOINT = "case_search_spreadsheet"
+CASE_SPREADSHEET_URL = format_to_url(CASE_SPREADSHEET_ENDPOINT)
+VARIANT_SAMPLE_SPREADSHEET_ENDPOINT = "variant_sample_search_spreadsheet"
+VARIANT_SAMPLE_SPREADSHEET_URL = format_to_url(VARIANT_SAMPLE_SPREADSHEET_ENDPOINT)


Structurally speaking, I would consider refactoring the route configuration and moving a common implementation into snovault, i.e. /spreadsheet/{type_name}. That way you can write implementers in downstream applications for various types while making the overall logic available across portals.

Ultimately I'm going to suggest we merge this as developed into CGAP but refactor the core components of it into snovault so it can be re-used in SMaHT.
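One way to picture the suggested /spreadsheet/{type_name} structure is a small registry that maps a type name to a row builder supplied by the downstream portal. This is only a sketch in plain Python, not snovault or Pyramid code: `SPREADSHEET_BUILDERS`, `register_spreadsheet`, `build_rows`, and the `accession`/`title` fields are all hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

JsonObject = Dict[str, Any]  # assumed alias for a decoded JSON item
RowBuilder = Callable[[Iterable[JsonObject]], Iterator[List[str]]]

# Hypothetical registry: shared code owns it, each portal registers its types.
SPREADSHEET_BUILDERS: Dict[str, RowBuilder] = {}


def register_spreadsheet(type_name: str) -> Callable[[RowBuilder], RowBuilder]:
    """Return a decorator that registers a row builder for one item type."""

    def decorator(builder: RowBuilder) -> RowBuilder:
        SPREADSHEET_BUILDERS[type_name] = builder
        return builder

    return decorator


def build_rows(type_name: str, items: Iterable[JsonObject]) -> Iterator[List[str]]:
    """Dispatch to the builder registered for the {type_name} route segment."""
    try:
        builder = SPREADSHEET_BUILDERS[type_name]
    except KeyError:
        raise ValueError(f"No spreadsheet registered for type: {type_name}") from None
    return builder(items)


# A downstream "implementer"; the field names here are illustrative only.
@register_spreadsheet("case")
def case_rows(items: Iterable[JsonObject]) -> Iterator[List[str]]:
    for item in items:
        yield [item.get("accession", ""), item.get("title", "")]
```

With this shape, a single shared view could look up the builder from the route segment, while CGAP and SMaHT each register only their own item types.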

willronchetti · 2023-06-21T13:34:27Z

src/encoded/batch_download.py

+def get_case_rows(
+    items_for_spreadsheet: Iterable[JsonObject],
+) -> Iterator[Iterable[str]]:
+    return CaseSpreadsheet(items_for_spreadsheet).yield_rows()
+
+
+def get_spreadsheet_response(
+    file_name: str, spreadsheet_rows: Iterator[List[str]], file_format: str
+) -> Response:
+    return SpreadsheetGenerator(
+        file_name, spreadsheet_rows, file_format=file_format
+    ).get_streaming_response()


I won't make this comment everywhere, but docstrings would be helpful to at least give high-level info. No need to document the arguments as well, though, as I think between type annotations and good naming you are covered there.
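As an illustration of both the one-line docstring style and the streaming idea behind the helpers quoted above, here is a minimal standard-library sketch. The internals of `SpreadsheetGenerator` are not shown in this diff, so `stream_csv_lines` is a hypothetical stand-in rather than the PR's implementation.

```python
import csv
import io
from typing import Iterable, Iterator, List


def stream_csv_lines(rows: Iterable[List[str]]) -> Iterator[str]:
    """Yield one CSV-encoded line per row without building the whole file in memory."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in rows:
        writer.writerow(row)
        yield buffer.getvalue()
        # Reset the buffer so the next chunk contains only the next row.
        buffer.seek(0)
        buffer.truncate(0)
```

A generator like this can be handed to a response object that accepts an iterable body, so rows are encoded only as the client consumes them.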

willronchetti · 2023-06-21T13:35:53Z

src/encoded/batch_download.py

+        result = []
+        case_accession = self.spreadsheet_request.get_case_accession()
+        if case_accession:
+            result.append(["#", "Case Accession:", "", case_accession])


This type of CSV structure generation is a good candidate for another helper (it is repeated in several places below).
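Such a helper could be as small as the sketch below. The row layout is copied from the quoted diff; the function names `make_comment_row` and `append_comment_row` are hypothetical.

```python
from typing import List


def make_comment_row(label: str, value: str) -> List[str]:
    """Build a '#'-prefixed metadata row in the layout used in the diff."""
    return ["#", f"{label}:", "", value]


def append_comment_row(rows: List[List[str]], label: str, value: str) -> None:
    """Append a metadata row only when a value is present."""
    if value:
        rows.append(make_comment_row(label, value))
```

Each repeated `if value: result.append([...])` block would then collapse to one call such as `append_comment_row(result, "Case Accession", case_accession)`.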

willronchetti · 2023-06-21T13:37:55Z

src/encoded/batch_download.py

+            result.append(["#", "Filters Selected:", "", readable_filter_blocks])
+        return result
+
+    def _get_row_for_item(self, item_to_evaluate: JsonObject) -> List[str]:


What is JsonObject getting you over dict?
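For reference, a type alias of this kind is usually declared as below. The diff does not show how `JsonObject` is actually defined, so the definition here is an assumption, and `get_row_for_item` is a simplified stand-in for the quoted method.

```python
from typing import Any, Dict, List

# Assumed definition; the real declaration is not part of the quoted diff.
JsonObject = Dict[str, Any]


def get_row_for_item(item_to_evaluate: JsonObject) -> List[str]:
    """Return a single-cell row holding the item's @id, if any."""
    return [str(item_to_evaluate.get("@id", ""))]


# The alias creates no new runtime class, so a plain dict is passed unchanged.
row = get_row_for_item({"@id": "/cases/example/"})
```

Under that assumption the alias changes nothing at runtime; what it buys over a bare `dict` is a signature that states string keys and JSON-like values, and a single place to tighten the type later.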

willronchetti · 2023-06-21T13:38:16Z

src/encoded/batch_download.py

+            ("ID", "URL path to the variant", "@id"),
+            ("Chrom (hg38)", "Chromosome (hg38)", "variant.CHROM"),
+            ("Pos (hg38)", "Start position (hg38)", "variant.POS"),
+            ("Chrom (hg19)", "Chromosome (hg19)", "variant.hg19_chr"),
+            ("Pos (hg19)", "Start position (hg19)", "variant.hg19_pos"),
+            ("Ref", "Reference Nucleotide", "variant.REF"),
+            ("Alt", "Alternate Nucleotide", "variant.ALT"),


This should definitely be pulled from the schema, no?
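A rough sketch of what pulling these columns from the schema might look like follows. It assumes each schema property carries a `title` and a `description`; `EXAMPLE_SCHEMA` and `columns_from_schema` are hypothetical and do not reflect the actual variant schema.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical schema fragment; the real schema's keys and text may differ.
EXAMPLE_SCHEMA: Dict[str, Any] = {
    "properties": {
        "CHROM": {"title": "Chrom (hg38)", "description": "Chromosome (hg38)"},
        "POS": {"title": "Pos (hg38)", "description": "Start position (hg38)"},
    }
}


def columns_from_schema(
    schema: Dict[str, Any], fields: Iterable[str], prefix: str = ""
) -> List[Tuple[str, str, str]]:
    """Build (title, description, path) column tuples from schema properties."""
    properties = schema.get("properties", {})
    columns = []
    for field in fields:
        field_schema = properties.get(field, {})
        title = field_schema.get("title", field)  # fall back to the field name
        description = field_schema.get("description", "")
        columns.append((title, description, f"{prefix}{field}"))
    return columns
```

The spreadsheet module would then list only the field paths it wants, and the titles and descriptions would follow the schema when it changes.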

willronchetti · 2023-06-21T16:53:17Z

src/encoded/item_models.py

+@dataclass(frozen=True)
+class Item:
+    ATID = "@id"
+    PROJECT = "project"
+
+    properties: JsonObject
+
+    @property
+    def _atid(self) -> str:
+        return self.properties.get(self.ATID, "")
+
+    @property
+    def _project(self) -> LinkTo:
+        return self.properties.get(self.PROJECT, "")
+
+    def get_properties(self) -> JsonObject:
+        return self.properties
+
+    def get_atid(self) -> str:
+        return self._atid
+
+    def get_project(self) -> LinkTo:
+        return self._project


Good candidate for snovault/extension here in CGAP
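To show how a base class like this is consumed, here is a condensed, self-contained variant of the pattern with one subclass. The `JsonObject` and `LinkTo` aliases are assumed, and the `Case` subclass with its `title` property is a hypothetical example, not the class from item_models.py.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union

JsonObject = Dict[str, Any]  # assumed alias
LinkTo = Union[str, JsonObject]  # assumed alias: an @id string or an embedded item


@dataclass(frozen=True)
class Item:
    # Un-annotated class attributes are constants, not dataclass fields.
    ATID = "@id"
    PROJECT = "project"

    properties: JsonObject

    def get_atid(self) -> str:
        return self.properties.get(self.ATID, "")

    def get_project(self) -> LinkTo:
        return self.properties.get(self.PROJECT, "")


@dataclass(frozen=True)
class Case(Item):
    TITLE = "title"  # hypothetical property name for illustration

    def get_title(self) -> str:
        return self.properties.get(self.TITLE, "")


case = Case({"@id": "/cases/example/", "title": "Example case"})
```

Because the wrapper only reads from `properties`, missing keys fall back to empty strings instead of raising, which suits spreadsheet cells built from partially embedded items.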

willronchetti · 2023-06-21T16:54:47Z

src/encoded/item_models.py

+@dataclass(frozen=True)
+class VariantConsequence(Item):
+    # Schema constants
+    IMPACT = "impact"
+    IMPACT_HIGH = "HIGH"
+    IMPACT_LOW = "LOW"
+    IMPACT_MODERATE = "MODERATE"
+    IMPACT_MODIFIER = "MODIFIER"
+    VAR_CONSEQ_NAME = "var_conseq_name"
+


At a meta level what are you trying to accomplish? It feels like you are duplicating very specific details of the data model to achieve some structure... I worry following/maintaining this structure will be burdensome, though maybe not when the data model is relatively stable.

willronchetti · 2023-06-21T16:56:36Z

src/encoded/static/components/static-pages/HomePage/UserDashboard.js

@@ -90,7 +91,8 @@ const AboveCasesTableOptions = React.memo(function AboveCasesTableOptions(props)
        context,
        onFilter, isContextLoading, navigate,
        sortBy, sortColumns,
-        hiddenColumns, addHiddenColumn, removeHiddenColumn, columnDefinitions
+        hiddenColumns, addHiddenColumn, removeHiddenColumn, columnDefinitions,
+        requestedCompoundFilterSet


The name requestedCompoundFilterSet is somewhat redundant, no? I think requestedFilterSet would be fine.

willronchetti · 2023-06-21T16:58:03Z

src/encoded/tests/test_batch_download.py

+EXPECTED_VARIANT_SAMPLE_SPACER_ROW = [
+    "## -------------------------------------------------------"
+]


Should this not be an imported constant?

willronchetti · 2023-06-21T18:11:43Z

src/encoded/types/variant.py

+@view_config(
+    name='spreadsheet',
+    context=VariantSampleList,
+    request_method='GET',
+    permission='view',
+    validators=[validate_spreadsheet_file_format],
+)
+@debug_log
+def variant_sample_list_spreadsheet(context: VariantSampleList, request: Request):


Why have this one here with the others in batch_download.py? I honestly feel they may be better suited here since they are directly data model related.

drio18 and others added 30 commits February 16, 2023 14:58

Create new classes for spreadsheet creation

e2c0fe1

Separate out item classes

c6b6832

Also, refactor spreadsheet flow of information and add spreadsheet streaming.

Merge branch 'master' into drr_case_spreadsheet

5fb1c4b

Add tests + share more constants

b56fe91

Add tests for batch download utils

b116159

Add endpoint logic

4634e3e

Also, add VariantSampleSpreadsheet headers and note embedding.

Add item models + integrated tests

73fe76c

Merge branch 'master' of https://github.com/dbmi-bgm/cgap-portal into…

59a2e8a

… bm-case-spreadsheet-ui

Allow form mime type for case spreadsheet POST

1d9ea3f

Merge branch 'drr_case_spreadsheet' of https://github.com/dbmi-bgm/cg…

df9435a

…ap-portal into bm-case-spreadsheet-ui

Add test for form POSTs to case spreadsheet

c390fb3

Fix file name for spreadsheet downloads

5a74000

Merge branch 'drr_case_spreadsheet' of https://github.com/dbmi-bgm/cg…

842fa40

…ap-portal into bm-case-spreadsheet-ui

Reformat timestamp in file name

4c07def

Basic implementation

a0bbc25

Slight styling tweaks

122946c

Clean up some unnecessary bits from variant search

4f3702b

Merge branch 'master' into drr_case_spreadsheet

e471e4e

Merge branch 'drr_case_spreadsheet' into bm-case-spreadsheet-ui

e1aae89

Add more case fields to spreadsheet

e14af9e

Merge pull request #713 from dbmi-bgm/bm-case-spreadsheet-ui

e3cc614

Bm case spreadsheet UI

Enable compound search with only global flags

3fab774

Merge branch 'drr_case_spreadsheet' of https://github.com/dbmi-bgm/cg…

f9b66bd

…ap-portal into drr_case_spreadsheet

Merge case spreadsheet tests

6d6e40c

Finalize integrated test of case spreadsheet

63991c3

Add workbook fixtures for VS spreadsheet tests

1e14164

Bring in snovault beta

66d62c4

Also, comment out columns that could never be retrieved for VariantSamples from search.

Remove comment

c867a34

Use case title instead of id

6ff9d3f

Fix bugs for previous notes

8deca8c

drio18 added 16 commits June 13, 2023 14:59

Rename class + add default QC flag value

dd85e17

Clean up integrated tests + fix embed

dbab85c

Merge branch 'master' into drr_case_spreadsheet

a68f2ff

Fix tests

b84f4f0

Fix hanging method

3849cdc

Consolidate modules + remove replaced code

2b90c7a

Make spreadsheet request optional

f691974

Clean and finalize batch download tests

022e050

Clean + finish batch download utils tests

262aeb3

Polish + finish item model tests

ac4c32f

Fix property patches

72cd7a1

Format with black

095e064

Remove import

5778c7e

Remove unnecessary imports

181ad7f

Add some docstrings

5a41289

Remove duplicated batch download config inclusion

5f70cb4

drio18 requested a review from willronchetti June 20, 2023 21:14

willronchetti approved these changes Jun 21, 2023
