Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Case Spreadsheet Download #722

Open
wants to merge 46 commits into
base: master
Choose a base branch
from
Open

Case Spreadsheet Download #722

wants to merge 46 commits into from

Conversation

drio18
Copy link
Contributor

@drio18 drio18 commented Jun 13, 2023

Here, we add the ability to download a spreadsheet for Case items from a search.

Previous code for spreadsheet downloads for VariantSamples was heavily refactored to provide more modular, tested, shared code for the two spreadsheets as well as any future spreadsheets required.

Additional features include:

  • Update to snovault to propagate request _stats to subrequests (PR)
  • New module (item_models.py) containing classes for selected items to obtain select properties. This could be more broadly useful and may deserve its own repo so can be imported outside the portal. Thoughts?
  • Addition of case spreadsheet download button (courtesy of Bianca).
  • Minor refactoring of search/compound_search.py to share constants and expand filter set search to a request with only global flags.
  • More workbook inserts to thoroughly test spreadsheet downloads.

drio18 and others added 30 commits February 16, 2023 14:58
Also, refactor spreadsheet flow of information and add spreadsheet
streaming.
Also, add VariantSampleSpreadsheet headers and note embedding.
Also, comment out columns that could never be retrieved for
VariantSamples from search.
@drio18 drio18 requested a review from willronchetti June 20, 2023 21:14
Copy link
Contributor

@willronchetti willronchetti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no obvious objection to this, just some relatively small comments that may add up in total. Kudos to you for really going above and beyond with tests though, those look great.

I would say though some of this can probably be moved to snovault for easier bootstrapping for such things in smaht-portal. Your call if you want to take this on now or leave it for another time.

Comment on lines +44 to +47
CASE_SPREADSHEET_ENDPOINT = "case_search_spreadsheet"
CASE_SPREADSHEET_URL = format_to_url(CASE_SPREADSHEET_ENDPOINT)
VARIANT_SAMPLE_SPREADSHEET_ENDPOINT = "variant_sample_search_spreadsheet"
VARIANT_SAMPLE_SPREADSHEET_URL = format_to_url(VARIANT_SAMPLE_SPREADSHEET_ENDPOINT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Structurally speaking, I would consider refactoring the route configuration and moving a common implementation into snovault ie: /spreadsheet/{type_name} - that way you can write implementers in downstream applications for various types while making the overall logic available across portals.

Ultimately I'm going to suggest we merge this as developed into CGAP but refactor the core components of it into snovault so it can be re-used in SMaHT.

Comment on lines +156 to +167
def get_case_rows(
items_for_spreadsheet: Iterable[JsonObject],
) -> Iterator[Iterable[str]]:
return CaseSpreadsheet(items_for_spreadsheet).yield_rows()


def get_spreadsheet_response(
file_name: str, spreadsheet_rows: Iterator[List[str]], file_format: str
) -> Response:
return SpreadsheetGenerator(
file_name, spreadsheet_rows, file_format=file_format
).get_streaming_response()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't make this comment everywhere but docstrings would be helpful to at least give high level info, no need to do argument information as well though as I think between type annotations and good naming you are covered there.

result = []
case_accession = self.spreadsheet_request.get_case_accession()
if case_accession:
result.append(["#", "Case Accession:", "", case_accession])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type of CSV structure generation is a good candidate for another helper (it is repeated in several places below)

result.append(["#", "Filters Selected:", "", readable_filter_blocks])
return result

def _get_row_for_item(self, item_to_evaluate: JsonObject) -> List[str]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is JsonObject getting you over dict?

Comment on lines +274 to +280
("ID", "URL path to the variant", "@id"),
("Chrom (hg38)", "Chromosome (hg38)", "variant.CHROM"),
("Pos (hg38)", "Start position (hg38)", "variant.POS"),
("Chrom (hg19)", "Chromosome (hg19)", "variant.hg19_chr"),
("Pos (hg19)", "Start position (hg19)", "variant.hg19_pos"),
("Ref", "Reference Nucleotide", "variant.REF"),
("Alt", "Alternate Nucleotide", "variant.ALT"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should definitely be pulled from schema no?

Comment on lines +13 to +35
@dataclass(frozen=True)
class Item:
ATID = "@id"
PROJECT = "project"

properties: JsonObject

@property
def _atid(self) -> str:
return self.properties.get(self.ATID, "")

@property
def _project(self) -> LinkTo:
return self.properties.get(self.PROJECT, "")

def get_properties(self) -> JsonObject:
return self.properties

def get_atid(self) -> str:
return self._atid

def get_project(self) -> LinkTo:
return self._project
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good candidate for snovault/extension here in CGAP

Comment on lines +38 to +47
@dataclass(frozen=True)
class VariantConsequence(Item):
# Schema constants
IMPACT = "impact"
IMPACT_HIGH = "HIGH"
IMPACT_LOW = "LOW"
IMPACT_MODERATE = "MODERATE"
IMPACT_MODIFIER = "MODIFIER"
VAR_CONSEQ_NAME = "var_conseq_name"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a meta level what are you trying to accomplish? It feels like you are duplicating very specific details of the data model to achieve some structure... I worry following/maintaining this structure will be burdensome, though maybe not when the data model is relatively stable.

@@ -90,7 +91,8 @@ const AboveCasesTableOptions = React.memo(function AboveCasesTableOptions(props)
context,
onFilter, isContextLoading, navigate,
sortBy, sortColumns,
hiddenColumns, addHiddenColumn, removeHiddenColumn, columnDefinitions
hiddenColumns, addHiddenColumn, removeHiddenColumn, columnDefinitions,
requestedCompoundFilterSet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name requestedCompoundFilterSet is somewhat redundant no? requestedFilterSet I think would be fine.

Comment on lines +76 to +78
EXPECTED_VARIANT_SAMPLE_SPACER_ROW = [
"## -------------------------------------------------------"
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this not be an imported constant?

Comment on lines +1365 to +1373
@view_config(
name='spreadsheet',
context=VariantSampleList,
request_method='GET',
permission='view',
validators=[validate_spreadsheet_file_format],
)
@debug_log
def variant_sample_list_spreadsheet(context: VariantSampleList, request: Request):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have this one here with the others in batch_download.py? I honestly feel they may be better suited here since they are directly data model related.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants