HTAN 1.0 Data Model: Archive Activities #448

Closed
aclayton555 opened this issue Aug 12, 2024 · 4 comments


aclayton555 commented Aug 12, 2024

Split from #389

ARCHIVE - Proposed for August 2024. Understand and plan the steps to 'archive' the HTAN 1.0 Data Model. This includes understanding whether, and which, data are still expected to be submitted under this model beyond August 31, 2024, and how those submissions will be supported.

Questions, considerations & assumptions:

  • Need to perform final review of backlog and triage for NOW, renewal, or won't do
  • Data are still expected to be submitted from HTAN 1.0 centers after August 31, 2024
  • HTAN 1.0 and HTAN 2.0 will be treated separately. Any remaining HTAN 1.0 data will be submitted in accordance with the HTAN 1.0 data model. The HTAN 1.0 data model serves as a basis for the HTAN 2.0 data model, but considerable updates and refinement are planned to establish the HTAN 2.0 data model early in the renewal, prior to any HTAN 2.0 data submissions.
  • Do we want to set up a new repo for the HTAN 2.0 data model? If so, what, if any, remaining updates should be made within the HTAN 1.0 data model repo?
  • What maintenance and infrastructure must be kept in place after August 31, 2024 (e.g. the DCA currently points to the HTAN 1.0 model)?

aclayton555 commented Aug 13, 2024

Discussed during 24-8 kick-off:

Proposal: HTAN 1.0 and HTAN 2.0 will be treated separately. Any remaining HTAN 1.0 data will be submitted in accordance with the HTAN 1.0 data model. The HTAN 1.0 data model serves as a basis for the HTAN 2.0 data model, but considerable updates and refinement are planned to establish the HTAN 2.0 data model early in the renewal, prior to any HTAN 2.0 data submissions.

Action: Need to perform final review of backlog and triage for NOW, renewal, or won't do

  • This has now been done as part of the 24-8 sprint kick-off. All issues expected to be completed within HTAN 1.0 have been labelled "critical". Issues expected to be revisited in, or pushed to, the renewal have been labelled "renewal". After review and discussion, several issues were moved to the status "Not planned", as this work will not be performed for HTAN

Assumption: Data are still expected to be submitted from HTAN 1.0 centers after August 31, 2024

  • This is expected. We will keep all phase 1.0 infrastructure active and supported. No new schemas (components) will be added to the HTAN 1.0 data model. Any mop-up submissions for future point releases will use the CDS Seq template or another assay template.

Question: Do we want to set up a new repo for the HTAN 2.0 data model? If so, what, if any, remaining updates should be made within the HTAN 1.0 data model repo?

  • Yes. A new HTAN 2.0 repo and supporting infrastructure (e.g. a phase 2.0 DCA) will be set up and supported in parallel with the remaining maintenance of the HTAN 1.0 model and infrastructure (i.e. keep the door open and working for HTAN 1.0 submissions). Relevant content will be pulled over to the new HTAN 2.0 repo. The timeline for the final sunset/shutdown/archive of phase 1.0 infrastructure is TBD, but for now no changes are needed to the current data model repo.

Question: What maintenance and infrastructure must be kept in place after August 31, 2024 (e.g. the DCA currently points to the HTAN 1.0 model)?

  • See above. Keep partitioning in mind for the overall architecture: phase 1.0 infrastructure stays running while phase 2.0 infrastructure is developed. However, we will need to think about what is needed on the portal side and how users can interact with and query HTAN 1.0 and 2.0 data (i.e. to the user, data will just be "HTAN data"). We need to understand expectations based on the renewal proposal, and whether we need to establish a mapping between data of different phases or support a major curation effort. @aclayton555 to surface this during NYC visit in August.


aclayton555 commented Sep 11, 2024

Need to understand expectations based on the renewal proposal, and whether we need to establish mapping between data of different phases, or support a major curation effort. @aclayton555 to surface this during NYC visit in August.

24-8 Closeout: The portal can already display columns across different data types in the File explorer, so there is flexibility here. This is an opportunity for us to think about a minimal set of attributes that map across phase 1 and 2, and/or about how we flag 1.0 vs 2.0 data. Something to think about in Y2 of the renewal, when we have more 2.0 data. Nothing will break the portal immediately, but this will get complicated if we implement a more granular hierarchical structure. Based on the data types we are starting to expect in phase 2.0, a hierarchical model may be needed to adequately capture the complexities and contextual information (while balancing minimal elements and a low barrier to entry). Take this into consideration in design doc planning.
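
For design doc planning, here is a rough sketch of what "minimal attributes plus a phase flag" could look like. It is only an illustration, not a proposal for the actual schema: the attribute names in `SHARED_ATTRIBUTES`, the `FileRecord` class, and `to_shared_view` are all hypothetical and do not come from the HTAN 1.0 model, the portal, or the DCA.

```python
from dataclasses import dataclass, field

# Hypothetical shared attributes; the real minimal set has not been decided.
SHARED_ATTRIBUTES = ("file_id", "assay", "center")


@dataclass
class FileRecord:
    """One file's metadata, tagged with the data model phase it was submitted under."""
    phase: str  # "1.0" or "2.0" (assumed flag values)
    attributes: dict = field(default_factory=dict)


def to_shared_view(record: FileRecord) -> dict:
    """Project a record onto the shared attributes, keeping the phase flag.

    Attributes missing from a record come back as None; phase-specific
    attributes (flat or nested) are simply left out of the shared view.
    """
    view = {name: record.attributes.get(name) for name in SHARED_ATTRIBUTES}
    view["phase"] = record.phase
    return view


# Made-up example records, one per phase.
phase1 = FileRecord("1.0", {"file_id": "F1", "assay": "scRNA-seq",
                            "center": "A", "legacy_field": "x"})
phase2 = FileRecord("2.0", {"file_id": "F2", "assay": "imaging",
                            "nested": {"context": "y"}})

rows = [to_shared_view(r) for r in (phase1, phase2)]
```

With something along these lines, a user-facing table could list both records side by side as "HTAN data", while the `phase` column still shows which model each one came from. A more granular hierarchical 2.0 model would only need to define how its nested attributes roll up into the shared ones.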

Portal ticket filed here: ncihtan/htan-portal#672

@aclayton555

Additional tidy up captured in Sage-Bionetworks/data_curator_config#211

@aclayton555

Ticket to set up new repo in #463
