Initial assignment of cell ontology IDs to panglao cell types #909
I agree with your approach. I am not approving yet because we should address what to do about the Panglao DB file that's too big before merging. I added my thoughts in an inline comment.
module_base <- rprojroot::find_root(rprojroot::is_renv_project)

# read in original ref file
ref_file <- file.path(module_base, "references", "PanglaoDB_markers_2020-03-27.tsv")
A few thoughts:
- Do we want to include a data download script that grabs this from scpca-nf (provided I interpreted your comment I include below correctly) in this module?
- Should we explicitly ignore this file in this module?
I think that's how I'd address this concern:
> I did want to point out that currently the actual reference file from Panglao is not in the repo because it's ~ 1000 KB and we have a 200 KB limit on TSV files. How do we want to proceed here? We probably don't really need it and I could just make a list of the cell types and save as a text file to read in and point to where the file lives in scpca-nf or we could just make an exception for this file?
I liked the idea of adding a script, so I did that and also added the file to the gitignore for this module. I also added a README for each of the scripts and references folders. That should at least get us started on documentation, but I imagine I will expand and update those READMEs as we continue this process. @jaclyn-taroni this should be ready for another look.
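For anyone skimming the thread, here is a minimal sketch of what a download step along these lines could look like. It is not the script added in this PR: the function name `download_if_missing` and the skip-if-present behavior are assumptions made for illustration, and only standard `curl` flags are used.

```shell
# Hypothetical helper (not from this PR): fetch a file only when it is
# not already present, so repeated runs do not re-download it.
# Arguments: $1 = destination path, $2 = source URL
download_if_missing() {
  dl_dest="$1"
  dl_url="$2"

  # -s is true when the file exists and is non-empty
  if [ -s "$dl_dest" ]; then
    echo "Already present: $dl_dest"
    return 0
  fi

  mkdir -p "$(dirname "$dl_dest")"

  # -f: fail on HTTP errors, -sS: quiet but still report errors,
  # -L: follow redirects. Download to a temporary name first so an
  # interrupted transfer does not leave a partial reference file.
  curl -fsSL "$dl_url" -o "${dl_dest}.tmp" || return 1
  mv "${dl_dest}.tmp" "$dl_dest"
  echo "Downloaded: $dl_dest"
}

# Example call (commented out so the sketch has no side effects):
# download_if_missing "references/PanglaoDB_markers_2020-03-27.tsv" "<url of the file in scpca-nf>"
```

Ignoring the downloaded file would then be a single line in the module's gitignore naming that path, assuming the file is saved under the references folder as in the snippet under review.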
LGTM
# define path to ref file and url
ref_file="${scripts_dir}/../references/PanglaoDB_markers_2020-03-27.tsv"
ref_url="https://raw.githubusercontent.com/AlexsLemonade/scpca-nf/refs/heads/main/references/PanglaoDB_markers_2020-03-27.tsv"
I suppose main is fine here (instead of a permalink) since the file name captures version information, and we'd probably want to use whatever the current version of this file is anyway.
> and we'd probably want to use whatever the current version of this file is anyway.

This exact thought was my reasoning for using main here.
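To make the main-versus-permalink tradeoff concrete, here is a small sketch of how the two URL styles differ. The helper `raw_github_url` and the all-zero commit SHA are placeholders invented for illustration; a real permalink would use an actual commit from scpca-nf.

```shell
# Hypothetical helper: build a raw.githubusercontent.com URL.
# Arguments: $1 = owner/repo, $2 = git ref (branch ref or full commit SHA),
#            $3 = path to the file inside the repository
raw_github_url() {
  printf 'https://raw.githubusercontent.com/%s/%s/%s\n' "$1" "$2" "$3"
}

ref_path="references/PanglaoDB_markers_2020-03-27.tsv"

# Branch-based URL: always resolves to whatever is currently on main
branch_url="$(raw_github_url "AlexsLemonade/scpca-nf" "refs/heads/main" "$ref_path")"

# Permalink-style URL: pinned to one commit.
# The SHA below is a placeholder, NOT a real commit.
placeholder_sha="0000000000000000000000000000000000000000"
pinned_url="$(raw_github_url "AlexsLemonade/scpca-nf" "$placeholder_sha" "$ref_path")"

echo "$branch_url"
echo "$pinned_url"
```

Because the date in the file name already identifies the version, the branch-based form keeps the simpler URL without giving up much traceability, which matches the reasoning in this thread.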
Purpose/implementation Section
Please link to the GitHub issue that this pull request addresses.
Related to #887
What is the goal of this pull request?
Here I am starting the process of assigning cell ontology IDs to the cell types present in the Panglao reference we use when running CellAssign as part of scpca-nf. This initial PR does some preliminary R setup and adds a script to assign ontology IDs to cell types that have exact matches.
Briefly describe the general approach you took to achieve this goal.
I did want to point out that currently the actual reference file from Panglao is not in the repo because it's ~ 1000 KB and we have a 200 KB limit on TSV files. How do we want to proceed here? We probably don't really need it and I could just make a list of the cell types and save as a text file to read in and point to where the file lives in scpca-nf or we could just make an exception for this file?
If known, do you anticipate filing additional pull requests to complete this analysis module?
Yes. After assigning labels this way there are 92 cell types that will need to be manually assigned. I'm thinking I'll break this up into ~ 4-5 PRs and do 20-25 cell types at a time.
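As a rough illustration of the exact-match idea and of how the leftover cell types fall out of it, here is a sketch using awk. It is not the R script in this PR. The function name, the two toy input tables, and the choice to compare names case-insensitively are all assumptions; the IDs in the toy table should be checked against the Cell Ontology before being relied on.

```shell
# Toy inputs, made up for illustration only
workdir="$(mktemp -d)"

# Ontology labels: ID <TAB> label
printf 'CL:0000084\tT cell\nCL:0000236\tB cell\n' > "$workdir/ontology_labels.tsv"

# Cell type names to annotate, one per line
printf 'T cell\nb cell\nHepatocytes\n' > "$workdir/cell_types.txt"

# Hypothetical helper: attach an ontology ID to each cell type whose name
# matches an ontology label exactly (ignoring case); otherwise print NA.
# Arguments: $1 = ontology TSV, $2 = cell type list
match_cell_types() {
  awk -F '\t' -v OFS='\t' '
    # First file: index ontology IDs by lowercased label
    NR == FNR { id[tolower($2)] = $1; next }
    # Second file: look up each cell type name in that index
    {
      key = tolower($1)
      print $1, ((key in id) ? id[key] : "NA")
    }
  ' "$1" "$2"
}

matched="$(match_cell_types "$workdir/ontology_labels.tsv" "$workdir/cell_types.txt")"
echo "$matched"

# Count the cell types that would still need manual assignment
n_unmatched="$(echo "$matched" | awk -F '\t' '$2 == "NA" { n++ } END { print n + 0 }')"
echo "Unmatched cell types: $n_unmatched"
```

Rows that come back as NA are the ones that would be handed off to the manual assignment PRs.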
Provide directions for reviewers
What are the software and computational requirements needed to be able to run the code in this PR?
Any packages needed to run this script have been recorded in the lock file.
Are there particular areas you'd like reviewers to have a close look at?
I first want to get some feedback on the overall approach here and make sure we are okay with the decisions I made in the script. Are there things you would change about the overall setup here? Once we are on the same page regarding building this new file with the ontology IDs and the use of the script I'll add to the README and document this process. We can do that in this PR or a new PR.
Is there anything that you want to discuss further?
Any thoughts on how to handle storing the large Panglao file?
Author checklists
Analysis module and review
- README.md has been updated to reflect code changes in this pull request.
Reproducibility checklist
- Dockerfile
- environment.yml file
- renv.lock file