Single cell RNA sequencing of pediatric high-grade gliomas, SCPCP000001 #722
-
Hi @georginaalbadri! I'm Jen, the Scientific Community Manager at the Data Lab. Thank you for sharing your proposed analysis! Have you filled out the contributor form yet? On this form, you will provide the name and email address that will be associated with the AWS account that we'll create for you. We also need this form returned to ensure you have agreed to the OpenScPCA terms and conditions and other policies. Once we receive this, our team will review your proposed analyses and get back to you with next steps within 3 business days! In the meantime, please let us know if you have any questions about OpenScPCA. We look forward to discussing more with you soon!
-
Thank you for submitting the form @georginaalbadri! I am realizing that some of our team members will be traveling next week, so it may take longer than 3 business days to review your proposed analysis. My apologies! You can expect to hear more feedback from our team during the week of August 26. In the meantime, I will get back to you shortly with information about your AWS account setup.
-
Hi @georginaalbadri! I'm Stephanie, one of the Data Scientists in the Data Lab. We're looking forward to having you on board as an OpenScPCA contributor! Before you get started, I wanted to provide some feedback about your proposed analysis and offer additional guidance on how you can get started contributing.

Proposal questions

First, I have some feedback and questions about your proposed analysis specifically:

Data pre-processing

You wrote,
First, a quick implementation note: when you create your analysis module, you will likely want to use one of the "Python" flags to make it a Python module. I recommend (see below) that this should be your first pull request: establishing your Python-based analysis module.

Second, it's great to see that you're planning to follow a set of "best practices" for this kind of analysis! However, for OpenScPCA contributions in general, we actually recommend that you start your analyses with the processed ScPCA matrices (aka, the

In addition, if you would like to filter out doublets, we have actually processed all datasets already with

All that said, if there is a compelling reason why your pre-processing pipeline might be preferable, please let us know and we can discuss this particular circumstance.

Analysis methods

You wrote,
Thanks for providing some initial details about your proposed approach! To make sure we're on the same page for all of this, let's clarify a couple of details:
Recommended next steps

First, let's take a bit of time to discuss your analysis in this Discussion post so we're on the same page about the exact analyses you'll be performing. Then, once I have a clearer sense of the specific steps you're going to take, I can recommend an "order" of issues and pull requests for you to file that will help get you across the finish line more efficiently. This is essentially "scoping your work" to ensure slow-and-steady, modular progress towards the final cell type annotations. Remember that the more focused a given pull request is, the faster it will move through review.

After we discuss, you'll be ready to start your analysis! Please follow the steps below to start contributing to the project:
After this PR has been reviewed and accepted, you will be ready to continue with the rest of the analysis! You'll file issues as you go, with one or more pull requests, as needed, to complete each issue.

Thanks again for your interest in OpenScPCA, and I'm looking forward to working with you!

One more quick note: you might be interested in joining our Childhood Cancer Data Science Slack, which you can use to communicate with other OpenScPCA contributors and the broader pediatric cancer research community, as well as to ask us in the Data Lab questions about your analysis module directly!
-
Proposed analysis
I propose data preprocessing (filtering and doublet detection), followed by dimensionality reduction and k-means clustering. Clusters will be annotated by combining marker gene analysis, differential expression, and cell label transfer using logistic regression.
Scientific goals
The data will be cleaned of low-quality cells, and two to three levels of cell labels will be provided. This will include a top layer of malignant vs. non-malignant cells and a second layer classifying the non-malignant cells into, e.g., neurons, astrocytes, and oligodendrocytes. The tumour cells can be further classified into OPC-like, AC-like, NPC-like, and mesenchymal-like states.
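To make the two-to-three-level labelling scheme concrete, here is a minimal sketch of how per-cell labels might be stored so that a coarser level can always be recovered from a finer one. The class name `CellLabel`, the `at_level` helper, and the category vocabularies (including the abbreviation MES-like for mesenchymal-like) are hypothetical illustrations, not an OpenScPCA or scanpy schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical label vocabularies; the real categories would come from the
# annotation itself (e.g. reference label transfer and marker-gene results).
NON_MALIGNANT_TYPES = {"Neuron", "Astrocyte", "Oligodendrocyte"}
TUMOUR_STATES = {"OPC-like", "AC-like", "NPC-like", "MES-like"}


@dataclass(frozen=True)
class CellLabel:
    malignant: bool                # level 1: malignant vs non-malignant
    subtype: Optional[str] = None  # level 2: cell type or tumour state, if known

    def __post_init__(self) -> None:
        # A subtype must be consistent with the level-1 call.
        allowed = TUMOUR_STATES if self.malignant else NON_MALIGNANT_TYPES
        if self.subtype is not None and self.subtype not in allowed:
            raise ValueError(
                f"{self.subtype!r} is not a valid subtype when malignant={self.malignant}"
            )

    def at_level(self, level: int) -> str:
        """Return the label at the requested depth, falling back to level 1."""
        top = "malignant" if self.malignant else "non-malignant"
        if level <= 1 or self.subtype is None:
            return top
        return self.subtype


# Usage: the same cell can be reported at either resolution.
cell = CellLabel(malignant=True, subtype="OPC-like")
print(cell.at_level(1), cell.at_level(2))  # malignant OPC-like
```

Keeping the malignancy call and the subtype as separate fields means a cell with no confident subtype still carries a valid top-level label.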
Methods or approach
The analysis will be done in Python, primarily using scanpy. Preprocessing and labelling will follow the single-cell best practices guide (https://www.sc-best-practices.org/preprocessing_visualization/quality_control.html). Low-quality cells will be filtered using median absolute deviation (MAD) thresholds, and doublets will be removed using doubletdetection.
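MAD-based filtering flags a cell as an outlier when a QC metric lies more than a chosen number of median absolute deviations from that metric's median. The sketch below uses only the Python standard library; the function names `mad_outliers` and `keep_cells`, the metric names in the usage example, and the default threshold of 5 MADs are assumptions to tune per dataset, not values taken from the proposal. In scanpy, the per-cell QC metrics themselves would typically come from `sc.pp.calculate_qc_metrics`.

```python
import math
import statistics
from typing import Dict, List, Sequence


def mad_outliers(values: Sequence[float], n_mads: float = 5.0) -> List[bool]:
    """Flag values lying more than n_mads median absolute deviations from the median.

    The MAD here is unscaled: median(|x - median(x)|).
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return [abs(v - med) > n_mads * mad for v in values]


def keep_cells(qc: Dict[str, Sequence[float]], n_mads: float = 5.0) -> List[bool]:
    """Keep a cell only if it is not an outlier for any of the QC metrics."""
    flags_per_metric = [mad_outliers(vals, n_mads) for vals in qc.values()]
    # zip(*...) regroups the flags per cell across all metrics.
    return [not any(cell_flags) for cell_flags in zip(*flags_per_metric)]


# Usage with made-up per-cell metrics (log1p-transformed, as is common for counts).
total_counts = [1000, 1100, 950, 1050, 20000]
n_genes = [800, 850, 780, 820, 900]
qc = {
    "log1p_total_counts": [math.log1p(c) for c in total_counts],
    "log1p_n_genes": [math.log1p(g) for g in n_genes],
}
print(keep_cells(qc))  # [True, True, True, True, False]
```

Because the MAD is unscaled here, a metric whose MAD is zero will flag every value that differs from the median, so metrics with very little spread are worth inspecting before applying the filter.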
Dimensionality reduction will be done using UMAP, and Leiden clustering will be performed. Built-in scanpy functions will be used to assess marker genes and differential gene expression for annotation. CellTypist, a cell label transfer method based on logistic regression, will be used to complement marker-gene annotation.
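Marker-gene annotation boils down to asking, per cluster, which cell type's markers are most highly expressed. The toy version below scores each cluster by the mean expression of each label's marker genes and picks the top-scoring label. In practice this step would run on scanpy outputs (for example `sc.tl.rank_genes_groups` for differential expression or `sc.tl.score_genes` for gene-set scores); the function `score_clusters` and the gene and label names in the usage example are hypothetical placeholders, not real marker lists.

```python
from statistics import mean
from typing import Dict, List, Mapping, Sequence


def score_clusters(
    expr: Mapping[str, Sequence[float]],  # gene -> expression value per cell
    clusters: Sequence[str],              # cluster id per cell (e.g. Leiden labels)
    markers: Mapping[str, List[str]],     # candidate label -> its marker genes
) -> Dict[str, str]:
    """Assign each cluster the label whose markers have the highest mean expression."""
    assignments: Dict[str, str] = {}
    for cluster in sorted(set(clusters)):
        cells = [i for i, c in enumerate(clusters) if c == cluster]
        scores: Dict[str, float] = {}
        for label, genes in markers.items():
            measured = [g for g in genes if g in expr]  # skip markers not in the data
            if not measured:
                continue
            # Average each marker over the cluster's cells, then over markers.
            scores[label] = mean(mean(expr[g][i] for i in cells) for g in measured)
        assignments[cluster] = max(scores, key=scores.get) if scores else "unassigned"
    return assignments


# Usage with hypothetical genes and labels (placeholders, not real marker sets).
expr = {
    "GENE_A": [5.0, 4.0, 0.0, 1.0],
    "GENE_B": [0.0, 1.0, 6.0, 5.0],
    "GENE_C": [1.0, 1.0, 1.0, 1.0],
}
clusters = ["0", "0", "1", "1"]
markers = {"type_X": ["GENE_A"], "type_Y": ["GENE_B", "GENE_C"]}
print(score_clusters(expr, clusters, markers))  # {'0': 'type_X', '1': 'type_Y'}
```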
Existing modules
There is potential to collaborate with other projects that contain glioblastoma samples.
Input data
Cell label transfer will utilise the GBMap dataset https://www.biorxiv.org/content/10.1101/2022.08.27.505439v1
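CellTypist-style label transfer trains a logistic-regression classifier on labelled reference cells (here, GBMap) and applies it to the query cells. The following is a deliberately small, dependency-free sketch of that idea using one binary logistic regression per label (one-vs-rest) fitted by batch gradient descent. It is not CellTypist's implementation, and the function names, learning rate, epoch count, and two-gene toy data are assumptions for illustration. Real use would also require matching genes and normalization between the reference and the query.

```python
import math
from typing import Dict, List, Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def _prob(w: Sequence[float], x: Sequence[float]) -> float:
    # zip stops at len(x), so the last weight (the intercept) is added separately.
    return _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def fit_one_vs_rest(
    X: List[List[float]], y: List[str], lr: float = 0.1, epochs: int = 300
) -> Dict[str, List[float]]:
    """Fit one binary logistic regression per label on reference cells.

    X holds one row of (normalized) gene expression per reference cell and
    y the matching reference labels. Returns label -> weights (+ intercept).
    """
    n_features = len(X[0])
    models: Dict[str, List[float]] = {}
    for label in sorted(set(y)):
        w = [0.0] * (n_features + 1)
        targets = [1.0 if lab == label else 0.0 for lab in y]
        for _ in range(epochs):
            grad = [0.0] * (n_features + 1)
            for x, t in zip(X, targets):
                err = _prob(w, x) - t  # gradient of the log loss w.r.t. the logit
                for j, xj in enumerate(x):
                    grad[j] += err * xj
                grad[-1] += err
            w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
        models[label] = w
    return models


def transfer_label(models: Dict[str, List[float]], x: Sequence[float]) -> str:
    """Give a query cell the label whose classifier is most confident."""
    return max(models, key=lambda label: _prob(models[label], x))


# Usage with a made-up two-gene reference and two query cells.
ref_X = [[5.0, 0.0], [4.0, 1.0], [0.0, 5.0], [1.0, 4.0]]
ref_y = ["A", "A", "B", "B"]
models = fit_one_vs_rest(ref_X, ref_y)
print([transfer_label(models, q) for q in ([4.5, 0.5], [0.5, 4.5])])  # ['A', 'B']
```

One-vs-rest keeps each classifier independent, which makes it easy to see how confident the model is about each candidate label for a given query cell.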
Scientific literature
Reference dataset GBMap https://www.biorxiv.org/content/10.1101/2022.08.27.505439v1
GBM subtypes with markers https://doi.org/10.1016/j.cell.2019.06.024
Other details
Resources: Local machine and university HPC.
Timeline: I anticipate a first draft of cell labels will be available at the end of September.