cell type/tumor annotation for ETP T-ALL (SCPCP000003) #826

Merged

UTSouthwesternDSSR
Contributor

Purpose/implementation Section

To perform cell type/tumor annotation for ETP T-ALL samples (n=31) in SCPCP000003

Please link to the GitHub issue that this pull request addresses.

#822

What is the goal of this pull request?

To perform cell type/tumor annotation for ETP T-ALL samples (n=31) in SCPCP000003

Briefly describe the general approach you took to achieve this goal.

The same approach is followed as proposed in the module for non-ETP T-ALL (SCPCP000003):

  • the SAM algorithm to separate the cells into clusters
  • ScType for cell type annotation using the same marker list
  • CopyKat for identification of malignant cells based on the profile of genome-wide aneuploidy

The only difference is that more than one cluster is identified as B cells in 4 samples (SCPCL000055, SCPCL000066, SCPCL000696, and SCPCL000709). I checked the location of the B cells on the UMAP and compared it with BFeatures1 (average expression of B marker genes) on the dot plot. I also checked the expression of adt_CD19 in these 4 samples; expression is higher on the separated B cell island.
Screenshot 2024-10-16 at 12 35 15 PM

I believe that only the B cells that are completely separated (an island on its own, rather than attached to other clusters) can be confidently used as the normal cells for running CopyKat, although the results are not very promising (shown later).
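
As a rough illustration of this selection, the Python sketch below keeps only cells that are annotated as B cells and fall in a hand-picked set of isolated clusters. The function name, field names, cluster IDs, and cell IDs are all hypothetical; in the module, the clusters were chosen by inspecting the UMAP, the BFeatures1 dot plot, and adt_CD19 expression.

```python
from typing import Iterable


def select_normal_cells(metadata: Iterable[dict], isolated_clusters: set) -> list:
    """Return IDs of cells annotated as B cells that sit in an isolated cluster."""
    return [
        row["cell_id"]
        for row in metadata
        if row["cell_type"] == "B" and row["cluster"] in isolated_clusters
    ]


# Hypothetical per-cell metadata (not taken from any sample in this PR).
metadata = [
    {"cell_id": "cell_1", "cluster": 7, "cell_type": "B"},
    {"cell_id": "cell_2", "cluster": 3, "cell_type": "B"},  # B cluster attached to others
    {"cell_id": "cell_3", "cluster": 7, "cell_type": "B"},
    {"cell_id": "cell_4", "cluster": 1, "cell_type": "Unknown"},
]

# Cluster 7 stands in for the separated B cell island.
normal_cells = select_normal_cells(metadata, isolated_clusters={7})
print(normal_cells)
```

The resulting IDs are what would be handed to CopyKat as the known normal cells (for example through its `norm.cell.names` argument), instead of every cell annotated as B.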

If known, do you anticipate filing additional pull requests to complete this analysis module?

Results

What is the name of your results bucket on S3?

  • rds objects: s3://researcher-650251722463-us-east-2/cell-type-ETP-ALL-03/results/rds
  • metadata and ScType results: s3://researcher-650251722463-us-east-2/cell-type-ETP-ALL-03/results/
  • umap and dot plots: s3://researcher-650251722463-us-east-2/cell-type-ETP-ALL-03/plots

What types of results does your code produce (e.g., table, figure)?

  • rds objects
  • two text files for each sample: _metadata.txt (cell ID, Leiden clusters, cell type annotation, low-confidence cell type annotation, CopyKat prediction, and new CopyKat prediction based on the "selected" B cells [for the 4 samples]) and _sctype_top10_celltypes_perCluster.txt (top 10 possible cell types with their respective ScType scores in each cluster)
  • multipanels_ UMAP plots showing Leiden clustering, cell type, and CopyKat prediction respectively (for the 4 samples, I am showing the new CopyKat prediction based on the "selected" B cells)
  • dot plots showing the average expression of each cell type's group of markers, computed with AddModuleScore()

What is your summary of the results?

With the default threshold (a cluster is annotated only if its ScType score is > 25% of the number of cells in the cluster; sctype_classification), a large number of cells are annotated as "Unknown" in each sample, ranging from 0 to 61%, with a median of ~25%.
Screenshot 2024-10-16 at 2 19 05 PM
If we use a 10% threshold instead (lowConfidence_annot), the percentage of Unknown cells is capped at 28% (instead of 61%).
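
To make the two thresholds concrete, here is a minimal Python sketch of the rule as described above: a cluster keeps its top-scoring cell type only if the score exceeds a given fraction of the number of cells in the cluster, and is labelled "Unknown" otherwise. The function name, scores, and cluster sizes are hypothetical and chosen only for illustration; the module itself runs ScType in R.

```python
def label_cluster(top_type: str, score: float, n_cells: int, fraction: float) -> str:
    """Return the cluster label under a score threshold of fraction * n_cells."""
    # A cluster is annotated only when its score exceeds the threshold.
    return top_type if score > fraction * n_cells else "Unknown"


# Hypothetical clusters: (top-scoring cell type, score, number of cells).
clusters = [
    ("B", 900.0, 1000),
    ("Late Eryth", 150.0, 1000),
    ("Other", 50.0, 1000),
]

# 25% threshold (the default) versus 10% threshold (low-confidence annotation).
default_labels = [label_cluster(t, s, n, 0.25) for t, s, n in clusters]
low_conf_labels = [label_cluster(t, s, n, 0.10) for t, s, n in clusters]

print(default_labels)
print(low_conf_labels)
```

In this toy input, lowering the threshold rescues the middle cluster from "Unknown", which is the same effect that reduces the Unknown percentage in the samples.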

Every sample has B cells annotated, so I ran CopyKat on all of them. As mentioned above, I selected a subset of B cells as the normal cells for these 4 samples (SCPCL000055, SCPCL000066, SCPCL000696, and SCPCL000709). Here is the comparison between using all B cells (copykat.pred) and using the selected B cells (new_copykat.pred). The results seem to make sense for SCPCL000055, since there are now many more aneuploid cells and Late Eryth has become diploid. It works for SCPCL000066 too, but not really for SCPCL000709, where the number of aneuploid cells decreases. SCPCL000696 shows very little change.
Screenshot 2024-10-16 at 2 28 23 PM
Screenshot 2024-10-16 at 2 32 31 PM

Overall, CopyKat predicts very few not.defined cells, at most 6% of the total cells in a sample.
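
One way to summarize the comparison between copykat.pred and new_copykat.pred is to cross-tabulate the two columns and compute the fraction of not.defined cells. The Python sketch below does this for a handful of made-up cells; the column names follow the metadata described above, but the values are hypothetical and do not reflect any sample's results.

```python
from collections import Counter

# Hypothetical per-cell predictions (not real results).
cells = [
    {"copykat.pred": "diploid", "new_copykat.pred": "aneuploid"},
    {"copykat.pred": "diploid", "new_copykat.pred": "aneuploid"},
    {"copykat.pred": "aneuploid", "new_copykat.pred": "aneuploid"},
    {"copykat.pred": "diploid", "new_copykat.pred": "diploid"},
    {"copykat.pred": "not.defined", "new_copykat.pred": "not.defined"},
]

# Count how many cells fall into each (old prediction, new prediction) pair.
crosstab = Counter((c["copykat.pred"], c["new_copykat.pred"]) for c in cells)

# Fraction of cells that the new run could not classify.
not_defined_fraction = sum(
    c["new_copykat.pred"] == "not.defined" for c in cells
) / len(cells)

for (old, new), n in sorted(crosstab.items()):
    print(f"{old:>12} -> {new:<12} {n}")
print(f"not.defined fraction: {not_defined_fraction:.0%}")
```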

Provide directions for reviewers

What are the software and computational requirements needed to be able to run the code in this PR?

  • The packages are installed and updated in renv.lock and conda.lock.
  • Analysis could be executed on a Standard-4XL virtual machine via AWS Lightsail for Research

Are there particular areas you'd like reviewers to have a close look at?

Is there anything that you want to discuss further?

  • Please let me know what you think about selecting a subset of B cells for these 4 samples, instead of providing all B cells as the normal cells.
  • I am not sure if you still want the CopyKat results, as we discussed in the non-ETP module, since the total size of their output is about 101 GB and it seems we may try inferCNV instead.

Author checklists

Check all those that apply.
Note that you may find it easier to check off these items after the pull request is actually filed.

Analysis module and review

Reproducibility checklist

  • Code in this pull request has been added to the GitHub Action workflow that runs this module.
  • The dependencies required to run the code in this pull request have been added to the analysis module Dockerfile.
  • If applicable, the dependencies required to run the code in this pull request have been added to the analysis module conda environment.yml file.
  • If applicable, R package dependencies required to run the code in this pull request have been added to the analysis module renv.lock file.

Member

@jaclyn-taroni jaclyn-taroni left a comment

Thanks for this contribution, @UTSouthwesternDSSR! Since this is similar to the non-ETP ALL module, we can merge it.

Having reviewed the results, the B cell assignments from ScType are reasonably convincing for some libraries.

In terms of next steps, I would recommend picking one set of samples (non-ETP ALL or ETP ALL) to focus on and see if you can get the best quality B cell assignments possible (you might want to pull in the automatic assignments we already have from CellAssign and SingleR for comparison, too) and get inferCNV up and running.

Thank you again, and please let us know if there's anything you want to discuss!

@UTSouthwesternDSSR
Contributor Author

Sure, I will start with the non-ETP samples and do what you suggested in the other pull request (making sure that the B cells called are indeed solid, and then using them to run inferCNV), and then test the approach on the ETP samples. Thank you so much for the suggestion!

@jaclyn-taroni jaclyn-taroni merged commit 3362d31 into AlexsLemonade:main Oct 17, 2024
6 of 7 checks passed