cell type/tumor annotation for ETP T-ALL (SCPCP000003) #826

UTSouthwesternDSSR · 2024-10-16T19:58:33Z

Purpose/implementation Section


Please link to the GitHub issue that this pull request addresses.

#822

What is the goal of this pull request?

To perform cell type/tumor annotation for ETP T-ALL samples (n=31) in SCPCP000003

Briefly describe the general approach you took to achieve this goal.

I followed the same approach as in the module for non-ETP T-ALL (SCPCP000003):

Using the SAM algorithm to cluster the cells
ScType for cell type annotation, with the same marker list
CopyKat to identify malignant cells from their genome-wide aneuploidy profiles

The only difference is that in 4 samples (SCPCL000055, SCPCL000066, SCPCL000696, and SCPCL000709) more than one cluster is identified as B cells. I checked the location of these B cells on the UMAP and compared it with BFeatures1 (the average expression of B-cell marker genes) on the dot plot. (I also checked the expression of adt_CD19 in these 4 samples; expression is higher on the separated B-cell island.)

I believe that only the B-cell clusters that are completely separated (an island of their own, rather than attached to other clusters) can be confidently used as the normal reference for running CopyKat, although the results are not especially promising (shown below).
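The islands above were chosen by visual inspection of the UMAP, the dot plot, and adt_CD19. As a hedged sketch of how that judgment could be made quantitative (this is not what the module does), one could compute, for each cluster, the mean fraction of each cell's k nearest UMAP neighbors that fall in the same cluster: a detached island should score close to 1.0, while a cluster attached to others scores lower. The helper name, the value of k, and the use of plain Euclidean distance on UMAP coordinates are all assumptions.

```python
import numpy as np

def isolation_scores(coords, labels, k=15):
    """Hypothetical helper: for each cluster, return the mean fraction of each
    cell's k nearest neighbors (Euclidean distance in UMAP space) that belong
    to the same cluster. Values near 1.0 suggest a detached island."""
    x = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    # Squared Euclidean distances via |a|^2 + |b|^2 - 2ab (O(n^2) memory)
    sq = (x ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    np.fill_diagonal(dist, np.inf)  # exclude each cell from its own neighbors
    # Indices of the k nearest neighbors of every cell
    nn = np.argsort(dist, axis=1)[:, :k]
    same_cluster = labels[nn] == labels[:, None]
    per_cell = same_cluster.mean(axis=1)
    return {str(lab): float(per_cell[labels == lab].mean()) for lab in np.unique(labels)}
```

Because the distance matrix is n × n, this brute-force version only suits a few thousand cells; a KD-tree (for example scipy's cKDTree) would be the usual choice for larger samples.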

If known, do you anticipate filing additional pull requests to complete this analysis module?

Results

What is the name of your results bucket on S3?

rds objects: s3://researcher-650251722463-us-east-2/cell-type-ETP-ALL-03/results/rds
metadata and ScType results: s3://researcher-650251722463-us-east-2/cell-type-ETP-ALL-03/results/
umap and dot plots: s3://researcher-650251722463-us-east-2/cell-type-ETP-ALL-03/plots

What types of results does your code produce (e.g., table, figure)?

RDS objects
Two text files for each sample: _metadata.txt (cell ID, Leiden cluster, cell type annotation, low-confidence cell type annotation, CopyKat prediction, and, for the 4 samples, a new CopyKat prediction based on the "selected" B cells) and _sctype_top10_celltypes_perCluster.txt (the top 10 candidate cell types in each cluster, with their ScType scores)
multipanels_ UMAP plots showing Leiden clustering, cell type, and CopyKat prediction, respectively (for the 4 samples, the new CopyKat prediction based on the "selected" B cells is shown)
Dot plots showing the average expression of each cell type's marker set, computed with AddModuleScore()

What is your summary of the results?

With the default threshold (sctype_classification: a cluster's ScType score must exceed 25% of the number of cells in that cluster), many cells are annotated as "Unknown": the percentage per sample ranges from 0% to 61%, with a median of ~25%.

With a 10% threshold instead (lowConfidence_annot), the percentage of Unknown cells is at most 28% (instead of 61%).
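A minimal Python sketch of the cluster-level rule, as we understand it from the ScType example workflow: per-cell scores are summed per cell type within each cluster, the top-scoring type is kept, and the cluster becomes "Unknown" if that score is below fraction × ncells (0.25 for sctype_classification, 0.10 for lowConfidence_annot). Because the label is assigned per cluster, every cell in a cluster below the threshold becomes Unknown, which is why lowering the fraction reduces the share of Unknown cells. The function and argument names are hypothetical; the module itself runs ScType in R.

```python
def sctype_cluster_labels(cluster_scores, cluster_sizes, fraction=0.25):
    """Hypothetical helper.
    cluster_scores: {cluster: {cell_type: summed ScType score over the cluster's cells}}
    cluster_sizes:  {cluster: number of cells in the cluster}
    Returns {cluster: label}: the top-scoring cell type, or "Unknown" when
    that score is below fraction * ncells."""
    labels = {}
    for cluster, scores in cluster_scores.items():
        # Keep only the highest-scoring cell type for this cluster
        best_type, best_score = max(scores.items(), key=lambda kv: kv[1])
        if best_score < fraction * cluster_sizes[cluster]:
            labels[cluster] = "Unknown"
        else:
            labels[cluster] = best_type
    return labels
```

For example, a 200-cell cluster whose top B-cell score is 40 is Unknown at 25% (40 < 50) but is labeled B cell at 10% (40 ≥ 20).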

Every sample has cells annotated as B cells, so I ran CopyKat on all of them; as mentioned above, for 4 samples (SCPCL000055, SCPCL000066, SCPCL000696, and SCPCL000709) I used only the selected B cells as the normal reference. Here is the comparison between using all B cells (copykat.pred) and using only the selected B cells (new_copykat.pred). The results seem to make sense for SCPCL000055, since there are now many more aneuploid cells and the Late Eryth cells are now called diploid. It also works for SCPCL000066, but not for SCPCL000709, where the number of aneuploid cells decreases. SCPCL000696 shows very little change.
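One way to tabulate this comparison per sample is a cross-tabulation of the two prediction columns. The sketch below assumes _metadata.txt is tab-separated with a header row containing the copykat.pred and new_copykat.pred columns named in this PR; the rest of the layout and the helper names are assumptions, and the toy table is made up for illustration only.

```python
import csv
import io
from collections import Counter

def copykat_transitions(lines):
    """Count (copykat.pred, new_copykat.pred) pairs in a tab-separated
    metadata table given as an iterable of lines (e.g. an open file)."""
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter((row["copykat.pred"], row["new_copykat.pred"]) for row in reader)

def aneuploid_counts(transitions):
    """Number of aneuploid calls before and after switching the normal reference."""
    before = sum(n for (old, _new), n in transitions.items() if old == "aneuploid")
    after = sum(n for (_old, new), n in transitions.items() if new == "aneuploid")
    return before, after

# Toy input for illustration only (not real results)
toy = io.StringIO(
    "cell_id\tcopykat.pred\tnew_copykat.pred\n"
    "cell1\tdiploid\taneuploid\n"
    "cell2\taneuploid\taneuploid\n"
    "cell3\tnot.defined\tdiploid\n"
)
transitions = copykat_transitions(toy)
print(aneuploid_counts(transitions))  # (1, 2)
```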

Overall, CopyKat leaves very few cells as not.defined: at most 6% of the cells in any sample.
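The per-sample figures quoted in this section (Unknown from sctype_classification or lowConfidence_annot, not.defined from CopyKat) are simple proportions summarized across samples. A small sketch of that summary, assuming each sample's per-cell labels have already been read from its _metadata.txt; the helper names are hypothetical.

```python
from statistics import median

def percent_with_label(labels, target):
    """Percentage of cells whose label equals target (0.0 for an empty sample)."""
    labels = list(labels)
    if not labels:
        return 0.0
    return 100.0 * sum(1 for x in labels if x == target) / len(labels)

def summarize_label(per_sample_labels, target):
    """Per-sample percentages plus their min, median, and max across samples."""
    pct = {sample: percent_with_label(labs, target)
           for sample, labs in per_sample_labels.items()}
    values = list(pct.values())
    return pct, {"min": min(values), "median": median(values), "max": max(values)}
```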

Provide directions for reviewers

What are the software and computational requirements needed to be able to run the code in this PR?

The packages are installed and recorded in renv.lock and conda.lock.
The analysis can be executed on a Standard-4XL virtual machine via AWS Lightsail for Research.

Are there particular areas you'd like reviewers to have a close look at?

Is there anything that you want to discuss further?

Please let me know what you think about using only selected B cells as the normal reference for these 4 samples, instead of providing all B cells.
I am not sure whether you still want the CopyKat results, as we discussed for the non-ETP module, since their total output is about 101 GB and it seems we may try inferCNV instead.

Author checklists

Check all those that apply.
Note that you may find it easier to check off these items after the pull request is actually filed.

Analysis module and review

This analysis module uses the analysis template and has the expected directory structure.
The analysis module README.md has been updated to reflect code changes in this pull request.
The analytical code is documented and contains comments.
Any results and/or plots this code produces have been added to your S3 bucket for review.

Reproducibility checklist

Code in this pull request has been added to the GitHub Action workflow that runs this module.
The dependencies required to run the code in this pull request have been added to the analysis module Dockerfile.
If applicable, the dependencies required to run the code in this pull request have been added to the analysis module conda environment.yml file.
If applicable, R package dependencies required to run the code in this pull request have been added to the analysis module renv.lock file.


jaclyn-taroni

Thanks for this contribution, @UTSouthwesternDSSR! Since this is similar to the non-ETP ALL module, we can merge it.

Having reviewed the results, the B cell assignments from ScType are reasonably convincing for some libraries.

In terms of next steps, I would recommend picking one set of samples (non-ETP ALL or ETP ALL) to focus on and see if you can get the best quality B cell assignments possible (you might want to pull in the automatic assignments we already have from CellAssign and SingleR for comparison, too) and get inferCNV up and running.

Thank you again, and please let us know if there's anything you want to discuss!

UTSouthwesternDSSR · 2024-10-17T13:39:25Z

Sure, I will start with the non-ETP samples and do what you suggested (making sure the B cells called are solid, then using them to run inferCNV) in another pull request, and then test the approach on the ETP samples. Thank you so much for the suggestion!

UTSouthwesternDSSR and others added 16 commits October 2, 2024 17:15

init module skeleton

4396373

update gitignore

0da0ec5

added marker file

fd023eb

update readme

8bb093f

updated scripts for nonETP

67cab4e

updated script for ETP

5f03ef8

Merge remote-tracking branch 'origin/UTSouthwesternDSSR/jwl' into UTS…

b507ab5

…outhwesternDSSR/jwl

updated for plotting

da69280

add multipanel R script

7e3d4db

updated scripts

66e0a23

store the module score in seu object

e6d45fd

update renv.lock

17ed000

Merge branch 'AlexsLemonade:main' into UTSouthwesternDSSR/jwl

38b7c50

edit script for re-running CopyKat on specific B cells (normal)

7ef2c9d

Merge remote-tracking branch 'origin/UTSouthwesternDSSR/jwl' into UTS…

275f37e

…outhwesternDSSR/jwl

added plots

1687171

UTSouthwesternDSSR requested a review from jaclyn-taroni as a code owner October 16, 2024 19:58

jaclyn-taroni added 4 commits October 17, 2024 08:34

Uncomment triggers for GHA workflows

7b25f9b

Update run GHA workflow to use environments, download correct data, test

f6c8ce1

Flesh out Dockerfile

eee69ab

Update image source in cell-type-nonETP-ALL-03 module

813d1ea

jaclyn-taroni approved these changes Oct 17, 2024


Merge branch 'main' into UTSouthwesternDSSR/jwl

172e89b

Add modules that are ready to be tested monthly

5938976

jaclyn-taroni merged commit 3362d31 into AlexsLemonade:main Oct 17, 2024
6 of 7 checks passed

UTSouthwesternDSSR mentioned this pull request Oct 31, 2024

Submission table for cell type and tumor classification of ETP T-ALL (SCPCP000003) #847

Merged

8 tasks
