Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add function to perform graph-based clustering #749

Conversation

sjspielman
Copy link
Member

@sjspielman sjspielman commented Sep 4, 2024

Purpose/implementation Section

Please link to the GitHub issue that this pull request addresses.

Closes #745

What is the goal of this pull request?

This PR adds a function to perform clustering to the rOpenScPCA package. There are many file changes, because I've also merged in main to stay in sync, but review can focus only on code in packages/rOpenScPCA.

I'm filing this as a draft to get some initial feedback, and then can implement the rest (including adding more tests, which I know are thin right now!) and mark for review. Here are the main questions I'm looking for feedback on, but of other high-level feedback is welcome :)

  • As a starting point (as in, we could do more/less!), I have included three arguments for specifying "additional" clustering parameters:resolution (used by louvain and leiden), objective_function (used by leiden), and cluster_args (for anything else). How does this choice feel to you? Prefer to see fewer or more of the algorithm-specific options pulled out as individual arguments?
    • Noting further I don't have any checks for these arguments as of yet, but sure could!
  • Should the table the function returns contain additional columns, e.g. of other non-default parameters used for the clustering, beyond what I've coded up so far?
    • At one point I wondered if a DataFrame would be better since it store a lot of that info as metadata, but then I felt it might get confusing for users.
  • Any thoughts on this comment? (noting that I edited the opening comment of Add rOpenScPCA function to perform graph-based clustering #745 after I wrote this; the matrix quoted there is the wrong matrix! we want pcs, not expression...)

Author checklists

Check all those that apply.
Note that you may find it easier to check off these items after the pull request is actually filed.

Analysis module and review

Reproducibility checklist

  • Code in this pull request has been added to the GitHub Action workflow that runs this module.
  • The dependencies required to run the code in this pull request have been added to the analysis module Dockerfile.
  • If applicable, the dependencies required to run the code in this pull request have been added to the analysis module conda environment.yml file.
  • If applicable, R package dependencies required to run the code in this pull request have been added to the analysis module renv.lock file.

Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, but maybe file a separate PR with just the main changes to make this one smaller and easier to review?

I had a number of small suggestions for now, but overall this looks on the right track to me.

packages/rOpenScPCA/R/calculate-clusters.R Outdated Show resolved Hide resolved
packages/rOpenScPCA/R/calculate-clusters.R Outdated Show resolved Hide resolved
packages/rOpenScPCA/R/calculate-clusters.R Outdated Show resolved Hide resolved
packages/rOpenScPCA/R/calculate-clusters.R Outdated Show resolved Hide resolved
packages/rOpenScPCA/R/calculate-clusters.R Outdated Show resolved Hide resolved
packages/rOpenScPCA/R/calculate-clusters.R Outdated Show resolved Hide resolved
packages/rOpenScPCA/R/calculate-clusters.R Outdated Show resolved Hide resolved


# Transform results into a table and return
# TODO: Should this have _all_ (non-default/user-specified) parameters in cluster.args?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it probably should, to allow easier bind_rows to tidy form later. And not just non-default, but all args.

packages/rOpenScPCA/R/calculate-clusters.R Outdated Show resolved Hide resolved
@sjspielman sjspielman marked this pull request as ready for review September 9, 2024 20:47
@sjspielman sjspielman removed the request for review from jaclyn-taroni September 9, 2024 20:47
@sjspielman
Copy link
Member Author

Or we can just check that all the args have length 1, and not support args that don't follow that pattern, as I don't think we expect to use anything like that...

I haven't really been able to think of anything better than this... I guess we could also return the cluster_args list for users to do what they want with, but given that a minority of users (if any) are likely to be in this circumstance, I do think just the straight data frame return is cleaner.

@sjspielman
Copy link
Member Author

Ready for another look, and in particular interested in your feedback on my docs wording about parameter defaults, and overall wording for cluster_args length-1 restriction.

Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually suggesting something a bit different for the docs; consolidating the discussion of differences into the description. I wrote a bit of what I was thinking there, but you will probably want to rewrite it.

Comment on lines 5 to 6
#' Be aware that the default of "louvain" is different from the bluster package default of "walktrap". This difference is
#' because louvain clustering is more commonly-used in scRNA-seq analysis.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually leave off the explanation here. Instead, I was suggesting a paragraph of explanation before the @params, which will end up in the function description.

packages/rOpenScPCA/R/calculate-clusters.R Show resolved Hide resolved
@sjspielman
Copy link
Member Author

I wrote a bit of what I was thinking there, but you will probably want to rewrite it.

I think what you had there was mostly there tbh, so I expanded it a little bit with more specific bluster information but kept most of what you had.

Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just a couple wording/format changes. Which will require rebuilding docs, unfortunately.

packages/rOpenScPCA/R/calculate-clusters.R Outdated Show resolved Hide resolved
packages/rOpenScPCA/R/calculate-clusters.R Outdated Show resolved Hide resolved
@sjspielman sjspielman merged commit 082ec20 into AlexsLemonade:feature/ropenscpca Sep 11, 2024
3 checks passed
@sjspielman sjspielman deleted the sjspielman/745-add-clustering-function branch September 11, 2024 17:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants