Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rOpenScPCA function to perform graph-based clustering #745

Closed
sjspielman opened this issue Sep 3, 2024 · 3 comments
Closed

Add rOpenScPCA function to perform graph-based clustering #745

sjspielman opened this issue Sep 3, 2024 · 3 comments
Assignees
Labels

Comments

@sjspielman
Copy link
Member

sjspielman commented Sep 3, 2024

If you are filing this issue based on a specific GitHub Discussion, please link to the relevant Discussion.

#727 & #731

Describe the goals of the changes to the analysis module.

This issue tracks adding a function to perform graph-based clustering to rOpenScPCA.

  • (EDIT!) The function should take a matrix with row names where rows are assumed to be genes and columns are assumed to be cells. rows are assumed to be cells and columns are PCs(!)
  • The function should take the following arguments:
    • algorithm (default louvain)
    • weighting (default jaccard)
    • k
    • number of threads
    • seed
    • optionally, a TSV file to export results to. If NULL, nothing will be exported. (see below for TSV contents...)
  • The function should return a table of results with these columns (which would be the same as what is exported to TSV if specified): barcode, cluster, algorithm, weighting.

What will your pull request contain?

A function with associated docs & tests to perform graph-based clustering

Will you require additional software beyond what is already in the analysis module?

N/A

Will you require different computational resources beyond what the analysis module already uses?

N/A

If known, when do you expect to file the pull request?

Unless something else comes up, today!

@jashapiro
Copy link
Member

I think it would be good to make sure that this function is flexible enough to handle other arguments to the clustering algorithm, which may be buried in the main documentation. In particular, I am thinking about louvain clustering which has a resolution parameter see cluster_louvain, and for leiden clustering there are a few more important parameters: https://r.igraph.org/reference/cluster_leiden.html; importantly the objective_function has a strong effect on results.

The minimal version of this would be to directly pass in cluster.args to the param object creation, but it would probably make sense to do some specific checking around those.

Finally, I would not make tsv export part of this function. Exporting a data frame is standard enough to be handled by the user with their preferred tools.

@sjspielman
Copy link
Member Author

sjspielman commented Sep 3, 2024

The function should take a matrix with column names where rows are assumed to be genes and columns are assumed to be cells.

This might actually not be what we want. If the goal is to have something flexible enough for SCE & Seurat users to dive into, it might be better to allow users to pass in either a matrix, an SCE object, or a Seurat object, and the function would pull out the matrix for the latter two types. This means we'll need some helper functions.

For now, I'll continue assuming* a matrix is being passed in, but this is something to either re-consider as a separate issue or as part of review.

@sjspielman
Copy link
Member Author

Closed by #749

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants