Initial version of clustering that uses the conv filter activations #44

Open
wants to merge 3 commits into
base: master

AvantiShri
Collaborator

@AvantiShri AvantiShri commented Aug 15, 2019

Builds on the feature added in https://github.com/kundajelab/tfmodisco/releases/tag/v0.5.2.0. Data tracks supplied via the other_tracks argument can now be used to calculate the affinity matrix. To use the feature, supply the names of the relevant tracks to the tracknames_to_use_for_embedding argument. Those tracks are then used to derive the seqlet embeddings (as opposed to the gapped k-mer embedding, which is what the other workflow uses). The embedding for each seqlet is created by summing the value of each channel in the concatenated data tracks across the length of the seqlet. The cosine similarity between embeddings is used to create the affinity matrix, which (as with the other workflow) is density-adapted and supplied to multiple rounds of Louvain. There is no separate "fine-grained" affinity matrix calculation: that step existed only because the affinity matrix derived from the gapped k-mer embedding was considered too coarse-grained. Downstream post-processing remains the same. In particular, input-level importance scores are still used to align seqlets within a cluster and to split/merge clusters in the post-processing phase.
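For readers who want the embedding step spelled out, here is a minimal NumPy sketch of the idea described above. It is not the code in this PR: the function names `seqlet_embeddings` and `cosine_affinity`, the `(num_seqlets, seqlet_length, num_channels)` array layout, and the choice to concatenate tracks along the channel axis are all assumptions made for illustration. Density adaptation and Louvain are not shown.

```python
import numpy as np

def seqlet_embeddings(tracks):
    """Hypothetical helper: one embedding vector per seqlet.

    tracks: list of arrays, each assumed to have shape
            (num_seqlets, seqlet_length, num_channels_in_track).
    """
    # Assumption: tracks are concatenated along the channel axis.
    concatenated = np.concatenate([np.asarray(t, dtype=float) for t in tracks], axis=2)
    # Sum each channel across the length of the seqlet.
    return concatenated.sum(axis=1)

def cosine_affinity(embeddings, eps=1e-12):
    """Hypothetical helper: pairwise cosine similarity between embeddings."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Guard against division by zero for all-zero embeddings.
    unit = embeddings / np.maximum(norms, eps)
    return unit @ unit.T

# Toy data: 4 seqlets of length 10, two tracks with 3 and 2 channels.
rng = np.random.default_rng(0)
track_a = rng.random((4, 10, 3))
track_b = rng.random((4, 10, 2))

emb = seqlet_embeddings([track_a, track_b])  # shape (4, 5)
aff = cosine_affinity(emb)                   # shape (4, 4)
```

The resulting matrix is symmetric with ones on the diagonal for non-zero embeddings, and it plays the role of the raw affinity matrix before density adaptation.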

A notebook demonstrating the feature is at https://github.com/kundajelab/tfmodisco/blob/fef7d28480ee88236dee0cd3d3660b07f566e0f7/test/nb_test/talgata/TF%20MoDISco%20TAL%20GATA%20with%20Activations.ipynb

@tchiruvolu can you try applying this to the APA dataset?

@AvantiShri
Collaborator Author

Aside, in case you are confused about why the commit messages say "Anon": my GitHub username setting is "Anon" because I have to maintain anonymity for some other repos where the paper is subject to double-blind review. I don't want to accidentally commit to those repos under my actual name, so I find it safer to leave the username as "Anon".
