Initial version of clustering that uses the conv filter activations #44

AvantiShri · 2019-08-15T20:20:04Z

Builds on the feature added in https://github.com/kundajelab/tfmodisco/releases/tag/v0.5.2.0. Data tracks supplied via the other_tracks argument can now be used to calculate the affinity matrix. To use this, pass the names of the relevant tracks to the tracknames_to_use_for_embedding argument. Those tracks are then used to derive the seqlet embeddings, instead of the gapped-kmer embedding used in the other workflow.

The embedding for each seqlet is created by summing the value of each channel of the concatenated data tracks across the length of the seqlet. The cosine similarity between embeddings gives the affinity matrix, which (as in the other workflow) is density-adapted and passed to multiple rounds of Louvain clustering. There is no separate "fine-grained" affinity matrix calculation: that step existed only because the affinity matrix derived from the gapped-kmer embedding was considered too coarse-grained. Downstream post-processing is unchanged; in particular, input-level importance scores are still used to align seqlets within a cluster and to split/merge clusters in the post-processing phase.
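A minimal sketch of the embedding and affinity step described above, assuming each seqlet's slice of every selected track is available as a NumPy array of shape (seqlet_length, n_channels). The function names `seqlet_embedding` and `cosine_affinity` and this input layout are hypothetical, not tfmodisco's actual API; the density adaptation and the Louvain rounds are not shown.

```python
import numpy as np


def seqlet_embedding(track_slices):
    """Embed one seqlet from its per-track slices (hypothetical input format).

    track_slices: list of arrays, each (seqlet_length, n_channels_for_track),
    e.g. conv-filter activations over the seqlet's span.
    """
    # Concatenate the tracks along the channel axis -> (seqlet_length, total_channels)
    concatenated = np.concatenate(track_slices, axis=1)
    # Sum each channel across the length of the seqlet -> (total_channels,)
    return concatenated.sum(axis=0)


def cosine_affinity(embeddings, eps=1e-12):
    """Pairwise cosine similarity between the rows of an (n_seqlets, dim) matrix."""
    embeddings = np.asarray(embeddings, dtype=float)
    # L2-normalize each row; eps guards against division by zero for all-zero rows
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.maximum(norms, eps)
    return unit @ unit.T


# Toy example: three seqlets of length 3, each with a 2-channel and a 1-channel track
seqlets = [
    [np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 0.0]]), np.zeros((3, 1))],
    [np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 0.0]]), np.zeros((3, 1))],
    [np.array([[3.0, 0.0], [0.0, 0.0], [0.0, 0.0]]), np.zeros((3, 1))],
]
embeddings = np.stack([seqlet_embedding(s) for s in seqlets])
# Seqlets 0 and 2 point in the same direction, so their affinity is 1
affinity = cosine_affinity(embeddings)
```

Because the embedding is a sum over positions, it ignores where within the seqlet a channel fires; positional alignment is left to the importance-score-based post-processing. scikit-learn's `sklearn.metrics.pairwise.cosine_similarity` computes the same matrix if that dependency is already available.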

A notebook demonstrating the feature is at https://github.com/kundajelab/tfmodisco/blob/fef7d28480ee88236dee0cd3d3660b07f566e0f7/test/nb_test/talgata/TF%20MoDISco%20TAL%20GATA%20with%20Activations.ipynb

@tchiruvolu can you try applying this to the APA dataset?


AvantiShri · 2019-08-15T20:27:23Z

Aside, in case you are confused about why the commit messages say "Anon": my GitHub username settings are set to "Anon" because I have to maintain anonymity for some other repos where the paper is under double-blind review. I don't want to accidentally commit to those repos under my actual name, so I find it safer to leave the username as "Anon".

Anon added 3 commits August 14, 2019 19:13

refactored to make it easier to use different clustering

26032c6

initial implementation of clustering that uses the conv filter activations

d06e3e4

seqlets_to_patterns updated too

fef7d28
