Ancestor lengths in genome and time bins #68

savitakartik · 2023-08-25T10:32:05Z

Addresses #20

Added child_left, child_right columns to nodes_df and tests for these.
Changed computation of ancestor-spans-heatmap data to iterate over nodes instead of bins.

savitakartik · 2023-08-25T10:46:40Z

Some issues that remain with this PR:

Will need to remove empty trees or set better xlim values, as there are quite a few empty bins in the heatmap currently displayed.
x and y bin sizes have been set arbitrarily at 1Mb and 1000 time units, but this may not be ideal for other tree sequences. Will need to find a better way to set bin sizes.
We don't need to record all three of these columns in the nodes_df (ancestor_span, child_left and child_right), since we're assuming contiguous node spans. Left them in for now as other plots use the ancestor_span column.

savitakartik · 2023-08-25T10:49:43Z

model.py

+
+        num_x_wins = int(np.ceil(nodes_right.max() - nodes_left.min()) / win_x_size)
+        num_y_wins = int(np.ceil(nodes_time.max() / win_y_size))
+        heatmap_sums = np.zeros((num_x_wins, num_y_wins))


I'm creating a numpy 2D array here as it's easier to keep track of bin indices, and then flattening the array later. Is this acceptable?
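
A minimal sketch of the accumulate-then-flatten approach described above, assuming the heatmap is indexed as (x bin, y bin) as in the quoted diff. The function names and dictionary keys are hypothetical and chosen only for illustration. Note that in the quoted line the division by win_x_size sits outside np.ceil, so int() truncates the result; the helper below rounds up after dividing.

```python
import numpy as np


def num_windows(span, win_size):
    # Round up *after* dividing so a partial final window is still counted.
    return int(np.ceil(span / win_size))


def flatten_heatmap(heatmap_sums, x_edges, y_edges):
    """Turn a (num_x, num_y) array into long-form columns for plotting."""
    x_edges = np.asarray(x_edges)
    y_edges = np.asarray(y_edges)
    # np.indices gives, for every cell, its x index and its y index.
    x_idx, y_idx = np.indices(heatmap_sums.shape)
    return {
        "position": x_edges[x_idx.ravel()],  # left edge of each x bin
        "time": y_edges[y_idx.ravel()],      # lower edge of each y bin
        "value": heatmap_sums.ravel(),       # same C order as the indices
    }
```

Keeping the sums in a 2D array while accumulating and flattening only at the end keeps the bin bookkeeping simple, since the row and column of each cell are its bin indices.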

savitakartik · 2023-08-25T10:55:51Z

model.py

+            )  # map the node span to the x-axis bins it overlaps
+            x_end = int(np.floor(nodes_right[u] / win_x_size))
+            y = max(0, int(np.floor(nodes_time[u] / win_y_size)) - 1)
+            heatmap_sums[x_start:x_end, y] += min(


min() operation is only really required for the first and last bins as they may not completely overlap the node span. Every other bin can just be summed by the window size.
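
A small sketch of the point above, assuming half-open bins defined by an array of edges (the name span_overlap_per_bin is hypothetical). Clamping the span against each bin gives a partial contribution for the first and last bins and the full bin width for every bin in between.

```python
import numpy as np


def span_overlap_per_bin(left, right, x_edges):
    """Length of the interval [left, right) that falls inside each bin."""
    x_edges = np.asarray(x_edges, dtype=float)
    lo = np.maximum(left, x_edges[:-1])   # overlap cannot start before the bin
    hi = np.minimum(right, x_edges[1:])   # overlap cannot end after the bin
    # Bins that do not intersect the span give hi < lo; clip those to zero.
    return np.clip(hi - lo, 0, None)
```

For a span from 5 to 25 over edges 0, 10, 20, 30, the first and last bins each receive 5 and the middle bin receives the full width of 10.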

savitakartik · 2023-08-25T11:03:57Z

ancestor_span_heatmap.mp4

benjeffery · 2023-08-29T11:00:08Z

@savitakartik Would you like a review here now?

savitakartik · 2023-08-29T11:45:09Z

> @savitakartik Would you like a review here now?

Yes that would be great, thanks @benjeffery.

jeromekelleher

Looks good - I wonder if we can use existing numpy tools a bit more though?

jeromekelleher · 2023-08-29T15:12:44Z

model.py

+
+        num_x_wins = int(np.ceil(nodes_right.max() - nodes_left.min()) / win_x_size)
+        num_y_wins = int(np.ceil(nodes_time.max() / win_y_size))
+        heatmap_sums = np.zeros((num_x_wins, num_y_wins))


Have you looked at using np.digitize to do the binning, rather than flooring things manually?
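
As a rough illustration of the suggestion above, np.digitize can replace the manual floor-and-divide step. This is a sketch under assumptions: the helper name bin_index and the example edges are hypothetical, and clipping values that land on the final edge into the last bin is one possible convention, not necessarily the one the PR adopted.

```python
import numpy as np


def bin_index(values, edges):
    """Zero-based bin index for each value, given sorted bin edges."""
    # np.digitize returns i such that edges[i-1] <= value < edges[i],
    # so subtracting 1 gives the index of the bin's left edge.
    idx = np.digitize(values, edges) - 1
    # Values equal to the last edge would otherwise fall outside the array.
    return np.clip(idx, 0, len(edges) - 2)
```

With equally spaced edges this matches floor(value / bin_width), but it also works when the edges are not evenly spaced.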

jeromekelleher · 2023-08-29T15:15:38Z

model.py

@@ -551,3 +555,48 @@ def calc_mutations_per_tree(self):
        mutations_per_tree = np.zeros(self.ts.num_trees, dtype=np.int64)
        mutations_per_tree[unique_values] = counts
        return mutations_per_tree
+
+    def compute_ancestor_spans_heatmap_data(self, win_x_size=1_000_000, win_y_size=500):


Are these sizes in genome coordinates? If so, it's going to be tricky to choose defaults that work across species (compare humans to SARS-CoV-2, e.g.).

What about using the number of x-bins and number of y-bins instead? That way you can be sure of a given level of resolution and that you won't use too much memory.

You can use np.linspace to generate the bin coordinates.
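
A minimal sketch of that idea, assuming the caller passes the node coordinate arrays directly. The function name make_bin_edges and the default bin counts are hypothetical. Fixing the number of bins bounds the heatmap size regardless of genome length or time scale.

```python
import numpy as np


def make_bin_edges(nodes_left, nodes_right, nodes_time, num_x_bins=50, num_y_bins=50):
    """Bin edges from bin counts, so resolution is independent of species."""
    # n bins need n + 1 edges.
    x_edges = np.linspace(nodes_left.min(), nodes_right.max(), num_x_bins + 1)
    y_edges = np.linspace(0, nodes_time.max(), num_y_bins + 1)
    return x_edges, y_edges
```

The resulting heatmap always has num_x_bins * num_y_bins cells, so memory use is known up front.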

jeromekelleher · 2023-09-04T11:47:54Z

Can you rebase please @savitakartik to pull in the logging changes?

savitakartik · 2023-09-04T13:07:25Z

Now done, @jeromekelleher. I will try to address the failing tests now.

jeromekelleher · 2023-09-04T13:08:12Z

I just looked at the plot for the Unified genealogy trees and it's not terribly informative. Let's have a chat about it tomorrow.

savitakartik · 2023-09-24T09:49:01Z

Recording.2023-09-24.104825.mp4

jeromekelleher · 2023-09-25T09:21:31Z

I think this is very useful. @benjeffery, can you help get it merged, please?

benjeffery · 2023-09-26T11:24:45Z

@savitakartik Looks good - but I think you need to increment the cache version for nodes.

jeromekelleher

LGTM - we can merge now and follow up with a PR to address potential performance issues later?

jeromekelleher · 2023-10-04T16:37:11Z

model.py

@@ -449,6 +449,10 @@ def nodes_df(self):
                "time": ts.nodes_time,
                "num_mutations": self.nodes_num_mutations,
                "ancestors_span": child_right - child_left,
+                "child_left": child_left,  # FIXME add test for this
+                "child_right": child_right,  # FIXME add test for this
+                "child_left": child_left,  # FIXME add test for this


Duplication here?

Also, looks like it's tested now?

jeromekelleher · 2023-10-04T16:38:50Z

model.py

+            x_ends = np.digitize(nodes_right, x_bins, right=True)
+            y_starts = np.digitize(nodes_time, y_bins, right=True)
+
+            for u in range(len(nodes_left)):


This is probably slow - should we do it with numba?
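
One way the per-node loop could be compiled with numba is sketched below. This is an illustration under assumptions, not the code from the PR: the function name accumulate_spans and its arguments are hypothetical, y_idx is assumed to hold a precomputed time-bin index per node, and the fallback decorator only exists so the sketch still runs (uncompiled) when numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback: run as plain Python if numba is unavailable.
    def njit(func):
        return func


@njit
def accumulate_spans(lefts, rights, y_idx, x_edges, num_y_bins):
    """Sum, per (x bin, y bin) cell, the span length each node contributes."""
    num_x_bins = len(x_edges) - 1
    heatmap = np.zeros((num_x_bins, num_y_bins))
    for u in range(len(lefts)):
        for x in range(num_x_bins):
            # Overlap of the node span with this genome bin.
            lo = max(lefts[u], x_edges[x])
            hi = min(rights[u], x_edges[x + 1])
            if hi > lo:
                heatmap[x, y_idx[u]] += hi - lo
    return heatmap
```

The inner loop here visits every x bin for simplicity; restricting it to the bins a node actually overlaps (for example using indices from np.digitize) would reduce the work further.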

savitakartik marked this pull request as ready for review August 25, 2023 11:04

savitakartik force-pushed the ancestor_span_heatmap_v2 branch from 138b09e to 7106fcc Compare August 25, 2023 13:36

jeromekelleher reviewed Aug 29, 2023

savitakartik force-pushed the ancestor_span_heatmap_v2 branch from c778950 to 1b4b7ea Compare September 4, 2023 12:58

savitakartik force-pushed the ancestor_span_heatmap_v2 branch from 1b4b7ea to 5541b73 Compare September 23, 2023 18:45

savitakartik requested a review from benjeffery September 23, 2023 18:54

savitakartik added 4 commits October 2, 2023 11:04

Added child_left, child_right columns to nodes_df and tests for these.

fb483e2

int slider removed, edge case fixed, bin count logged, actual node co…

e86158c

…unt on hover and plot titles added

naively handling case for heatmap where time units are uncalibrated.

47536db

incremented nodes cache version

cb74ebc

savitakartik force-pushed the ancestor_span_heatmap_v2 branch from 5541b73 to 47536db Compare October 2, 2023 14:55

jeromekelleher reviewed Oct 4, 2023

savitakartik mentioned this pull request Oct 16, 2024

Plot average ancestor length in time and genome bins #20

Open
