Histogram histogram division #302

henryiii · 2020-01-15T14:58:32Z

Dividing histograms by histograms behaves oddly. It should either not be supported, or work, but not misbehave.

h / h
# TypeError: Only axes supported in histogram constructor

This is triggering the wrong constructor.

h /= h
# Does nothing at all???

And, finally, here is the current workaround, mentioned on Gitter:

# If an array output is needed
arr = h1.view() / h2.view()

# If a histogram output is required
h3 = h1.copy()
h3[...] = h1.view(flow=True) / h2.view(flow=True)

nsmith- · 2020-01-20T00:08:43Z

I think it should definitely require at least the workaround, because a ratio histogram is not a histogram: two ratios cannot be added. Unless we want to implement a container that stores numerator and denominator (my preferred technique is just to add a category axis to store each) and operates like a ratio for certain methods like getting bin content. That's probably something best left to hist.
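
To illustrate the numerator/denominator idea above, here is a minimal pure-Python sketch. `RatioBins` is a hypothetical name, not part of boost-histogram or hist; it keeps the raw counts per bin and only computes the ratio on access, so two instances can still be added bin by bin.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RatioBins:
    """Hypothetical container: numerator and denominator counts per bin."""

    num: List[float]
    den: List[float]

    def __add__(self, other: "RatioBins") -> "RatioBins":
        # Adding is well defined because the raw counts are still available.
        if len(self.num) != len(other.num) or len(self.den) != len(other.den):
            raise ValueError("bin counts differ")
        return RatioBins(
            [a + b for a, b in zip(self.num, other.num)],
            [a + b for a, b in zip(self.den, other.den)],
        )

    def values(self) -> List[float]:
        # Ratio per bin; empty denominator bins give NaN instead of raising.
        return [n / d if d != 0 else float("nan") for n, d in zip(self.num, self.den)]


a = RatioBins(num=[1.0, 2.0], den=[2.0, 4.0])   # ratios 0.5 and 0.5
b = RatioBins(num=[9.0, 0.0], den=[10.0, 4.0])  # ratios 0.9 and 0.0

combined = (a + b).values()  # ratio of the summed counts: 10/12 and 2/8
naive = [x + y for x, y in zip(a.values(), b.values())]  # sum of ratios
```

The `naive` list adds the two ratios directly and exceeds 1 in the first bin, which is the sense in which "two ratios cannot be added"; `combined` stays a valid ratio because the counts were summed first.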

henryiii · 2020-01-20T16:11:36Z

I expect you are correct; the same thing comes up for density, which still should have axis information, but really is not a histogram. However, the current error message and silent failure need to be improved.

HDembinski · 2020-01-23T08:27:59Z

tl;dr: The histogram object is not necessarily semantically a histogram. You can assign arbitrary values to a histogram object, so the library cannot offer a guarantee that the values in the histogram actually represent counts.

Holding a density inside a histogram object is acceptable, although not the typical use case. My original design restricted the interface so that you cannot set cell values at all, so that a histogram can only be filled with the .fill method. This would guarantee that the values in histogram cells are really counts. But we gave up on that for practical reasons. You can assign arbitrary values to histogram cells, hence the semantic meaning of these values is arbitrary. In Boost.Histogram (C++), you can divide two histograms and that should work in boost-histogram as well. Semantically, the result of this operation is not a histogram. Neither is multiplying histograms.

henryiii · 2020-01-24T04:25:32Z

Okay, sounds good. I think the best way to handle this is to enable operations on storages, then to use those for h / h2 rather than calling the boost-histogram divide, as that will avoid increasing the compile-time and will allow different storage types to be divided.

henryiii · 2020-01-24T04:26:08Z

PS: I'm on vacation and will be back next week; development from my side has been minimal this week.

HDembinski · 2020-01-31T20:09:11Z

> I think the best way to handle this is to enable operations on storages, then to use those for h / h2 rather than calling the boost-histogram divide

I am against this. boost-histogram should use the boost.histogram facilities as much as possible, unless there is functionality missing in boost.histogram. The extra indirection for calling division on histograms instead of storages is minimal. Just make a ticket in boost.histogram if dividing histograms with different storages does not work.

henryiii · 2020-01-31T20:11:28Z

The problem is that you have to compile every possible combination for each of these for each storage, giving N^2 methods for N storages for each operation.

HDembinski · 2020-01-31T20:13:56Z

The N^2 problem is not solved by doing that directly on the storages.

HDembinski · 2020-01-31T20:16:12Z

You can try it out both ways, but I think you don't save much compile time; however, you are reducing code reuse between boost-histogram and boost.histogram.

HDembinski · 2020-01-31T20:18:57Z

I looked into the code, boost.histogram already supports dividing histograms with arbitrary storages.

henryiii · 2020-01-31T20:19:08Z

It somewhat is, since you don't compile many of the operations. You still have to make sure they work, but numpy provides most of them for free (int / double, int / int, etc). The problem is really just 4^2 then, since you have to provide double + the three accumulator storages. At that point, hopefully you could reuse Boost.Histogram's operations. I assume Boost.Histogram knows how to multiply/divide accumulators?
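
As a quick check of the "4^2" count above, the pairs can be enumerated directly. The four storage names listed here are an assumption about which storages are meant (double plus three accumulator storages); the thread itself does not name them.

```python
from itertools import product

# Assumed storage kinds: double plus three accumulator storages.
storages = ["Double", "Weight", "Mean", "WeightedMean"]

# Every (left, right) operand combination for one binary operation.
pairs = list(product(storages, repeat=2))

print(len(pairs))  # 16
```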

HDembinski · 2020-01-31T20:20:28Z

It does if the accumulators support it.

HDembinski · 2020-01-31T20:23:04Z

Boost.Histogram also has rules to select which storage the resulting histogram should have. If you divide int by double, it uses double for the storage type of the result, for instance. If you don't use the boost.histogram code, you have to reimplement this.
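
A toy sketch of the kind of rule described above, where dividing an integer storage by a double storage yields a double result. The storage names and the ranking table are assumptions made for this example; they do not reproduce Boost.Histogram's actual selection logic.

```python
# Hypothetical ranking: a higher rank means a "wider" storage.
STORAGE_RANK = {"int64": 0, "double": 1}


def result_storage(lhs: str, rhs: str) -> str:
    """Pick the wider of two storage kinds for the result of a binary operation."""
    return max(lhs, rhs, key=STORAGE_RANK.__getitem__)


print(result_storage("int64", "double"))  # double
```

Any reimplementation outside the C++ library would need a table like this for every storage pair it supports, which is the duplication being warned about.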

henryiii · 2020-01-31T20:23:47Z

I'm stuck in writing now, but when I'm done I want to work on #276. Once that's in, we can see whether this is solvable through that or whether we need to implement the N^2 methods. We can also try writing the mp11 code for one of the N^2. And as long as it works, users don't even need to know if the backend changes.

> I looked into the code, boost.histogram already supports dividing histograms with arbitrary storages.

Yes, I know. :)

HDembinski · 2020-01-31T20:30:00Z

OK, fine. You can probably solve this with a numpy-array view for some storages; then you rely on the pre-compiled numpy code to handle all the code paths. If this makes it easier to get the feature working, then I am not against it. Like you said, the implementation can be changed at any time if needed.

That being said, I am proud of the operator implementation in C++, which should handle everything you can conceivably throw at it. This was not easy.

henryiii · 2020-01-31T20:33:56Z

I am by no means convinced this is the right way to go in the end; it's just the path I think will be easiest to start with. I want #276 working anyway, so I'm hoping to piggyback on that work, much like we used view setting/accessing to work around features that were missing in Boost.Histogram at the time. And I'm hoping we can keep compile times under control.

I am also not at all discrediting the operator implementation in Boost.Histogram; as I've said, I've looked at it before and have been quite impressed with it. :)

HDembinski · 2020-05-29T13:16:29Z

Coming back to this, I think this should be implemented at least superficially, so that h / h does not give an error. Something like this can be done at the pure Python level (demo code only; I did not check whether it works):

import numbers

# method on histogram
def __itruediv__(self, rhs):
    # View including flow bins; dividing it in place modifies the histogram.
    v = self.view(True)
    if isinstance(rhs, numbers.Number):
        v /= rhs
    elif isinstance(rhs, Histogram) and self.axes == rhs.axes:
        v /= rhs.view(True)
    else:
        raise TypeError
    return self

The divide operator can make a copy and use __itruediv__. This is essentially what the C++ code does and there is no additional magic in the C++ code, so we might as well implement division like this at zero additional compile-time cost. I give up my earlier position in this case that boost-histogram should always use the underlying C++ code. For such simple stuff it does not have to.

henryiii · 2020-05-29T13:18:38Z

Agreed, that's a good point. I'll move it to a 0.8.0 milestone, but with the caveat that it might only be the simple workaround for now, possibly to be improved later.

HDembinski · 2020-05-29T13:20:49Z

This pushes the responsibility for implementing division properly into the view, which is OK, I think, because we said elsewhere that we want the view of a storage to support the same operators as the histogram.

henryiii · 2020-05-29T13:57:42Z

In #276, I believe.

LovelyBuggies · 2020-07-15T03:44:48Z

@henryiii I think division misbehaves if we fill with a 10-element array and divide it by 1_000_000...

h.fill(np.random.normal(size=10))
h = h/1_000_000

Then the counts will be of type double. For example, h[0] will be a small float if it is not zero. This doesn't seem to make sense.

henryiii · 2020-07-15T04:39:37Z

It's still a float:

>>> h = bh.Histogram(bh.axis.Regular(10,-3,3))
>>> h.fill(.5)
>>> h2 = h/1_000_000
>>> h2[5]
1e-06
>>> h2[4]
0.0
>>> type(h2[4])
<class 'float'>
>>> type(h2[5])
<class 'float'>

These are all floats. Unless I misunderstood your comment.

Mixing types here is actually quite hard to do.

LovelyBuggies · 2020-07-15T04:43:35Z

~~Yep, but what's the meaning of float counts? Counts should only be integers, shouldn't they?~~ My brain is screwed up, I just realized we have the weight argument 😅

henryiii · 2020-07-15T04:52:48Z

Weighted fills need ... yes. :)

You can set the Int64 storage if you truly want ints.

henryiii changed the title ~~Histogram division~~ Histogram histogram division Jan 15, 2020

henryiii added this to the 1.0.0 milestone May 29, 2020

henryiii modified the milestones: 1.0.0, 0.8.0 May 29, 2020

henryiii mentioned this issue Jul 6, 2020

feat: Support hist/hist division if view supports it #393

Merged

henryiii closed this as completed in #393 Jul 6, 2020
