
Ufuncs and reductions #318

Open: wants to merge 29 commits into master

Conversation

@kohr-h (Contributor) commented Dec 28, 2016

So here's version 1 of the ufuncs and reductions. Everything works as far as I can see; that is, all ufuncs and reductions work with all dtypes except float16.

In general, I have added a bunch of TODO notes in the code where I didn't quite know if there's a better way to implement something. Here are some more high-level TODOs:

Remaining open questions / TODOs:

  • Add ufuncs as ndgpuarray methods? Probably not, for this we have the __array_ufunc__ interface.
  • Expose ufuncs in the top-level pygpu namespace?
  • Pre-compile kernels for ufuncs and reductions for a selection of dtypes and ndims? That would make them much faster when called for the first time. This would be a new issue, too much for this PR. Related to Disk cache #335.
  • The tests are currently not very exhaustive, they just use random positive arrays and check results against Numpy. The functions where it makes sense should also test mixed-sign arrays and arrays containing Inf and NaN. Also a test with the same arrays as input should be added for the comparison methods.
  • Which style should the docstrings adhere to? I used numpydoc style.
  • How to handle ufuncs with multiple outputs? Their value is probably marginal, so I didn't bother at first. Update: this was easier than expected.
  • Make a comprehensive info dictionary per ufunc for things like default output dtype (e.g. for logical), reorderable, known failures etc.
  • Remove doc assignment from numpy
  • Remove debug print of kernel source

Some notes on implementation details, problems and solutions.

Functions follow Numpy signature

All functions have the same signature as their Numpy counterparts, which is

  • ufunc(a[, out]) for unary ufuncs,
  • ufunc(a, b[, out]) for binary ufuncs,
  • reduction(a, axis=None, dtype=None, out=None, keepdims=False) for sum, prod and
  • reduction(a, axis=None, out=None, keepdims=False) for amin, amax
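As a cross-check, the signatures above are exactly NumPy's; exercising them against NumPy itself (the names here are NumPy's, not the PR's):

```python
import numpy as np

out = np.empty(3)
np.sin(np.zeros(3), out)                      # unary: ufunc(a[, out])
np.add(np.ones(3), np.ones(3), out)           # binary: ufunc(a, b[, out])
s = np.sum(np.arange(6).reshape(2, 3), axis=0,
           dtype=np.float64, out=None, keepdims=False)
m = np.amin(np.arange(6).reshape(2, 3), axis=None, out=None, keepdims=False)
```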

Array-like input

Problem 1:
All array arguments are allowed to be array-like. As a consequence, in the extreme case where no argument is a GpuArray instance, there is no object from which to get a context.
Current solution: A default context must be defined in that case.

Problem 2: Which class should be chosen for the output?
Current solution: ndgpuarray

Ufunc signatures are matched with those of Numpy ufuncs

The ufunc_dtypes helper looks at <ufunc>.types and tries to find an input-output pair of dtypes that is adequate for the given input dtypes. Arrays are cast to those types if necessary. This is a bit tricky for ldexp, which needs a float and an integer array to work; that case is handled somewhat hackily when the second argument is a scalar rather than an array.
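The matching idea can be sketched against NumPy's public `<ufunc>.types` attribute (the function name `match_types` is hypothetical; the PR's ufunc_dtypes may differ in details):

```python
import numpy as np

def match_types(ufunc, *in_dtypes):
    """Scan ufunc.types for the first signature that all given input
    dtypes can be safely cast to; return (input dtypes, output dtype)."""
    for sig in ufunc.types:
        ins, out = sig.split('->')
        if len(ins) == len(in_dtypes) and all(
                np.can_cast(dt, np.dtype(c))
                for dt, c in zip(in_dtypes, ins)):
            return [np.dtype(c) for c in ins], np.dtype(out)
    raise TypeError('no matching signature')
```

For example, int32 input to `sin` skips the float16/float32 loops (unsafe casts) and lands on the float64 loop.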

Current failures

  • Some ufuncs have a different upcasting behavior from Numpy's when called with a negative scalar. For example, logaddexp called with the arguments -2 and a 'uint16' array yields float64 with our code but float32 with Numpy. This is because I use numpy.result_type(a, b) to determine the result dtype of both arguments, which yields numpy.int64 in our case and in turn triggers upcasting to numpy.float64.
    Not sure how to handle this.
  • Division by zero is not handled at all in this code, while Numpy does some checking and other magic. Maybe that's not very important.
  • Some other cases, everything documented in the test code.
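The first failure above can be reproduced in plain NumPy (a sketch of the mechanism, not the PR's code): converting the scalar -2 to an int64 array before dtype resolution promotes uint16 to int64, and int64 does not cast safely to float32, which forces logaddexp up to float64.

```python
import numpy as np

promoted = np.result_type(np.int64, np.uint16)   # promotion via an int64 array
fits_f32 = np.can_cast(promoted, np.float32)     # int64 does not fit float32
# NumPy's own ufunc machinery keeps -2 as a scalar and stays in float32:
res = np.logaddexp(-2, np.array([1], dtype=np.uint16))
```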

Random notes

  • How do you handle contributed code?
  • The commit history is not cleaned up but is rather a port from out-of-tree development here, so the "honest" history is preserved. (I did that mostly for better test error reporting.)

Edit: Question on license and remark on commit history added.
Edit2: Removed long-ish output comparison between nose and pytest, and removed the license question again.
Edit3: Section on problems with some ufuncs removed, that issue is solved.
Edit4: Test failures documented.
Edit5: TODO about test scope added.
Edit6: TODO about info dict added, otherwise removed some stuff.
Edit7: Updated ndgpuarray TODO

@abergeron (Member):

This looks very interesting, but I won't have time to review in the coming week. I'm sorry for the delay.

@kohr-h (Contributor, author) commented Apr 18, 2017

> This looks very interesting, but I won't have time to review in the coming week. I'm sorry for the delay.

Don't worry, I'm still finalizing the last part, but in the meantime some issues have also gone away magically :-)
I'll bump this for a review sometime soon.

@kohr-h (Contributor, author) commented Apr 19, 2017

Bump already.

This is actually in quite good shape now, only a few ufuncs missing and everything works apart from some documented cases and potentially corner cases that aren't tested for yet. I'll leave it at that for now and wait for a review.

@kohr-h (Contributor, author) commented May 16, 2017

Maybe somebody can look into this at a point in the not-too-distant future?

I'd also like to add that NumPy now exposes the __array_ufunc__ interface, which means that a class that implements __array_ufunc__ gets to decide what to do when a NumPy ufunc is called on it. That's very good since somebody calling np.sin on a GPU array will then actually call the super-fast GPU elementwise kernel instead of casting to a NumPy array and then calling the NumPy implementation. (In another way that's bad since users may forget who's the workhorse :-) ).

Anyway, that would be a natural follow-up of this PR.

@nouiz (Member) commented May 17, 2017

Not this week. It's the NIPS deadline and Arnaud isn't here, but maybe/probably next week.

@abergeron (Member) left a review:
I've noted a couple of minor problems, but I can't help but feel the approach is somewhat off.

This mimics the very surface of the functionality that ufuncs offer. It does not support accumulate, reduce and others. It does not allow the user to define their own ufuncs. It doesn't support the where and order arguments. Most importantly, it doesn't unify the interface.

I understand that your time is limited and you might not have time to do all those things right now, but the current structure would make them very hard to add. If at least all the ufuncs were instances of a class, then we could modify that class later to provide those things.

pygpu/ufuncs.py Outdated
def make_unary_ufunc(name, doc):
    def wrapper(a, out=None):
        return unary_ufunc(a, name, out)
    wrapper.__qualname__ = wrapper.__name__ = name
@abergeron (Member):

__qualname__ doesn't exist in Python 2

pygpu/ufuncs.py Outdated
# Add the ufuncs to the module dictionary
for ufunc_name in UNARY_UFUNCS:
    npy_ufunc = getattr(numpy, ufunc_name)
    descr = npy_ufunc.__doc__.splitlines()[2]
@abergeron (Member):

I'm not sure I like using numpy's documentation like this here. I think leaving these undocumented for now is probably better.

pygpu/ufuncs.py Outdated
    See Also
    --------
    numpy.{}
    """.format(ufunc_name)
@abergeron (Member):

What if there is already a See Also section?

@@ -1025,6 +1025,8 @@ static int call_compiler(cuda_context *ctx, strb *src, strb *ptx, strb *log) {
      , "-G", "-lineinfo"
  };
  nvrtcResult err;
  // DEBUG: print the kernel source
  // printf("%s\n", src);
@abergeron (Member):

Please remove this

('rad2deg', 'radians'),
('true_divide', 'divide'),
('maximum', 'fmax'), # TODO: differ in NaN propagation in numpy, doable?
('minimum', 'fmin'), # TODO: differ in NaN propagation in numpy, doable?
@abergeron (Member):

Adjusting NaN propagation is doable with a bit of C code. If you don't want to do it, I'd rather we don't provide fmax/fmin than provide them with the wrong behavior.
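The semantic difference at stake here can be seen directly in NumPy on the CPU: `maximum` propagates NaN, while `fmax` suppresses it in favor of the non-NaN operand, so mapping one onto the other silently changes results.

```python
import numpy as np

a = np.array([np.nan, 1.0])
b = np.array([0.0, np.nan])
prop = np.maximum(a, b)   # NaN wins in both slots
supp = np.fmax(a, b)      # the non-NaN operand wins
```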

@kohr-h (Contributor, author) commented May 18, 2017

> I've noted a couple of minor problems, but I can't help but feel the approach is somewhat off.
>
> This mimics the very surface of the functionality that ufuncs offer. It does not support accumulate, reduce and others. It does not allow the user to define their own ufuncs. It doesn't support the where and order arguments. Most importantly, it doesn't unify the interface.
>
> I understand that your time is limited and you might not have time to do all those things right now, but the current structure would make them very hard to add. If at least all the ufuncs were instances of a class, then we could modify that class later to provide those things.

Thanks for the quick review. I agree that some flexibility wouldn't hurt. Making the ufuncs class instances that implement __call__ shouldn't be a big deal.
On the other hand, I don't see how this keeps users from defining their own ufuncs. It's actually easier when they don't have to adhere to a specific class interface. In some sense, the elemwise helpers provide exactly that.

Anyway, my approach here was to get the basic stuff working first and not let the amount of code get too much out of hand.
ufunc.reduce certainly seems relatively easily doable, maybe also outer, but for the rest (accumulate, at, reduceat and the where argument) I don't really have a good idea. The last three require index arrays with broadcasting and all, which is not trivial as far as I can tell. accumulate may be doable with some Mako templating, but I assume it would be rather slow due to poor parallelism.

Another way of playing this would be to just call into NumPy's function (e.g. at) in case there is no native implementation, instead of leaving it unimplemented. An upside of that would be that people can write code that works, and which is slow now but becomes fast when someone sits down and implements the feature. In the other scenario, people would use the NumPy ufunc and never learn about the added native feature.

I'll do the minimal fix and make ufuncs classes in any case.
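The fallback idea from the paragraph above can be sketched like this (the helper name is hypothetical; a real version would copy device to host and back, here NumPy arrays stand in for GPU arrays):

```python
import numpy as np

def ufunc_at_fallback(ufunc_name, a, indices, b=None):
    """Hypothetical sketch: delegate to NumPy's ufunc.at when there is
    no native GPU implementation."""
    a_host = np.asarray(a)              # stand-in for a GPU -> CPU copy
    npy_ufunc = getattr(np, ufunc_name)
    if b is None:
        npy_ufunc.at(a_host, indices)   # unary, in-place at the indices
    else:
        npy_ufunc.at(a_host, indices, b)
    return a_host                       # stand-in for the copy back
```

This way code using `at` works today, just slowly, and speeds up transparently once a native kernel lands.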

@kohr-h (Contributor, author) commented Jun 6, 2017

I made the ufuncs class instances now, with a rather lame solution for ufuncs with different signatures (they are different classes; if you know a good solution for creating a callable class whose __call__ signature changes per instance, please let me know).

I also added some native ufunc.reduce methods. For all other methods (like accumulate) the code simply calls into the corresponding Numpy function. What's currently missing is a fallback for at; I'll add that.
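The "one class per signature" approach can be sketched as below (class names hypothetical, with NumPy standing in for the GPU kernels; the PR's actual classes differ):

```python
import numpy as np

class UnaryUfunc:
    """Callable wrapper whose __call__ has the unary signature."""
    def __init__(self, name):
        self.__name__ = self.name = name
        self._impl = getattr(np, name)   # stand-in for the GPU kernel
    def __call__(self, a, out=None):
        return self._impl(a, out=out)

class BinaryUfunc:
    """Callable wrapper whose __call__ has the binary signature."""
    def __init__(self, name):
        self.__name__ = self.name = name
        self._impl = getattr(np, name)
    def __call__(self, a, b, out=None):
        return self._impl(a, b, out=out)
    def reduce(self, a, axis=0):
        # hook where a native GPU reduction would be dispatched
        return self._impl.reduce(a, axis=axis)

sin = UnaryUfunc('sin')
add = BinaryUfunc('add')
```

Methods like reduce then hang off the class and can be upgraded later without touching call sites.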

pygpu/ufuncs.py Outdated

    if need_context:
        ctx = get_default_context()
        cls = ndgpuarray  # TODO: sensible choice as default?
@abergeron (Member):

This is ok for now.

pygpu/ufuncs.py Outdated
        ctx = get_default_context()
        cls = ndgpuarray  # TODO: sensible choice as default?

    # TODO: can CPU memory be handed directly to kernels?
@abergeron (Member):

No it can't. Unless it's a scalar.

    if a.flags.f_contiguous and not a.flags.c_contiguous:
        order = 'F'
    else:
        order = 'C'
@abergeron (Member):

You can pass a directly to array(). It will handle all the array-like things

pygpu/ufuncs.py Outdated
        need_context = False

    if need_context:
        ctx = get_default_context()
@abergeron (Member):

Instead of doing this, it would be better to add an additional argument to the signature to specify the context.

You can make it a keyword argument to avoid major conflicts with numpy.

If no context is specified, the default context (if any) should automatically be used.
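The suggested signature change could look like this sketch (pure Python, with a placeholder standing in for pygpu's real get_default_context and context objects; only the keyword-argument shape is the point):

```python
_DEFAULT_CONTEXT = 'default-ctx'   # placeholder for pygpu's default context

def get_default_context():
    """Stand-in for pygpu's default-context lookup."""
    if _DEFAULT_CONTEXT is None:
        raise RuntimeError('no default context set')
    return _DEFAULT_CONTEXT

def unary_ufunc(a, ufunc_name, out=None, context=None):
    """Hypothetical wrapper: an explicit context keyword beats the default."""
    if context is None:
        context = get_default_context()
    # ... kernel dispatch would happen here, in `context` ...
    return context
```

Because `context` is keyword-only in practice, the positional part of the signature stays NumPy-compatible.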

        neutral = 'INFINITY'
    elif numpy.issubsctype(a.dtype, numpy.complexfloating):
        raise ValueError('array dtype {!r} not comparable'
                         ''.format(a.dtype.name))
@abergeron (Member):

Just the else should be enough.



# This dictionary is derived from Numpy's C99_FUNCS list, see
# https://github.com/numpy/numpy/search?q=C99_FUNCS
@abergeron (Member):

Take care with this because the code you are generating is not C99, but CUDA or OpenCL. I'm pretty sure at least some of these will not be available, like cbrt.

@kohr-h (Contributor, author):

So far I haven't found any that don't work, but that will probably depend on the CUDA version and may also be different for OpenCL.

@abergeron (Member):

In numpy the ufuncs are in C and they use the fact that python can't know the signature of C functions to trick the interpreter into thinking that there is a fixed number of arguments.

In reality the functions accept any number of arguments and perform a check on them to limit the number according to the instance.

In Python it would be possible to "patch in" a call to have differing signatures, but I don't think this is worth the effort.

Also, if we don't have certain methods because they aren't available on the GPU (like at(), accumulate(), ...) then I'd rather they stay unimplemented for now rather than fallback to the CPU. This way it can at least attract attention to the fact that it needs more work if somebody needs this.

@abergeron (Member):

jenkins test this please

@kohr-h (Contributor, author) commented Jun 12, 2017

Circular dependency, needs fix

pygpu/ufuncs.py Outdated
        self.accumulate.__name__ = self.accumulate.__qualname__ = 'accumulate'
        self.accumulate.__name__ = 'accumulate'
        if PY3:
            self.accumulate.__qualname__ = 'accumulate'
@abergeron (Member):

Why is this done different than the others?

@kohr-h (Contributor, author):

Not intentionally, copy-paste error.

@abergeron (Member):

jenkins test this please


@abergeron (Member):

It seems your code is not ready for Python 2. Make sure to test it on a Python 2 install somewhere.

six can help with portability issues.

@kohr-h (Contributor, author) commented Jun 12, 2017

There were some other issues. There is still a failure with ldexp (it doesn't raise where it should); I will look into it.

@kohr-h (Contributor, author) commented Jun 12, 2017

Py2 failed due to a string issue, but there was also an actual bug in deg2rad (missing precision). I also added an optional context argument to everything and simplified the ufunc wrapper a bit. Still a bit of work remaining, but we're getting closer.
