Ufuncs and reductions #318
base: master
Conversation
This looks very interesting, but I won't have time to review in the coming week. I'm sorry for the delay.
Don't worry, I'm still finalizing the last part, but in the meantime some issues have also gone away magically :-)
Bump already. This is actually in quite good shape now, only a few ufuncs missing and everything works apart from some documented cases and potentially corner cases that aren't tested for yet. I'll leave it at that for now and wait for a review.
Maybe somebody can look into this at a point in the not-too-distant future? I'd also like to add that NumPy now exposes the `__array_ufunc__` interface. Anyway, that would be a natural follow-up of this PR.
Not this week. It is the NIPS deadline and Arnaud isn't here, but maybe/probably next week.
I've noted a couple of minor problems, but I can't help but feel the approach is somewhat off.
This mimics the very surface of the functionality that ufuncs offer. It does not support accumulate, reduce and others. It does not allow the user to define their own ufuncs. It doesn't support the `where` and `order` arguments. Most importantly, it doesn't unify the interface.
I understand that your time is limited and you might not have the time to do all those things right now, but the current structure would make those very hard to add. If at least all the ufuncs were instances of a class, then we could modify that class later to provide those things.
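To illustrate the reviewer's suggestion, here is a minimal sketch of what a common ufunc class could look like. All names (`GpuUfunc`, `nin`, `impl`) are hypothetical and not the PR's actual code; numpy stands in for the GPU dispatch so the sketch is runnable:

```python
import numpy


class GpuUfunc(object):
    """Hypothetical common class for wrapped ufuncs, mirroring a small
    part of numpy.ufunc's interface so that reduce/accumulate and other
    methods can be added later without touching callers."""

    def __init__(self, name, nin, impl):
        self.__name__ = name
        self.nin = nin      # number of inputs, mirroring numpy.ufunc.nin
        self._impl = impl   # elementwise implementation to dispatch to

    def __call__(self, *args, **kwargs):
        # Accept exactly `nin` inputs plus an optional positional `out`,
        # validating the argument count at runtime like numpy.ufunc does.
        if len(args) not in (self.nin, self.nin + 1):
            raise TypeError('{} expects {} input arguments'
                            .format(self.__name__, self.nin))
        return self._impl(*args, **kwargs)

    def reduce(self, a, axis=0, **kwargs):
        # Placeholder: can be filled in later on the class.
        raise NotImplementedError

    def accumulate(self, a, axis=0, **kwargs):
        raise NotImplementedError


# Stand-in usage with numpy as the backend instead of a GPU kernel:
add = GpuUfunc('add', nin=2, impl=numpy.add)
```

Because every ufunc is an instance of one class, later additions such as `reduce`, `accumulate` or a `where` argument only need to be implemented once.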
pygpu/ufuncs.py
Outdated
def make_unary_ufunc(name, doc):
    def wrapper(a, out=None):
        return unary_ufunc(a, name, out)
    wrapper.__qualname__ = wrapper.__name__ = name
`__qualname__` doesn't exist in Python 2.
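One portable way to address this is to set `__qualname__` only on Python 3. This is a runnable sketch; `unary_ufunc` here is a numpy-backed stand-in for the PR's GPU dispatcher:

```python
import sys

import numpy

PY3 = sys.version_info[0] >= 3


def unary_ufunc(a, name, out=None):
    # Stand-in for the PR's GPU dispatcher; numpy is used here only
    # so the sketch is self-contained and runnable.
    func = getattr(numpy, name)
    return func(a, out) if out is not None else func(a)


def make_unary_ufunc(name, doc):
    def wrapper(a, out=None):
        return unary_ufunc(a, name, out)

    wrapper.__name__ = name
    if PY3:
        # Functions have no writable __qualname__ on Python 2
        wrapper.__qualname__ = name
    wrapper.__doc__ = doc
    return wrapper


negative = make_unary_ufunc('negative', 'Negate elementwise.')
```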
pygpu/ufuncs.py
Outdated
# Add the ufuncs to the module dictionary
for ufunc_name in UNARY_UFUNCS:
    npy_ufunc = getattr(numpy, ufunc_name)
    descr = npy_ufunc.__doc__.splitlines()[2]
I'm not sure I like using numpy's documentation like this here. I think leaving these undocumented for now is probably better.
pygpu/ufuncs.py
Outdated
    See Also
    --------
    numpy.{}
    """.format(ufunc_name)
What if there is already a `See Also` section?
src/gpuarray_buffer_cuda.c
Outdated
@@ -1025,6 +1025,8 @@ static int call_compiler(cuda_context *ctx, strb *src, strb *ptx, strb *log) {
    , "-G", "-lineinfo"
  };
  nvrtcResult err;
  // DEBUG: print the kernel source
  // printf("%s\n", src);
Please remove this
('rad2deg', 'radians'),
('true_divide', 'divide'),
('maximum', 'fmax'),  # TODO: differ in NaN propagation in numpy, doable?
('minimum', 'fmin'),  # TODO: differ in NaN propagation in numpy, doable?
Adjusting NaN propagation is doable with a bit of C code. If you don't want to do it, I'd rather we don't provide fmax/fmin than provide them with the wrong behavior.
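For reference, the numpy semantics the comment refers to: `maximum`/`minimum` propagate NaNs, while `fmax`/`fmin` ignore them. In generated kernel code the propagating variant would need an explicit NaN check, sketched in C as `isnan(a) ? a : (isnan(b) ? b : (a > b ? a : b))`, whereas `fmax` maps straight to C's `fmax`. A quick numpy demonstration of the difference:

```python
import numpy as np

nan = float('nan')

# maximum/minimum propagate NaNs: if either operand is NaN, NaN wins.
assert np.isnan(np.maximum(nan, 1.0))

# fmax/fmin ignore NaNs: the non-NaN operand is returned.
assert np.fmax(nan, 1.0) == 1.0
```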
Thanks for the quick review. I agree that some flexibility wouldn't hurt, e.g. making the ufuncs class instances. Anyway, my approach here was to get the basic stuff working first and not let the amount of code get too much out of hand. Another way of playing this would be to just call into NumPy's function for the missing parts. I'll do the minimal fix and make ufuncs classes in any case.
I made ufuncs class instances now, with a rather lame solution for ufuncs with different signatures (they are different classes). If you know a good solution for creating a callable class with a per-instance signature, let me know. I also added some native implementations.
pygpu/ufuncs.py
Outdated
if need_context:
    ctx = get_default_context()
    cls = ndgpuarray  # TODO: sensible choice as default?
This is ok for now.
pygpu/ufuncs.py
Outdated
ctx = get_default_context()
cls = ndgpuarray  # TODO: sensible choice as default?

# TODO: can CPU memory be handed directly to kernels?
No it can't. Unless it's a scalar.
if a.flags.f_contiguous and not a.flags.c_contiguous:
    order = 'F'
else:
    order = 'C'
You can pass `a` directly to array(). It will handle all the array-like things.
pygpu/ufuncs.py
Outdated
need_context = False

if need_context:
    ctx = get_default_context()
Instead of doing this, it would be better to add an additional argument to the signature to specify the context. You can make it a keyword argument to avoid major conflicts with numpy. If no context is specified, the default context (if any) should automatically be used.
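The suggested resolution order (explicit keyword first, then the default context) could be sketched as follows. `get_default_context` follows the name of the PR's helper; the rest is a hypothetical, self-contained stand-in:

```python
_default_context = None  # stand-in for pygpu's default-context state


def set_default_context(ctx):
    global _default_context
    _default_context = ctx


def get_default_context():
    if _default_context is None:
        raise ValueError('no default context set')
    return _default_context


def unary_ufunc(a, name, out=None, context=None):
    # Resolve the context: an explicit keyword wins; otherwise fall
    # back to the default context, raising if none is set.
    ctx = context if context is not None else get_default_context()
    # ... transfer `a` to ctx, launch the kernel, etc. (omitted) ...
    return ctx  # placeholder so the sketch is runnable
```

Keeping `context` keyword-only in spirit leaves the numpy-facing positional signature untouched.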
    neutral = 'INFINITY'
elif numpy.issubsctype(a.dtype, numpy.complexfloating):
    raise ValueError('array dtype {!r} not comparable'
                     ''.format(a.dtype.name))
Just the else should be enough.
# This dictionary is derived from Numpy's C99_FUNCS list, see
# https://github.com/numpy/numpy/search?q=C99_FUNCS
Take care with this because the code you are generating is not C99, but CUDA or OpenCL. I'm pretty sure at least some of these will not be available, like `cbrt`.
So far I haven't found any that don't work, but that will probably depend on the CUDA version and may also be different for OpenCL.
In numpy the ufuncs are in C and they use the fact that Python can't know the signature of C functions to trick the interpreter into thinking that there is a fixed number of arguments. In reality the functions accept any number of arguments and perform a check on them to limit the number according to the instance. In Python it would be possible to "patch in" a call to have differing signatures, but I don't think this is worth the effort. Also, if we don't have certain methods because they aren't available on the GPU (like at(), accumulate(), ...) then I'd rather they stay unimplemented for now than fall back to the CPU. This way it can at least attract attention to the fact that it needs more work if somebody needs this.
jenkins test this please
Circular dependency, needs fix
pygpu/ufuncs.py
Outdated
self.accumulate.__name__ = self.accumulate.__qualname__ = 'accumulate'
self.accumulate.__name__ = 'accumulate'
if PY3:
    self.accumulate.__qualname__ = 'accumulate'
Why is this done differently than the others?
Not intentionally, copy-paste error.
jenkins test this please
It seems your code is not ready for python2. Make sure to test it on a python2 install somewhere. six can help with portability issues.
There were some other issues. There is still a failure with
Py2 failed due to some string issue, but there was an actual issue with
Numpy does some strange things with bools in ufuncs; we just rely on C behavior. The tests that fail because of this are filtered out. Other tests for corner cases fail due to non-obvious handling of zero division and upcasting in Numpy; they're filtered out, too.
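As an illustration of the divergence: numpy treats bool arithmetic specially (`add` acts as logical OR and `subtract` is disallowed on modern numpy), whereas C-style kernel code would compute on 0/1 integers:

```python
import numpy as np

t = np.array(True)

# numpy: add on bools is logical OR, not integer addition
# (C-style 1 + 1 would give 2)
assert np.add(t, t) == True

# numpy: subtract on bools is rejected outright
rejected = False
try:
    np.subtract(t, t)
except TypeError:
    rejected = True
assert rejected
```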
So here's version 1 of the ufuncs and reductions. Everything works as far as I can see, that means all ufuncs and reductions with all dtypes except float16.
In general, I have added a bunch of TODO notes in the code where I didn't quite know if there's a better way to implement something. Here are some more high-level TODOs:
Remaining open questions / TODOs:
- Add ufuncs as `ndgpuarray` methods? Probably not, for this we have the `__array_ufunc__` interface.
- Add them to the `pygpu` namespace?
- Pre-compile kernels for ufuncs and reductions and a selection of dtypes and ndims? That would make them much faster when called for the first time. This would be a new issue, too much for this PR. Related to Disk cache #335.
- Their value is probably marginal so I didn't bother for now.
- This was easier than expected.

Some notes on implementation details, problems and solutions:
Functions follow Numpy signature

All functions have the same signature as their Numpy counterparts, which is `ufunc(a[, out])` for unary ufuncs, `ufunc(a, b[, out])` for binary ufuncs, `reduction(a, axis=None, dtype=None, out=None, keepdims=False)` for `sum`, `prod` and `reduction(a, axis=None, out=None, keepdims=False)` for `amin`, `amax`.
Array-like input

Problem 1: All array arguments are allowed to be array-like, which has the consequence that in the extreme case when all arguments are not `GpuArray` instances, there is no object from which to get a context.
Current solution: A default context must be defined in that case.

Problem 2: Which class should be chosen for the output?
Current solution: `ndgpuarray`.
Ufunc signatures are matched with those of Numpy ufuncs

The `ufunc_dtypes` helper looks at `<ufunc>.types` and tries to find an input-output pair of dtypes that are adequate for the given input dtypes. Arrays are transformed to those types if necessary. This is a bit tricky for `ldexp`, which needs a float and an integer array to work; that case is handled a bit hackily when the second argument is actually a scalar and not an array.

Current failures
`logaddexp` called with arguments `-2` and a `'uint16'` array results in a `float64` for our code and in `float32` in Numpy. This is because I use `numpy.result_type(a, b)` to determine the result dtype of both arguments, which yields `numpy.int64` in our case, which in turn triggers upcasting to `numpy.float64`. Not sure how to handle this.
Random notes
Edit: Question on license and remark on commit history added.
Edit2: Removed long-ish output comparison between `nose` and `pytest`, and removed the license question again.
Edit3: Section on problems with some ufuncs removed, that issue is solved.
Edit4: Test failures documented.
Edit5: TODO about test scope added.
Edit6: TODO about info dict added, otherwise removed some stuff.
Edit7: Updated `ndgpuarray` TODO.