Need for dealing with strides when writing performant code against multiple array libraries #782
Replies: 8 comments
-
Having a look at the usage frequency in scikit-learn:

```
$ rg strides --type py
sklearn/datasets/tests/test_samples_generator.py
135: signs = signs.view(dtype="|S{0}".format(signs.strides[0]))
sklearn/feature_extraction/image.py
284: """Extracts patches of any n-dimensional array in place using strides.
326: patch_strides = arr.strides
329: indexing_strides = arr[slices].strides
336: strides = tuple(list(indexing_strides) + list(patch_strides))
338: patches = as_strided(arr, shape=shape, strides=strides)

$ rg as_strided --type py
sklearn/feature_extraction/image.py
16:from numpy.lib.stride_tricks import as_strided
338: patches = as_strided(arr, shape=shape, strides=strides)
```

And in SciPy:

```
$ rg as_strided --type py
signal/_spectral_py.py
1958: result = np.lib.stride_tricks.as_strided(x, shape=shape,
linalg/_special_matrices.py
5:from numpy.lib.stride_tricks import as_strided
218: return as_strided(vals[len(c)-1:], shape=out_shp, strides=(-n, n)).copy()
259: return as_strided(c_ext[L-1:], shape=(L, L), strides=(-n, n)).copy()
316: return as_strided(vals, shape=out_shp, strides=(n, n)).copy()
signal/_signaltools.py
2124: zi = np.lib.stride_tricks.as_strided(zi, expected_shape,
signal/tests/test_signaltools.py
445: a = np.lib.stride_tricks.as_strided(a, shape=(n, 1000), strides=(8008, 8))
linalg/special_matrices.py
12: 'fiedler', 'fiedler_companion', 'convolution_matrix', 'as_strided'
interpolate/tests/test_fitpack.py
347: from numpy.lib.stride_tricks import as_strided
354: x = as_strided(np.zeros(()), shape=(size,))
```

There are some more usages of
-
Interestingly enough, the second of the ... It may be insightful to take the ...
-
Strides also come up a lot in broadcast implementations:

```
>>> a = np.arange(2*3*5).reshape(2, 3, 5)
>>> b = np.array(100)
>>> aprime, bprime = np.broadcast_arrays(a, b)
>>> bprime.shape
(2, 3, 5)
>>> bprime.strides
(0, 0, 0)
```

I'm not sure if this is relevant (because it's an implementation detail), but in case it is, this is another big way that non-trivial strides enter into array libraries without JIT compilation/operator fusion.
-
I don't think so; broadcasting is a separate topic from strides. Only the array shape is relevant to determine broadcasting. Strided array libraries do have certain behavior regarding the strides of broadcast arrays, as you pointed out. But all array libraries, strided or not, can support broadcasting, and thus it's already included in the standard.

FYI, this issue was discussed in the array API meeting this week 🙂 Someone will write up a summary later, so I wouldn't dare attempt a poor summary in my own words. But I did just come up with another reason against standardizing strides: strides are meaningful only for dense arrays/tensors. Suppose I want to write a multidimensional array library that is array-API compliant and has dual dense/sparse support under the hood, without users making any decision about whether an array should be stored in the dense format or any of the known sparse formats (COO, CSR, ..., it gets complicated for N-D tensors). Then clearly I simply cannot return strides, because they are an implementation detail of my library that most users should not need to know about or rely on.
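As a small illustration of the point that broadcasting needs only shapes, never strides, here is a sketch of computing the broadcast result shape from input shapes alone. This is a hypothetical helper following the standard's broadcasting rules, not any library's actual implementation:

```python
def broadcast_shapes(*shapes):
    """Compute the broadcast result shape from input shapes alone."""
    ndim = max(len(s) for s in shapes)
    # Left-pad every shape with 1s to the common rank.
    padded = [(1,) * (ndim - len(s)) + tuple(s) for s in shapes]
    result = []
    for dims in zip(*padded):
        # Per axis: all sizes must agree, except size-1 axes which stretch.
        non_one = {d for d in dims if d != 1}
        if len(non_one) > 1:
            raise ValueError(f"shapes are not broadcast-compatible: {shapes}")
        result.append(non_one.pop() if non_one else 1)
    return tuple(result)
```

For the example above, `broadcast_shapes((2, 3, 5), ())` gives `(2, 3, 5)`; whether the library realizes that result with zero strides, tiling, or a lazy graph is up to the implementation.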
-
I agree with what @leofang says. Everything up to ... and it's very rarely needed to explicitly access or manipulate strides.
-
I suspect all of those uses of strides can be written using ... In either case, the point is that ... Memory order can be useful to ensure or hint at (it even makes some sense for sparse, as CSC and CSR are basically the same thing as F vs. C order).
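As a concrete illustration of rewriting the windowing-style uses without manual stride manipulation, here is a small sketch using NumPy's `sliding_window_view` (available since NumPy 1.20). The hop size and window length are made-up example values:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(10)
# Overlapping length-4 segments with hop 2, with no stride arithmetic:
# take all sliding windows, then subsample along the window axis.
segments = sliding_window_view(x, 4)[::2]
```

This returns the same zero-copy view that a hand-written `as_strided` call would, but the intent is readable and the bounds are checked by the library.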
-
I like this idea: collecting the use-cases that strides have been used for, and defining APIs for those instead of for the strides themselves. (With 18 years of NumPy and even more for the predecessors, we must have seen all the use-cases by now.) This originally came up in the context of the paper for SciPy; indicating this as a possible direction would satisfy the needs of the paper. (Maybe after more of the group signs off on it, or maybe you're already in agreement on this and I'm just slow to catch on.)
-
Not a summary, but one point I made there, and should probably mention here too, is that for the SciPy ...
-
Early on when putting the array API standard together, we made a decision that strides must not be exposed in the Python API, because multiple array libraries don't support them, and memory layout is more an implementation detail than an API that should be exposed to Python users. We made that decision so early that I believe there isn't even a dedicated issue for it; the closest one is gh-571 about C vs. Fortran memory order. It's also related to gh-24 (strides are mentioned explicitly there).
The strides topic came up a couple of times recently though, and we should think about this more:
- as_strided: RFC: SciPy array types & libraries support scipy/scipy#18286 (comment)

Previous comments
I'll copy some of the key comments below to have them in a single place:
From @jpivarski: The thing I was most surprised about in the read-through was that "strides" are not considered part of the array object. I recognize that some libraries (e.g. JAX) are able to do great things by not allowing strides to be free. But then, you show in the SciPy benchmark (Fig. 2c, 2d) that not having this handle implies a severe performance penalty, which is likely to be a major take-away from this paper by most readers. In the discussion, you say that it illustrates that users will need to apply library-specific optimizations after all. (Implementing a backend or runtime switching system is non-goal number 2.) It seems to me like strides-awareness would be a good candidate for one of the optional extensions, if extensions are not strictly sets of functions. Perhaps that's a question or suggestion I should take up with the Consortium's normal channels, not a review of the paper, but it's a question the paper leads me to have.
From @leofang: Regarding strides: I have conflicting views and will continue to self-debate. On the one hand, when over half of the array libraries (JAX, TF, Dask, cuNumeric) do not offer strides for various reasons (e.g. they make no sense for distributed arrays), it's my opinion that even making them optional is not the right way to go, just like many other design considerations that we've made. Also, for those supporting the stride semantics, their strides are accessible via DLPack; it's just that currently it's hard to access them from within pure Python.

On the other hand, as a low-level C++ library developer (in short, we write backends for the array libraries), I do see the pain point: it's hard for us to even imagine how our library would be adopted and consumed by the array libraries that do not have strides. It could be that our library simply cannot serve as the battery behind them due to different design philosophies, I don't know. Though, to be fair, the standard aims at the array libraries, not their backends, so if there's a standardization effort targeting array backends, that might be a better place for standardizing strides.
From @tylerjereddy: scipy.signal.welch seemed to be a well-behaved target for prototyping based on that, so I tried doing that from scratch on a feature branch locally, progressively allowing more welch() tests to accept CuPy arrays when an env variable is set.

Even for this "well-behaved"/reasonable target, I ran into problems pretty early on. For example, both in the original feature branch and in my branch, there doesn't seem to be an elegant solution for handling numpy.lib.stride_tricks.as_strided. The docs for that function don't even recommend using it, and yet CuPy (and apparently torch, from the Quansight example) do provide it, outside of the array API standard proper.

So, I guess my first real-world experience makes me wonder what our policy on special-casing in these scenarios will be. Ideally, I'd like to just remove the usage of as_strided() and substitute some other approach that doesn't require conditionals on the exact array type/namespace. While this is a rather specific blocker, if I encounter something like this even for a "well-behaved" case, I can imagine quite a few headaches for the cases that are not well behaved.
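To make concrete what that kind of as_strided usage does, here is a minimal sketch (illustrative names and values, not SciPy's actual code) of viewing a 1-D signal as overlapping segments without copying any data:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.arange(16, dtype=np.float64)
nperseg, step = 8, 4  # segment length 8, hop 4 (50% overlap)
nseg = (x.shape[-1] - nperseg) // step + 1

# Each row starts `step` elements after the previous one; within a row,
# elements are contiguous. No data is copied: the view aliases x.
segments = as_strided(
    x,
    shape=(nseg, nperseg),
    strides=(step * x.strides[-1], x.strides[-1]),
)
```

The memory saving comes precisely from the aliasing, which is also why it is hard to express portably: a library without the strided-view memory model has no equivalent short of materializing the overlapping copies.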
From @rgommers: Let's have a look at this case - this is the code in question:

The as_strided usage is there to save memory. It's actually pretty difficult to understand what the code does exactly, though, so it's good for performance but not great for maintainability. <.... example with more details cut, see here> At that point, we can say we're happy with more readability at the cost of some performance (TBD if the for-loop matters; my guess is it will). Or we just keep the special-casing for NumPy using as_strided. At that point, we do have two code paths, but no performance loss and also easier-to-understand code.

To discuss
It's clear that strides cannot be universally supported, and equally clear that users who write algorithmic code may need to access strides in some cases to avoid significant performance loss.
I think, essentially, dealing with strides by hand for libraries that allow it is doing the work that, in a more full-featured array library, will/should be done by a compiler (e.g. the JIT/AOT compilers in JAX or PyTorch).
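One way to sketch the two-code-paths strategy discussed above: a zero-copy strided view for NumPy, and a portable (copying) fallback for any other array API namespace. The helper name and dispatch check are hypothetical, not SciPy's actual code:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def overlapping_segments(x, nperseg, step, xp):
    """Segment a 1-D array into overlapping windows of length nperseg."""
    nseg = (x.shape[-1] - nperseg) // step + 1
    if xp is np:
        # Fast path: zero-copy strided view (NumPy-specific).
        return as_strided(
            x,
            shape=(nseg, nperseg),
            strides=(step * x.strides[-1], x.strides[-1]),
        )
    # Portable path: standard indexing + stack; allocates a copy,
    # but uses only array API functionality.
    return xp.stack([x[i * step : i * step + nperseg] for i in range(nseg)])
```

Both paths return the same values; only the memory behavior differs, which is exactly the performance-vs-portability trade-off under discussion.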
We need to at least suggest and document some strategy for cases like the as_strided usage above. In the similar but more specific case of order=, which mattered quite a bit performance-wise for scikit-learn, they ended up writing a library-specific utility function to use the order keyword when present, and avoid it otherwise: https://github.com/scikit-learn/scikit-learn/blob/21312644df0a6b4c6f3c27a74ac9d26cf49c2304/sklearn/utils/_array_api.py#L360
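For reference, a hedged sketch of a utility in that spirit (hypothetical function name and dispatch; see the linked scikit-learn file for the real implementation): forward the NumPy-specific order= memory-layout hint only when the namespace supports it.

```python
import numpy as np

def asarray_with_order(array, order="C", xp=np):
    """Convert to an array, honoring `order=` only where it exists."""
    if xp is np:
        # NumPy understands the order= memory-layout keyword.
        return np.asarray(array, order=order)
    # Other array API namespaces offer no memory-order control;
    # fall back to a plain conversion.
    return xp.asarray(array)
```

Callers get the performance benefit of a favorable layout on NumPy without making order= part of the portable code path.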