Create an Array interop crate #14
These requirements are subject to change per the discussion. I've chosen to create what I think are the most reasonable to get us started. We will call the conversion trait The basic requirements are:
Definition of Done:
|
|
Thanks for bringing up the topic. Let's make one thing clear first. Are we talking about interop for arrays (potentially multi-dimensional of many dimensions) or matrices (and possibly vectors)? Or stated differently, do we intend this to be an interop solution specifically for linear algebra or for storing data in multi-dimensional arrays? |
@Andlon: my understanding is that both 2D arrays and matrices typically store their data in the same way: a single long row-major contiguous vector. I think we should probably try to support BOTH row-major and column-major types as separate structs, possibly In that way it doesn't matter whether the library treats the data as a Matrix or an Array -- the data is the same, only the operations are different. Looking at it another way, the functions/methods of each library might be different, but they are operating on data that is guaranteed to be in the same structure (and is therefore interoperable with very little conversion cost). I think it is worthwhile to give a very, very basic overview of what the structs might look like (as I imagine them -- you are all more expert here than me):
Notice that the data itself is just a mutable pointer whose length can be determined by Also notice that this is the "simplest possible implementation." There are no type-defined dimensions. The idea here is that creating/reading type dimensions introduces only a small cost and is worth it in order to have a common interop interface. Again, this is a proof of concept to get the ball rolling. Please provide feedback as I am NOT the expert here. |
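To make the idea above concrete, here is a minimal sketch of what such a row-major container could look like. It is not the struct from the post: the name `RowMajorView`, its fields, and its methods are all hypothetical, and a borrowed mutable slice stands in for the raw mutable pointer so the example stays in safe Rust. The dimensions are stored at run time, matching the "no type-defined dimensions" description.

```rust
/// Hypothetical sketch of a borrowed row-major array (not the struct from the thread).
pub struct RowMajorView<'a, T> {
    /// Elements stored contiguously, with the last axis varying fastest.
    data: &'a mut [T],
    /// Length of each axis, e.g. `[rows, cols]` for a 2D array.
    dims: Vec<usize>,
}

impl<'a, T> RowMajorView<'a, T> {
    /// Returns `None` when the product of `dims` does not equal `data.len()`.
    pub fn new(data: &'a mut [T], dims: Vec<usize>) -> Option<Self> {
        let expected = dims
            .iter()
            .try_fold(1usize, |acc, &d| acc.checked_mul(d))?;
        if expected == data.len() {
            Some(Self { data, dims })
        } else {
            None
        }
    }

    /// Run-time dimensions of the array.
    pub fn dims(&self) -> &[usize] {
        &self.dims
    }

    /// Flat offset of a multi-dimensional index, or `None` if it is out of bounds.
    pub fn offset(&self, index: &[usize]) -> Option<usize> {
        if index.len() != self.dims.len() {
            return None;
        }
        let mut off = 0usize;
        for (&i, &d) in index.iter().zip(&self.dims) {
            if i >= d {
                return None;
            }
            // Row-major: each earlier axis is scaled by the lengths of all later axes.
            off = off * d + i;
        }
        Some(off)
    }

    /// Element at a multi-dimensional index, if it exists.
    pub fn get(&self, index: &[usize]) -> Option<&T> {
        self.offset(index).map(|o| &self.data[o])
    }
}
```

With a six-element buffer and `dims = [2, 3]`, index `[1, 2]` maps to flat offset 5, so any library that agrees on this layout can read the same buffer without copying.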
I'm thinking of this as a conversion interface, so in terms of function parameters rather than a struct. Of course it could be seen as equivalent. We will want to be able to convert data structures without allocating a vec for the dimensions. Fixed dimensionality types want to be able to implement the trait in a way that identifies a source of data of a particular dimensionality. The common case is one and two dimensional data.
This is indeed the default memory layout of Rulinalg has MatrixSlice which supports a custom row stride. As of this writing, to my knowledge, ndarray can always represent rulinalg's data layouts but not the other way around. rulinalg can perfectly represent the default layout of ndarray's 1D and 2D arrays. An illustration (picture) of how the strides work for the 2D case is in slides 7 and 8 of this old talk https://bluss.github.io/rust-ndarray/talk1/ I know there is no good accompanying description. This representation is used in linear algebra and other places to efficiently represent a cut-out from a larger matrix (in ndarray, this is generalized to more dimensions than just two). |
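As a rough illustration of the row-stride idea, the sketch below cuts a window out of a larger contiguous row-major matrix without copying. The type `StridedView2` and its methods are hypothetical and are not the API of rulinalg or ndarray; they only show how a stride larger than the column count lets a view skip over the parts of each parent row that lie outside the window.

```rust
/// Hypothetical 2D view with a custom row stride (illustrative only).
pub struct StridedView2<'a, T> {
    data: &'a [T],
    rows: usize,
    cols: usize,
    /// Distance in elements between the starts of consecutive rows.
    row_stride: usize,
}

impl<'a, T> StridedView2<'a, T> {
    /// Cut a `rows x cols` window starting at (`r0`, `c0`) out of a
    /// contiguous row-major matrix that has `parent_cols` columns.
    pub fn window(
        parent: &'a [T],
        parent_cols: usize,
        r0: usize,
        c0: usize,
        rows: usize,
        cols: usize,
    ) -> Option<Self> {
        if parent_cols == 0 || parent.len() % parent_cols != 0 {
            return None;
        }
        let parent_rows = parent.len() / parent_cols;
        if r0.checked_add(rows)? > parent_rows || c0.checked_add(cols)? > parent_cols {
            return None;
        }
        let start = r0 * parent_cols + c0;
        Some(Self {
            data: parent.get(start..)?,
            rows,
            cols,
            // The window keeps the parent's row length as its stride.
            row_stride: parent_cols,
        })
    }

    /// Element at row `r`, column `c` of the window.
    pub fn get(&self, r: usize, c: usize) -> Option<&T> {
        if r < self.rows && c < self.cols {
            self.data.get(r * self.row_stride + c)
        } else {
            None
        }
    }

    /// True when rows sit back to back, i.e. the view is plain row-major.
    pub fn is_contiguous(&self) -> bool {
        self.rows <= 1 || self.row_stride == self.cols
    }
}
```

A 2x2 window taken from a 3x4 parent has `row_stride == 4` but `cols == 2`, so it is not contiguous. That is the case a purely contiguous interop type cannot represent without a copy.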
@bluss I have made some edits to my post to include both Also, let me give a more full description by including a potential trait implementation
I believe if both array libraries simply implemented the conversion trait then users could always do something like the following to convert the type:
As far as I know (would love feedback!), the conversion can only fail if the types are different dimensions. Other errors (i.e. type errors, etc) would be caught by the compiler. Therefore the
|
I think it makes sense to have lossless conversion between different 2D data types. This is Rust so we should use types that properly convey the ownership (Either a custom type with ownership semantics in a pointer, or ndarray does not support type level properties that guarantee a particular memory layout: the conversions from an ndarray need to be fallible if they are not going to reallocate and copy data. The fallibilities in the proposal sound reversed to me. Conversion from a row major array cannot fail, but conversion into it can. Supporting just row major and col major in the conversion traits sounds good to me. Can we have a solution where we have gradually more type level information in the trait, depending on what the data structures involved support? Here's a rough grading, with excuses if I don't know all the libraries that well. (*) Dynamic dimensions and axes → fixed number of axes of dynamic length → some axes of fixed length → fixed number of axes of fixed lengths With example: Also, another scale: Dynamic memory layout → inner axis contiguous → entirely contiguous With example I think it is good enough to keep conversion on the level of fixed number of axes of dynamic length for example, or provide both that and the dynamic axes flexibility as an add-on. (*) Yet I don't want to involve typenum in this. Rust doesn't have a good solution to integer generic parameters, yet. |
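The fallibility argument can be sketched as follows: converting from a layout-agnostic view into a type that promises contiguous row-major storage has to check the strides and may fail, whereas the reverse direction never needs to. Everything here (`DynView`, `RowMajor2`, `LayoutError`) is a hypothetical stand-in and not part of ndarray or any proposed crate; only the standard `TryFrom` trait is real.

```rust
use std::convert::TryFrom;

/// Reasons a zero-copy conversion might be refused (hypothetical error type).
#[derive(Debug, PartialEq)]
pub enum LayoutError {
    /// The source has a different number of axes than the target expects.
    WrongDimensionality { expected: usize, found: usize },
    /// The strides do not describe contiguous row-major data,
    /// or the buffer is too short for the stated shape.
    NotRowMajorContiguous,
}

/// Layout-agnostic description of an array: data, shape, and strides in elements.
pub struct DynView<'a, T> {
    pub data: &'a [T],
    pub shape: &'a [usize],
    pub strides: &'a [usize],
}

/// A 2D view whose type promises contiguous row-major storage.
#[derive(Debug)]
pub struct RowMajor2<'a, T> {
    pub data: &'a [T],
    pub rows: usize,
    pub cols: usize,
}

impl<'a, T> TryFrom<DynView<'a, T>> for RowMajor2<'a, T> {
    type Error = LayoutError;

    fn try_from(v: DynView<'a, T>) -> Result<Self, Self::Error> {
        if v.shape.len() != 2 {
            return Err(LayoutError::WrongDimensionality {
                expected: 2,
                found: v.shape.len(),
            });
        }
        if v.strides.len() != 2 {
            return Err(LayoutError::NotRowMajorContiguous);
        }
        let (rows, cols) = (v.shape[0], v.shape[1]);
        // Strict check: one column step moves 1 element, one row step moves `cols`.
        let contiguous = v.strides[1] == 1 && v.strides[0] == cols;
        let len = rows
            .checked_mul(cols)
            .ok_or(LayoutError::NotRowMajorContiguous)?;
        if !contiguous || v.data.len() < len {
            return Err(LayoutError::NotRowMajorContiguous);
        }
        Ok(RowMajor2 { data: &v.data[..len], rows, cols })
    }
}
```

The stride check is deliberately strict (it rejects some single-row or single-column views that are technically contiguous) to keep the sketch short. A column-major source with strides `[1, 2]` is rejected instead of being silently copied.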
I requires a
Ah, I hadn't thought of that use case... you're right, we should probably add another error condition where the data is not laid out correctly. Then that error would be part of the I'm a bit confused why you think it is "reversed" -- are the error conditions I listed not valid when converting from a RowMajorArray?
This is certainly possible, but I want to make sure I understand the use case. Having only one type would be:
I think having the following be compile time defined make sense:
These should be relatively simple and will have 8 structs and traits. I feel more than this might be biting off more than we can chew too quickly. My gut is that having up to 4 axes is the best way to go, at least until rust has compile-time integers. |
Also: I feel pretty adamant about supporting Personally I would use something like arrayvec to do the dirty work here. |
Saving an usize here while making dimensions a
I was thinking of that if a data type implements the trait FromRowMajorArray, then it can always represent row major arrays. (As long as the implementation can restrict itself to for example 2D data on the type level.) You're right though, it is more general to allow an error there. I do not think that supporting more type information means that we need 8 structs.
|
What should
I'm definitely not an expert in this -- would love to know how to generalize this! |
Sorry for not keeping up with the conversation, I'm a little busy these days. I think perhaps this is evolving into something more complicated than it needs to be, and it's also why I posed my previous question of whether we're talking about interop for linear algebra matrices and vectors or multi-dimensional arrays. Let's consider the need for such a crate in the current ecosystem. There are multiple crates that have custom storage for dynamically sized matrix-like structures: perhaps most prominently With this in mind, I suggest we keep it simple. What we need is to interop vectors and matrices, or equivalently 1D and 2D arrays. In the 1D array case, it can of course just be represented by e.g. a
I like the idea of having separate traits for column major and row major storage. Particularly because afaik, at the moment That, however, brings up an additional point. The point of this interop library is the ability to seamlessly work with matrix-like types from different libraries. However, the user experience would be horrible if we would straight up refuse to interop between two libraries because they don't support the same storage order. I believe that if zero-copy conversions cannot be done, we should still facilitate functionality so that data can be very easily copied (if needed) into the right format. However, we should make this functionality separate (i.e. separate traits) so that the user must opt-in to conversion with possible copy overhead, thus making it explicit. Of course, this only works for owned storage, and is not at all applicable to "views". Moreover, if we restrict ourselves to 1D and 2D arrays, we can also simplify the API by avoiding return of I know I've presented some very strong opinions in the above, and it's also possible I've misunderstood some of the above discussion. If I have made any mistakes or if you disagree with any of the opinions, please let me know! |
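A minimal sketch of the opt-in copy described above, assuming owned 2D data in a flat buffer. The function name `row_major_to_col_major` is hypothetical. The point is only that the reordering allocates a new `Vec`, so the cost is visible at the call site instead of hiding behind a zero-copy conversion trait.

```rust
/// Copy a `rows x cols` row-major buffer into a newly allocated
/// column-major buffer. Hypothetical helper; returns `None` when the
/// buffer length does not match the stated shape.
pub fn row_major_to_col_major<T: Clone>(
    data: &[T],
    rows: usize,
    cols: usize,
) -> Option<Vec<T>> {
    if rows.checked_mul(cols)? != data.len() {
        return None;
    }
    let mut out = Vec::with_capacity(data.len());
    // Walk the output in column-major order and pull each element
    // from its row-major position `r * cols + c`.
    for c in 0..cols {
        for r in 0..rows {
            out.push(data[r * cols + c].clone());
        }
    }
    Some(out)
}
```

For the 2x3 matrix stored as `[1, 2, 3, 4, 5, 6]`, the column-major copy is `[1, 4, 2, 5, 3, 6]`. As noted above, this only makes sense for owned storage; a view cannot be reordered without copying.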
@Andlon thanks for the very thorough reply, and I apologize for my absence. I had a more difficult time at work the last month and my wife and I are getting very close to delivering our second child... so expect there to be a gap in my participation in the near future as well. The original idea for this library was to provide generic n-dimensional interop between libraries. The reasoning was simple: the dimensional information is MASSIVELY smaller than the data itself, so carrying it is an acceptable overhead. However, having the type system check at least your dimensions is valuable and so I think the general consensus is to pivot to at least providing 1D, 2D and 3D arrays (so one more dimension than you suggest, but still the same idea). I also like the suggestion of focusing on Views -- I see views as essential for interop, as continually losing ownership in order to enable interop would be extremely annoying. I also fully support being able to migrate from row-major -> column-major (and vice versa) within the library itself, but to have to do so explicitly. My personal preference would be to do this with a method (example: Unfortunately, we cannot always avoid With these design decisions I wouldn't be surprised if we were very close to attempting a library for interop support. Does anyone else have a major objection? |
I'm thinking this issue should probably be put on hold until |
This is based on the conversations referenced in #13
I want to try to have a conversation with the relevant crate authors/maintainers that create their own Array data types in Rust. I would like to see if we can get an agreed-upon basic data structure and API (Trait) for converting to/from the different types with zero* cost. This issue is intended to be the forum for those discussions. I'm hoping such an effort will be relatively easy and not limit the APIs of each individual crate.
Once we get such a crate, we should document it on this site. I think telling the story of why there are different crates is really important because a lot of people expect Rust numerical types to have a standard in the same way as Python does. If you can explain that the conversions are simple and basically free, then explain that they allow for useful features (like array multiplication with *), I think there will be a lot more support and the ecosystem can thrive with less friction.
This thread is for discussion on this topic and is open to anyone. Particularly welcome are the numerical library crate maintainers.
The proposed name of the crate from me is array-interop