-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[R]: geoarrow_vctr class #70
Comments
I can make a start on this piece if you're taking PRs. Are we implementing a Should |
I would love a PR! The basic minimum is something that can be put into a I would probably use I think |
Are you still interested in having a stab at this? I have some time to take a look at this this week and am happy to give it a go (but also happy to defer to any work you've done already if you're interested 🙂 ) |
I had intended to action this (almost) 2 weeks ago. I am still interested in having a stab at this, but I've been focused on other projects. I haven't implemented anything yet. |
No worries! I haven't gotten to it yet either. Just post a note here if/when you do (I'll do the same). |
Should Or, should # untested example for wk-like
# need a suitable default for x
geoarrow <- function(x = as_geoarrow_array(wk::wkb()), crs = wk::wk_crs_auto(), geodesic = FALSE) {
validate_geoarrow_array(x) # doesn't exist
new_geoarrow(x)
}
new_geoarrow <- function(x = as_geoarrow_array(wk::wkb()), crs = wk::wk_crs_auto(), geodesic = FALSE) {
structure(x, class = c("geoarrow_vctr", "wk_vctr"), crs = crs, geodesic = geodesic)
}
as_geoarrow <- function(x) {
new_geoarrow(as_geoarrow_array(x), wk::wk_crs(x), wk::wk_is_geodesic(x))
} |
I think the stricter option is better, here, or you could omit the function since I don't know exactly how it would get used. It might be idiomatic to do something like |
I agree. A couple of usecases for a
A little off topic: a minimal implementation should be printable. I think we need to subset the geoarrow_vctr to achieve this without wasting a lot of memory? |
Those uses, do make sense...maybe it should be restricted to creating just a zero-size version for the "make me a ptype" use case?
It would require a slice, anyway. You can use |
Ok, #75 . It's almost all about how to resolve a chunk using a binary search and various ways to access that. Maybe it would help if I polished up that and removed the vctr bits? |
Maybe. I don't have a strong opinion on it. This is the same pattern used in s2 and geos.
I skimmed and saw only 1 trivial bug that can't be reached with the public api currently. I don't know that we need to remove the vctr bits. We can build from there |
How should we deal with non-contiguous slices? We'll need this for data.frame manipulation. Should we shallow copy, allowing indices to be any integer vector within range? I think this could potentially create many Or should we clone the slice? vec <- geoarrow::as_geoarrow_vctr(wk::xy(1:5, 1:5))
vec[c(1, 3)]
#> Error in `[.geoarrow_vctr`(vec, c(1, 3)): Can't subset geoarrow_vctr with non-slice (e.g., only i:j indexing is supported) Created on 2023-12-05 with reprex v2.0.2 |
The easiest way would be to convert to an
Maybe...it could also just be that it's a placeholder for now (although maybe the error message should reflect that: "must cast geoarrow_vctr column to native vector type before subsetting" or something). Notably, it should "just work" when somebody calls |
Or, as you noted, which would totally work, you could walk the indices searching for the minimum number of slices that can represent the new indices, which is potentially a very large number of slices. That will probably be slow and I would maybe rather error until it can be fast! |
I agree, walking the indices and making slices is very complicated and potentially more costly than a copy. An optional dependency on arrow is probably safe. You could create a geoarrow_vctr without having arrow installed, but you'd have to start from something in R first to do that, I think? Seems unlikely to me. Somewhat related and a bit of a left-field idea, is there value in a wk_handle_indices(handleable, geoarrow_writer(), indices) |>
as_geoarrow_vctr() |
Possibly! The previous incarnation of geoarrow did something like that: https://github.com/paleolimbot/geoarrow/blob/master/src/geoarrow-handle-wk.cpp#L290-L315 . I am not sure in retrospect that it was a good idea for the specific use case of the vctr...it did allow for randomly sorting a vctr, but it also kept possibly large quantities of data from being freed until the "take" operation was resolved somehow. I think R users are more likely to be able to reason about an eager "take". |
I had imagined My initial thinking was to implement this via a buffered filter, which holds a copy of each geometry in indices until we've read enough to pass onto the next handler. We only need to buffer while there's geometries we haven't yet seen which must be passed before what we have read (e.g. c(5, 1)). Seems to be a similar idea to this, just with wk flavour. Worst case (reverse sort) this is 2 copies of the vector, so quite inefficient. |
Add minimal geoarrow_vctr class for geoarrow vectors. Enables integration with other packages
wk
,sf
,geos
etc.See #69 (comment)
The text was updated successfully, but these errors were encountered: