Question: in-C access to de-jsonification? #444

Closed
r2evans opened this issue Oct 9, 2024 · 7 comments

@r2evans

r2evans commented Oct 9, 2024

Is there any way to do de-jsonification on something in memory that is not an R string?

I'm thinking on this point: richfitz/redux#60

A partner company I work with is going to be using Redis as a large-scale cache of big-ish data. I'd rather not pay the string-pool price of pulling the JSON text into R-space and calling fromJSON. Instead, I'd like to be able to do the (JSON-style) deserialization in C-space. Is there a way to work on a raw stream? (I recognize it currently accepts a "file", but I'm hoping not to require intermediate temp files.)

@jeroen
Owner

jeroen commented Oct 9, 2024

fromJSON() works on connection objects, so input from them is streamed by definition. There is also stream_in() to parse ndjson data. But I don't think it matters much for performance. Are you running into any problem that suggests it does?
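For reference, here is a minimal sketch of stream_in() reading newline-delimited JSON (one record per line) from an in-memory raw vector. The payload is made up for illustration and stands in for a value fetched from Redis; it assumes the data really is ndjson rather than a single JSON document.

```r
library(jsonlite)

# Made-up ndjson payload (one JSON record per line), held in memory as a
# raw vector, standing in for a blob fetched from Redis. The trailing
# newline avoids an "incomplete final line" warning from readLines().
ndjson <- charToRaw(paste0(c(
  '{"id": 1, "name": "alpha"}',
  '{"id": 2, "name": "beta"}',
  '{"id": 3, "name": "gamma"}'
), "\n", collapse = ""))

# rawConnection() returns a connection that is already open for reading.
con <- rawConnection(ndjson)

# stream_in() reads the connection in pages of lines and binds the parsed
# records into one data frame; verbose = FALSE silences progress messages.
df <- stream_in(con, verbose = FALSE)

# stream_in() only closes connections it opened itself, so close ours here.
close(con)

str(df)
```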

@r2evans
Author

r2evans commented Oct 10, 2024

The problem I run into is pulling very large JSON-structured strings into R and only then converting them to an R-native object: memory consumption grows quickly. Thanks for the reminder about connections. I'll try rawConnection to see if that works (once I figure out how to get GET(..) to return a raw object when the response is JSON text).

@jeroen
Owner

jeroen commented Oct 10, 2024

To read from http you can use a base::url() or curl::curl() connection:

```r
df <- jsonlite::fromJSON(url('https://tidyverse.r-universe.dev/ggplot2/data/diamonds/json'))
```

@r2evans
Author

r2evans commented Oct 10, 2024

I'm not reading from HTTP; I'm reading an already-in-memory blob retrieved from Redis. One challenge is that objects in Redis are often encoded as JSON. jsonlite::fromJSON is very good at avoiding string penalties when reading from files and HTTP responses.

Bottom line, I think my question here is resolved: your reminder about connections pointed me to rawConnection(). For instance, this works:

```r
obj <- charToRaw(jsonlite::toJSON(mtcars))
head(obj)
# [1] 5b 7b 22 6d 70 67
head(jsonlite::fromJSON(rawConnection(obj)), 3)
#                mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
# Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
```
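The one-liner above leaves the rawConnection open until garbage collection, which typically triggers a "closing unused connection" warning later. A minimal sketch of a wrapper that closes it deterministically follows. `from_json_raw()` is a hypothetical helper, not part of jsonlite, and it assumes (following the usual R convention) that fromJSON() leaves an already-open connection open for the caller to close.

```r
library(jsonlite)

# Hypothetical helper (not part of jsonlite): parse JSON held in a raw
# vector, e.g. a value fetched from Redis, via a rawConnection.
# rawConnection() opens the connection on creation; we assume fromJSON()
# leaves a connection it did not open itself open, so we close it here.
from_json_raw <- function(x, ...) {
  stopifnot(is.raw(x))
  con <- rawConnection(x)
  on.exit(close(con), add = TRUE)
  fromJSON(con, ...)
}

obj <- charToRaw(toJSON(mtcars))
res <- from_json_raw(obj)
head(res, 3)
```

Extra arguments such as `simplifyVector` are passed through `...` to fromJSON() unchanged.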

Thanks for the discussion!

r2evans closed this as completed Oct 10, 2024
@r2evans
Author

r2evans commented Oct 10, 2024

FYI, I found that forcing raw objects from redis and then rawConnection to fromJSON does not perform any differently than passing strings. I haven't dived into this enough to see where the memory consumption is happening, but if you're curious, see richfitz/redux#60 (comment).
Thank you again, Jeroen! (I met you once, btw, at an R conference right after you defended your dissertation.)

@jeroen
Owner

jeroen commented Oct 10, 2024

> FYI, I found that forcing raw objects from redis and then rawConnection to fromJSON does not perform any differently than passing strings.

That is what I would have guessed. 99% of the work is done not by the JSON parsing but afterwards, in R, allocating all these nested lists and vectors to store the data as an R object.
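One rough way to see this split on your own data is to time parsing alone against parsing plus building the R object. A minimal sketch, assuming that jsonlite's validate() (which, as far as I understand, runs the JSON parser without constructing R values) is a fair stand-in for "parsing only". The synthetic data frame is made up for illustration, and no particular timings are implied.

```r
library(jsonlite)

# Synthetic payload (made up for illustration): a data frame serialized
# to JSON text, standing in for a large blob fetched from Redis.
n <- 1e5
df <- data.frame(x = seq_len(n) / 7, y = rep(letters, length.out = n))
txt <- toJSON(df)

# Assumption: validate() runs the parser but builds no R object, so its
# time roughly approximates the cost of parsing alone.
t_parse <- system.time(ok <- validate(txt))[["elapsed"]]

# fromJSON() parses and then allocates/simplifies into a data frame.
t_full <- system.time(res <- fromJSON(txt))[["elapsed"]]

cat(sprintf("parse only: %.3fs, parse + build R object: %.3fs\n",
            t_parse, t_full))
```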

> Thank you again, Jeroen! (I met you once, btw, at an R conference right after you defended your dissertation.)

That's over 10 years ago now, I feel very old 👴

@r2evans
Author

r2evans commented Oct 10, 2024

I think you were younger than me back then; I doubt you've passed me...


2 participants