-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added JSON Hot-Swapping Utility, Enabled for JSON-LD and RDFa #69
Conversation
…tested!) Relevant to #37 (hey @codinguncut ;) ) Closes: #49
Codecov Report
@@ Coverage Diff @@
## master #69 +/- ##
==========================================
+ Coverage 88.27% 88.34% +0.06%
==========================================
Files 8 9 +1
Lines 307 326 +19
Branches 53 55 +2
==========================================
+ Hits 271 288 +17
- Misses 33 34 +1
- Partials 3 4 +1
Continue to review full report at Codecov.
|
Do you intend to perform performance measurements in this PR? This is the primary motivation for the change, right? |
Hey @lopuhin - right now, there should be no significant change, as it's still using the same decoder. It's just abstracted to permit people to use However, I do intend to then profile whether general uses of |
Hey! Why is swapping required? If ujson is faster, then we can always use it when it can be imported, and fall back to stdlib if it can't. If ujson is not faster, then we can just leave it as-is. To check how json-ld performs I think it makes sense to get a random sample from http://webdatacommons.org/structureddata/2017-12/stats/how_to_get_the_data.html (say, 1000 or 10000 records, 1 record per domain), then benchmark. This way we can also validate that there are no exceptions or segfaults. |
Hey @kmike - Key grievances include:
So, it's possible for callers to use |
@cathalgarvey thanks for the pointers! It makes sense. |
OK: I've added benchmarks comparing The results show that
(benchmarking provided by I'd push the commits directly, but let me first ask: I've compressed the 79 new html samples with Gzip. Is it OK to add those to the git tree? That's a lot of binary, though I don't think we'll ever be editing it so it's probably a one-time bump. I could move them into a new repository as a submodule, if we wanted to get fancy, or I could use The results above are on my own desktop and benchmarks like this may be prone to all sorts of externalisms, so I'd love to get some validation from others. |
Just to clarify by the way, I only added benches for JSON-LD, but the other place JSON decoding happens in The decoding of these RDF-a results might be a more homogenous task, so differences between libraries might be more obvious or less relevant. I don't have benchmarks for this yet, and ideally we'd find a way to not use JSON as an intermediary between the upstream library and |
Hi 😄 , I just want to bump this PR, it will be merged?, what else is necessary to achieve merge?. I want to help here if it's needed 💪 . |
I feel that this is over-engineering the issue a bit. Why not give ability to return raw json and let user to convert to dict or whatever however they want? I've submitted a PR that does exactly that - just return raw json rather than a dict: |
This is intended as a way to allow a library consumer to specify their own JSON decoder function to use in
extruct
.The idea is that some json decoders, like
ujson
, can offer large gains in performance in many or most cases.. but they're not reliable enough or well-supported enough to actually switchextruct
over to.So, a library consumer can now do:
...and
extruct
should commence usingujson
instead ofjson
. The second argument is the exceptions that the library raises on bad JSON input; the new hot-swapper will cast non-native exceptions to their native equivalents (ValueError
in Python 2,json.JSONDecodeError
in Python 3).Speaking of which, this PR also contains a small fix for a subtle 2/3 bug in
extruct
where onlyValueError
s were getting caught in one code-block.The next step for this is to profile whether
ujson
et al actually add any performance gains; if they did, then we could add them as optional features enabled viapip
or similar.Fixes #49