Storing Documents: Non JSON-LD Content dropped #38
Comments
We need to discuss this here, because it may lead to problems in Databus: if someone posts an invalid document, it would still be saved to git, which is not good. So we need to develop rules, or maybe some additional param (like |
Not a problem for Databus, all inputs are validated on the Databus side first before saving |
Invalid documents should be rejected (bad JSON syntax) but all JSON is somewhat JSON-LD - even if it's just an empty graph. A long JSON document might still contain 2-3 triples as JSON-LD, we can't lose the entire rest of the document though. Accept only if parsable of course, but Jena will usually just ignore anything that isn't LD in any JSON. Triple store then holds all triples, git has the full docs with unmodified content. This is a hard requirement for the Gstore to serve as a database for MOSS and the OEF, not all of their JSON content is JSON-LD. Also all documents need to be saved "as is" - so after the validation, the JSON document should be saved as it came and not the JSON-LD printed out of the model. |
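A minimal sketch of the "validate, then store as is" requirement from the comment above, assuming Python's standard `json` module as the syntax check. `validate_and_keep_original` is a hypothetical name, not part of the gstore API. It rejects a body that is not parsable JSON and otherwise hands back the bytes exactly as received, rather than a re-serialized form:

```python
import json


def validate_and_keep_original(body: bytes) -> bytes:
    """Reject syntactically invalid JSON; otherwise return the body untouched.

    Hypothetical helper for illustration only, not the gstore implementation.
    """
    try:
        json.loads(body)  # syntax check only; the parsed value is discarded
    except ValueError as err:  # JSONDecodeError and UnicodeDecodeError are ValueErrors
        raise ValueError(f"invalid JSON document: {err}") from err
    return body  # stored as it came, no normalization or re-printing


original = b'{ "name":   "example",\n  "note": "not linked data" }'
stored = validate_and_keep_original(original)

# For comparison: printing the parsed document back out changes the bytes.
reprinted = json.dumps(json.loads(original)).encode()
```

Here `stored` is byte-identical to `original`, while `reprinted` differs in whitespace, which is the difference between saving the document "as it came" and saving what the parser prints.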
Ah, so you want to store only JSON? I thought it was just any kind of file... Yes, there is also some postprocessing of the JSON-LD documents: they are stored in minimised format and do not contain the full context, just the URI for it. |
So that will still be JSON/JSON-LD... I see now... this does not seem to be a problem |
So this would look like this: if the JSON is invalid (and therefore the whole document is invalid) -> we reject the whole document. Is this how you want it? @holycrab13 |
In case you integrate this, please really make it configurable/switchable and non-default. g-store is supposed to be a graph store with a simple git history of the graphs in git, not a JSON store^^. @manonthegithub I think they are asking you to store the JSON in git as is, so not stripping non-LD content and not normalizing it. In case you accept a JSON file that does not contain any LD -> this leads to no triples. It also seems interesting that there is no API function to get all files, or at least the history of a file, but I guess we never needed it so far^^. |
@JJ-Author to see all files you can go to the file browser, which is included in gstore (it is at the /file path) |
Yes, that's it. @JJ-Author either /file, but the /g path should also just return the document imo. This would probably require to handle JSONLD differently than any other RDF syntax |
Hmmmm.... I gave it some more thought... It seems like this feature is a dirty, hacky solution for one particular problem. It does not really fit into the concept of gstore; it is just easier to implement this way in the current case. |
It is also not clear to me why you would mix both formats in the first place. Looks like a design issue. |
We could potentially implement a method which allows storing non-RDF content in gstore too (just saving in git), but separately. |
It can actually be the same method for save and read, just checking whether it is RDF content or not: if yes, parse the RDF and put it into Virtuoso; if not, just save to git, and voila |
We have the case where the document is 20% RDF and the rest just JSON, so a hard separation won't work in this case. I think it would be best to save to git and then on the virtuoso side create a graph for the document and throw in whatever is parsable RDF. A non RDF document will just end up with an empty graph. |
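The flow described above (always save the document to git, then fill a per-document graph with whatever parses as RDF) could be sketched as follows. Everything here is an assumption for illustration: the two dictionaries stand in for git and Virtuoso, and `toy_triples` is a deliberately crude stand-in for a real JSON-LD parser such as Jena, keeping only top-level keys that are absolute IRIs on an object with an `@id`.

```python
import json

git_files = {}  # stand-in for the git repository: path -> raw bytes
graphs = {}     # stand-in for the triple store: graph name -> set of triples


def toy_triples(doc):
    """Crude stand-in for a JSON-LD parser (NOT Jena, NOT a real JSON-LD processor).

    Emits (subject, predicate, value) only for top-level keys that look like
    absolute IRIs on an object with an "@id"; every other key is ignored.
    """
    if not isinstance(doc, dict) or "@id" not in doc:
        return set()
    return {
        (doc["@id"], key, value)
        for key, value in doc.items()
        if key.startswith(("http://", "https://")) and isinstance(value, str)
    }


def save_document(path: str, body: bytes) -> int:
    """Store the unmodified body plus a graph holding whatever triples were found."""
    doc = json.loads(body)           # raises ValueError for invalid JSON -> reject
    git_files[path] = body           # full document, unmodified
    graphs[path] = toy_triples(doc)  # may be empty for plain JSON
    return len(graphs[path])


mixed = (b'{"@id": "http://example.org/d1", '
         b'"http://purl.org/dc/terms/title": "Demo", "internal": 42}')
plain = b'{"internal": 42}'
n_mixed = save_document("d1.jsonld", mixed)
n_plain = save_document("d2.jsonld", plain)
```

In this sketch the mixed document yields one triple and the plain JSON document yields an empty graph, while both are kept in full on the git side.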
While it's not great, it's still somewhat valid and all JSON-LD parsers can deal with it. I think there is no real reason not to support it |
I don't see why the separation wouldn't work. It doesn't matter what percentage is which part; you don't do it manually.
It is valid in the sense of what you get back from gstore. If you claim the format to be JSON-LD, then only JSON-LD is saved. If you claim the format to be JSON, then you save JSON, but it is not parsed as JSON-LD. It is just a coincidence that JSON-LD is also valid JSON. One more thing I could recommend, if they still want to do it, is to have one field containing the whole JSON-LD document next to the other fields. Then you could easily take the JSON-LD part and save it to gstore, and for the JSON we can create a new method which allows saving non-RDF data. Like this: {
bla: "bla",
bla2: "bla2",
jsonld: <here is full json-ld object>
} In general, I just think we can find a better solution than mixing the formats. I am quite certain that if we do it now we will get some issues in the future. Better to just support some new formats... If the above-mentioned is not possible, one possible workaround would be to make up a new custom media type for that and require it to be specified, something like |
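The envelope idea from the comment above can be sketched like this. The field name `jsonld` follows the example in the comment; `split_envelope` and the two "save paths" mentioned in the code comments are hypothetical and do not exist in gstore.

```python
import json


def split_envelope(body: bytes, ld_field: str = "jsonld"):
    """Separate the embedded JSON-LD object from the remaining plain JSON fields.

    Hypothetical helper for illustration only.
    """
    envelope = json.loads(body)
    if not isinstance(envelope, dict):
        raise ValueError("envelope must be a JSON object")
    ld_part = envelope.get(ld_field)  # would go to the RDF save path
    rest = {k: v for k, v in envelope.items() if k != ld_field}  # non-RDF save path
    return ld_part, rest


body = b'{"bla": "bla", "bla2": "bla2", "jsonld": {"@id": "http://example.org/x"}}'
ld_part, rest = split_envelope(body)
```

Note that this only works when the client can put all linked data under one field, which is exactly what the following reply argues is not viable for documents where JSON-LD and plain JSON fields are interleaved.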
" It is just a coincidence that json-ld is also a valid json." not a coincidence, this is in the definition. Mixing is done often, doing all JSON-LD is not viable for the client, since there's a LOT of json fields. This is a hard requirement that we need for MOSS and DLR |
i think persisting the json makes especially sense when you have an external json-ld context that might change over time. |
https://gstore-playground.tools.dbpedia.org/file/ does show an error. how do you list all files? |
I am still against converting gstore to a document store, but here are the proposed changes: Key Changes
|
Option 2 for now seems better, we can switch to Option 1 if very much needed in future without much effort. |
Here is a slightly differently phrased version of Option 2. It would look like this:
|
It is repo + path + prefix. The previous message lists only the changes, so repo and prefix remain. Graph mode just won't be there, only doc mode. Graph mode is the current version. A URI alone won't work, because it is then not clear which part is the prefix and which is the repo. The URI can contain arbitrarily many segments in the prefix part, so path + prefix + repo is the best option |
Ok, right, so the prefix, repo part is answered.
|
No, these are different systems and different approaches to working with data; we do either one or the other, not both.
Yes, it must be there; there is no other way to understand what kind of data it is (when reading by
The Content-Type will be ignored, only the Accept header for |
Hm, really? the underlying functionality is the same. GIT and Virtuoso just accept data. I would implement it with two different servlet/scalatra implementations and different web.xml and swagger. Depending on which one you start you get pure graph or doc mode. My question was how difficult it was code wise and I think, we should only do the doc mode for now, but implement it in a way that we can add a different servlet implementation later.
Ok, so graph/read would need this as internal input to select the parser.
I meant on POST and doc/save this is where content-type is/should be sent by the client. Posting "Content-type: text/turtle" to ?path=file.jsonld will throw an error then? |
the answer is the same as before. It will be ignored, I understood what you meant.
This is just super weird running different services in the same container, I won't do that. If you want to keep old gstore, we need just to fork repo, or make a special branch, that is it.
we should have one single source of truth/data, not many, and so far it was git, not virtuoso, that is why we parse the document, and not query from virtuoso |
one problem which may occur in the future. When we get the same media type/extension, like |
Please answer with enough detail. It still sounds like you will implement a connection reset/connection timeout. But I am asking about the HTTP status code and what causes it, e.g. "'.jsonld' in the URI will trigger the use of the JSON-LD parser; if the body doesn't parse, then 400 Bad Request". Is that it?
we can make two docker containers. I totally don't care if this would be in different branches.
The main purpose of Virtuoso is to query the graph data; it is hard for me to think of the docs as the only way we are allowed to get graph data. Also, a) it should be consistent, not eventually consistent, and b) editing is done on the doc, so the SSoT is not violated. Even before, in pure graph mode, there were two synchronized SSoTs, which was the idea behind GSTORE. |
File endings are our convention anyhow, as there are no standard file endings, just media types. So defining a list "file ending" -> "parser" on each deployment would be enough. |
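A per-deployment "file ending" -> "parser" list as suggested above could look like the sketch below. The concrete endings, the parser names, and the function `select_parser` are assumptions for illustration; the unused `content_type` parameter mirrors the earlier statement in this thread that the Content-Type header is ignored.

```python
import posixpath

# Example per-deployment table; the concrete endings and parser names are assumptions.
PARSER_BY_ENDING = {
    ".jsonld": "JSON-LD",
    ".ttl": "Turtle",
    ".nt": "N-Triples",
    ".rdf": "RDF/XML",
}


def select_parser(path: str, content_type=None):
    """Pick a parser name from the file ending of `path`.

    `content_type` is accepted but deliberately unused.
    Returns None when the ending is not in the table.
    """
    _, ending = posixpath.splitext(path)
    return PARSER_BY_ENDING.get(ending.lower())


chosen = select_parser("file.jsonld", content_type="text/turtle")
unknown = select_parser("notes.txt")
```

With this table, posting `text/turtle` to `?path=file.jsonld` still selects the JSON-LD parser, and an ending that is not listed selects nothing; what the service should respond in either failure case is left open here, as it is in the discussion.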
Then formulate the question with enough detail: what exactly do you want to know (e.g. status codes etc.)? Mention everything. I am not reading your mind. Really annoying. I don't know how that is not clear, I just don't understand. I get really annoyed, as it looks like trolling. "Content-Type is ignored" means it is not checked or used anywhere in the code. How is that not clear? Please think a little bit with your own head before asking, or ask ChatGPT to explain my responses.
It should be clear that I mean the servlet container, not Docker containers. Again, please formulate the things you want precisely. You did not want to ask whether that is much work; you actually want it now. So you would just like to keep both of them. Then just mention that explicitly. Here we can also tag the last commit so far in gstore, and that's it for now.
every media type has standard file ending, some of them have several |
@kurzum I thought about this option a bit more, and it actually makes sense too: if we keep two frontends but the same shared code base in the deeper logic, this also works. Now I don't know which solution is actually better, forking/tagging or keeping both together... Both are actually valid. I will do the main part first, and then we may decide to have the second service as well, as an extra feature... |
#41 |
Should work now, can be tested. @holycrab13 @JonathanJustavino |
Currently, when saving a document, content that cannot be parsed to RDF by Jena is dropped
(e.g. JSON content that is not JSON-LD is dropped before storing the file).
It would be nice if the git part of the gstore stored the original document,
and any RDF content converted from the document were stored in the triple store.