requesting an annotation via HTTP yields broken string escaping #7

Open
ben-tinc opened this issue Feb 1, 2024 · 1 comment

ben-tinc commented Feb 1, 2024

Current behavior

Consider the following string:

"{\"text\":\"Hello \\\"world\\\"!\"}"

It is a valid string serialization of a JavaScript object and can therefore be passed to JavaScript's JSON.parse(). Specifically, it is a serialization of the object

{ text: 'Hello "world"!' }
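The same round trip can be sketched in Python, whose json.loads() plays the role of JSON.parse() here. This is only an illustration: the variable names are made up for this example, and the sketch assumes nothing about how the server handles the string.

```python
import json

# The string from above, written as a Python literal. Python and JavaScript
# agree on these escapes, so the value is: {"text":"Hello \"world\"!"}
serialized = "{\"text\":\"Hello \\\"world\\\"!\"}"

# Parsing the string yields the object with the inner quotes intact.
parsed = json.loads(serialized)
print(parsed)

# Embedding the string as a JSON value adds one more level of escaping ...
embedded = json.dumps({"value": serialized})
print(embedded)

# ... and a correct consumer gets the identical string back.
assert json.loads(embedded)["value"] == serialized
```

The point of the last line is that a JSON encode followed by a JSON decode must be lossless for any string value.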

Because the serialization is an ordinary string, it is also a valid value for the TextualBody of a web annotation. Using the wap-server web app, we can easily create the following annotation:

{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "type": "Annotation",
  "body":  {
    "type": "TextualBody",
    "value": "{\"text\":\"Hello \\\"world\\\"!\"}",
    "purpose": "tagging"
  },
  "target": "http://example.com/page1"
}

Let us assume the resulting annotation has the URI "http://localhost:8889/wap/TestContainer/a218953d-192b-4074-96d2-be3f33d07ec2". Accessing it in the browser (and choosing "raw" output) yields the following:

{
  "@context" : "http://www.w3.org/ns/anno.jsonld",
  "id" : "http://localhost:8889/wap/TestContainer/a218953d-192b-4074-96d2-be3f33d07ec2",
  "type" : "Annotation",
  "created" : "2024-02-01T11:07:27Z",
  "modified" : "2024-02-01T11:07:27Z",
  "body" : {
    "type" : "TextualBody",
    "value" : "{\"text\":\"Hello \"world\"!\"}",
    "purpose" : "tagging"
  },
  "target" : "http://example.com/page1"
}

As we can see, the value of the TextualBody has changed. The inner quotation marks, originally written as \\\", now appear as \", so every instance of " is escaped only once. As a consequence, the decoded string is no longer a valid JSON serialization.
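A minimal Python sketch makes the breakage concrete. It assumes the "value" field is copied verbatim from the raw response above; raw_value, decoded, and still_valid are names invented for this example.

```python
import json

# The "value" field exactly as it appears in the raw HTTP response
# (a raw string literal, so the backslashes are kept verbatim).
raw_value = r'"{\"text\":\"Hello \"world\"!\"}"'

# First decoding step: the JSON string literal becomes a plain string.
# decoded is now: {"text":"Hello "world"!"}
decoded = json.loads(raw_value)
print(decoded)

# Second step: try to parse that string as the JSON document it used to be.
try:
    json.loads(decoded)
    still_valid = True
except json.JSONDecodeError as err:
    still_valid = False
    print("not valid JSON:", err)
```

The second json.loads() fails because the unescaped inner quotes terminate the string "Hello " early.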

Let's instead use SPARQL to query the same annotation:

PREFIX oa: <http://www.w3.org/ns/oa#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?value {GRAPH ?g {
        <http://localhost:8889/wap/TestContainer/a218953d-192b-4074-96d2-be3f33d07ec2> a oa:Annotation.
        <http://localhost:8889/wap/TestContainer/a218953d-192b-4074-96d2-be3f33d07ec2> oa:hasBody ?b1 .
        ?b1 oa:hasPurpose oa:tagging .
        ?b1 rdf:value ?value.      
  }
}

Store the above query as query.sparql and send it with, for example:

curl -X POST "http://localhost:3330/wap/sparql" -H "Content-Type: application/sparql-query" -H "Accept:application/sparql-results+json"  -d "@query.sparql"

The result is

{
  "head": {
    "vars": [ "value" ]
  } ,
  "results": {
    "bindings": [
      {
        "value": { "type": "literal" , "value": "{\"text\":\"Hello \\\"world\\\"!\"}" }
      }
    ]
  }
}

So we can see that the correct, unchanged string is still available in the triple store. However, accessing it via HTTP modifies the string, breaking the escaping in the process.
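The comparison can also be done mechanically. The following Python sketch works on trimmed copies of the two responses shown in this issue instead of calling the endpoints; the helper names sparql_value and annotation_value are hypothetical, and the sketch assumes a single binding and a single body object.

```python
import json

def sparql_value(results_json: str) -> str:
    """Return the first ?value binding from a SPARQL JSON results document."""
    return json.loads(results_json)["results"]["bindings"][0]["value"]["value"]

def annotation_value(annotation_json: str) -> str:
    """Return body.value from an annotation that has a single body object."""
    return json.loads(annotation_json)["body"]["value"]

# Trimmed copies of the two responses shown in this issue (raw strings,
# so the backslashes are preserved exactly as served).
sparql_response = r'''
{ "head": { "vars": [ "value" ] },
  "results": { "bindings": [
    { "value": { "type": "literal", "value": "{\"text\":\"Hello \\\"world\\\"!\"}" } }
  ] } }
'''
http_response = r'''
{ "type": "Annotation",
  "body": { "type": "TextualBody", "value": "{\"text\":\"Hello \"world\"!\"}" } }
'''

stored = sparql_value(sparql_response)   # what the triple store holds
served = annotation_value(http_response)  # what the HTTP endpoint returns

print("stored:", stored)
print("served:", served)
print("identical:", stored == served)
```

With the responses fetched live instead of pasted in, the same comparison could serve as a regression check for the expected behavior described below.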

Similarly, string escaping is broken when an annotation contains escaped newlines. I would imagine that every kind of escaping is liable to be affected, but " and \n are the ones we are encountering in practice.

Expected behavior

String values should be retrieved without modification.

Thanks for your consideration. Please let me know if you need more details.

GGoetzelmann added a commit to GGoetzelmann/wap-server that referenced this issue Feb 14, 2024
Formatting now utilizes json-ld serialization from jena directly instead of converting to nquads first.
Addressing kit-data-manager#7

ben-tinc (Author) commented

I can confirm that GGoetzelmann@a5be1c7 fixes all the issues we are seeing with escaping.

Thanks a lot! :)
