SPARQL-proxy is a portable Web application that works as a proxy server for any SPARQL endpoint, providing the following features:
- validation of query statements for safety (SPARQL Update queries are rejected)
- job scheduling for a large number of simultaneous SPARQL queries
- a job management interface for time-consuming SPARQL queries
- (optional) cache mechanisms with compression for SPARQL results to improve response time
- (optional) logging SPARQL queries and results
- (experimental) splitting a SPARQL query into chunks by adding OFFSET & LIMIT
To run SPARQL-proxy with Docker:
$ docker run -p 8080:3000 -e SPARQL_BACKEND=http://example.com/sparql ghcr.io/dbcls/sparql-proxy
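The proxy then accepts ordinary SPARQL requests on the mapped port. For example (a sketch; it assumes the endpoint is exposed at /sparql as in the reverse-proxy examples below, and the query is only illustrative):
$ curl -H "Accept: application/sparql-results+json" --data-urlencode "query=SELECT * WHERE { ?s ?p ?o } LIMIT 10" http://localhost:8080/sparql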
To run from source instead, you need:
- Node.js 20.11.1 LTS or later
Clone the repository, then install dependencies and build:
$ git clone [email protected]:dbcls/sparql-proxy.git
$ cd sparql-proxy
$ npm install
$ npm run build
(Be patient; `npm install` and `npm run build` may take a few minutes.)
Then, start SPARQL-proxy:
$ PORT=3000 SPARQL_BACKEND=http://example.com/sparql ADMIN_USER=admin ADMIN_PASSWORD=password npm start
Open http://localhost:3000/ in your browser. The administrator dashboard is at http://localhost:3000/admin.
If you want to deploy SPARQL-proxy under a subdirectory (say, `/foo/`), pass the directory via `ROOT_PATH` to both `npm run build` and `npm start`:
$ ROOT_PATH=/foo/ npm run build
$ ROOT_PATH=/foo/ PORT=3000 SPARQL_BACKEND=http://example.com/sparql ADMIN_USER=admin ADMIN_PASSWORD=password npm start
(Note that `ROOT_PATH` must end with `/`.)
Set up your reverse proxy to direct requests to the SPARQL-proxy. If you're using Nginx, configure it as follows:
server {
    location /foo/ {
        proxy_pass http://localhost:3000/foo/;
    }
}
If you want to serve `/foo/sparql` as `/sparql`, configure it as follows:
server {
    location /foo/ {
        proxy_pass http://localhost:3000/foo/;
    }
    location /sparql {
        proxy_pass http://localhost:3000/foo/sparql;
    }
}
Most configuration is done with environment variables:
- `PORT` (default: `3000`): port to listen on.
- `SPARQL_BACKEND`: URL of the SPARQL backend.
- `ROOT_PATH` (default: `/`): if you deploy sparql-proxy under a subdirectory (say, `/foo/`), set `ROOT_PATH` to that path.
- `ADMIN_USER` (default: `admin`): user name for the sparql-proxy administrator.
- `ADMIN_PASSWORD` (default: `password`): password for the sparql-proxy administrator.
- `CACHE_STORE` (default: `null`): cache store. Specify one of the following: `null` (disable the caching mechanism), `file` (cache in local files), `memory` (cache in the proxy process), `redis` (use Redis), `memcache` (use memcached). See the launch example after this list.
- Cache compression algorithm (default: `raw`). Specify one of the following: `raw` (disable compression), `snappy` (use snappy).
- Root directory of the cache store (default: `/tmp/sparql-proxy/cache`; only applicable when `CACHE_STORE=file`).
- Maximum number of entries to keep in the cache (only applicable when `CACHE_STORE=memory`).
- URL of the Redis server (default: `localhost:6379`; only applicable when `CACHE_STORE=redis`).
- Locations of the memcache servers, comma-separated (default: `localhost:11211`; only applicable when `CACHE_STORE=memcache`).
- Job timeout in milliseconds (default: `300000`).
- Duration in milliseconds to keep old jobs in the administrator dashboard (default: `300000`).
- Number of concurrent requests (default: `1`).
- Maximum number of jobs that can be waiting (default: `Infinity`).
- Set to `true` to trust proxies in front of the server (default: `false`).
- `MAX_LIMIT` (default: `10000`): caps the LIMIT of queries.
- `ENABLE_QUERY_SPLITTING` (default: `false`): set to `true` to enable query splitting. THIS IS AN EXPERIMENTAL FEATURE. If enabled, content negotiation is disabled and sparql-proxy always uses `application/sparql-results+json`, because merging results in formats other than JSON is not supported.
- Chunk size for query splitting (default: `1000`; only applicable when `ENABLE_QUERY_SPLITTING=true`): queries are split into chunks of the specified size.
- Query log file (default: `null`): if specified, queries (and the corresponding responses) are logged to this file.
- Passthrough mode (default: `false`): set to `true` to enable passthrough mode. THIS IS AN EXPERIMENTAL FEATURE. If enabled, queries are sent to the backend as-is, as far as possible; all query validations are bypassed, so destructive queries can reach the backend. Enable this only when you understand exactly what you are doing.
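For example, a launch from source with the in-memory cache enabled might look like this (a minimal sketch reusing only the variables named above; adjust the values to your setup):
$ CACHE_STORE=memory PORT=3000 SPARQL_BACKEND=http://example.com/sparql ADMIN_USER=admin ADMIN_PASSWORD=password npm start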
If you want to serve a SPARQL service description, put the description under the `files` directory with the name `description.[format]`: use `files/description.ttl` for `text/turtle` and `files/description.rdf` for `application/rdf+xml`.
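For example, a minimal `description.ttl` could be created like this (a sketch only: it uses the standard SPARQL 1.1 Service Description vocabulary, and the endpoint URL is an example to replace with your own):
$ mkdir -p files
$ cat > files/description.ttl <<'EOF'
@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .

[] a sd:Service ;
   sd:endpoint <http://localhost:3000/sparql> .
EOF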
NOTE: If you're running sparql-proxy within Docker, you may want to use the `-v` option of the `docker` command to make the files accessible from inside the container:
$ docker run -p 8080:3000 -e SPARQL_BACKEND=http://example.com/sparql -v `pwd`/files:/app/files ghcr.io/dbcls/sparql-proxy
THIS IS AN EXPERIMENTAL FEATURE.
SPARQL-proxy has a plugin system to extend its functionality. To activate it, create a `plugins.conf` file in the `files` directory and list the paths of the plugins to use, in the order you want them applied, as shown in the sketch below. See PLUGINS.md for more details.
This feature does not work with query splitting mode or passthrough mode.
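For illustration only (the plugin paths here are hypothetical, and listing one path per line is an assumption; the exact format is documented in PLUGINS.md):
$ cat > files/plugins.conf <<'EOF'
plugins/my-first-plugin.js
plugins/my-second-plugin.js
EOF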
sparql-proxy relays HTTP headers starting with `X-SPARQL-` received from the backend. This is intended to pass through `X-SPARQL-MaxRows`, which is emitted by Virtuoso.
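You can check whether such a header reaches the client with a plain HTTP client, for example (a sketch; the header only appears when the backend actually emits it, and the query is illustrative):
$ curl -s -D - -o /dev/null --data-urlencode "query=SELECT * WHERE { ?s ?p ?o }" http://localhost:3000/sparql | grep -i '^x-sparql-'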
Be careful: if you set `MAX_LIMIT` to a value smaller than the `ResultSetMaxRows` configured in Virtuoso, sparql-proxy will issue a query for which Virtuoso does not return `X-SPARQL-MaxRows`, and as a result `X-SPARQL-MaxRows` will not be returned to the client.
If query splitting mode is enabled, this feature is disabled: a single request to sparql-proxy is split into multiple requests to the backend, so it is not uniquely determined which response header should be returned.