RDF SPARQL On The Fly
Raoul J.P. Bonnal edited this page May 20, 2014
Exposing raw data or NoSQL databases directly has advantages: it can speed up testing and lower the barrier to RDF, at least for those who are not ontology gurus. With a Sinatra web application it is possible to quickly expose a NoSQL database such as ElasticSearch as a SPARQL 1.1 endpoint. This follows Jerven's advice and the rdf repository howto, and uses Ruby-rdf.
Logic
require 'elasticsearch'
require 'rdf'
require 'sparql'

# Shared Elasticsearch client.
@@client = Elasticsearch::Client.new

# Build a temporary RDF graph from the Elasticsearch documents that match
# the gene identifiers found in the SPARQL query, then run the query on it.
def sparql_logic(query)
  triplette = []
  options = {:ref_db => :ensembl, :ref_db_version => 75, :species => 'homo_sapiens'}
  local_gene = "http://genome.db/#{options[:ref_db]}/#{options[:ref_db_version]}/#{options[:species]}/"
  %w(INGMG_ CABG_ ENSG).each do |prefix_gene_id|
    # Every gene identifier mentioned in the query text drives one search.
    query.scan(/#{prefix_gene_id}[0-9]+/).uniq.each do |gene|
      data = @@client.search(q: gene, size: 100)
      data["hits"]["hits"].each do |hits|
        hit = hits["_source"]
        if hit.key?('file_type')
          # Documents with a file type carry one value per tag.
          hit["tags"].each do |tag|
            if hit.key?(tag)
              uri = RDF::URI("#{local_gene}#{hit['parent']}/#{tag}/#{gene}")
              triplette << [ uri, RDF::URI("efo:EFO_0000001"), tag ]
              triplette << [ uri, RDF::URI("http://genome.db/analysis/has_gene_id"), gene ]
              triplette << [ uri, RDF::URI("http://genome.db/analysis/is_a"), hit['file_type'] ]
              triplette << [ uri, RDF::URI("http://genome.db/analysis/has_fpkm"), hit[tag] ]
              # triplette << [ RDF::URI("#{local_gene}#{hit['gene_id']}"), RDF::URI("http://genome.db/analysis/differentially_expressed_in"), tag ]
              # triplette << [ RDF::URI("#{local_gene}#{hit['gene_id']}"), RDF::URI("http://genome.db/analysis/has_differential_value"), hit[tag] ]
            end
          end
        else
          # Plain expression records: emit only the FPKM fields that are present.
          uri = RDF::URI("#{local_gene}#{hit['parent']}/#{hit['gene_id']}")
          triplette << [ uri, RDF::URI("http://genome.db/analysis/has_gene_id"), hit["gene_id"] ]
          triplette << [ uri, RDF::URI("http://genome.db/analysis/has_fpkm"), hit['FPKM'] ] if hit.key?('FPKM')
          triplette << [ uri, RDF::URI("http://genome.db/analysis/has_fpkm_conf_lo"), hit['FPKM_conf_lo'] ] if hit.key?('FPKM_conf_lo')
          triplette << [ uri, RDF::URI("http://genome.db/analysis/has_fpkm_conf_hi"), hit['FPKM_conf_hi'] ] if hit.key?('FPKM_conf_hi')
          triplette << [ uri, RDF::URI("http://genome.db/analysis/has_fpkm_status"), hit['FPKM_status'] ] if hit.key?('FPKM_status')
        end
      end
    end
  end
  # Load the collected triples into an in-memory graph and execute the query.
  repository = RDF::Graph.new
  triplette.each do |tripletta|
    repository << tripletta
  end
  SPARQL.execute(query, repository)
end
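
The two core steps of the logic above, pulling gene identifiers out of the query text and turning one Elasticsearch hit into triples, can be tried in isolation without Elasticsearch or Ruby-rdf. The following is only a minimal sketch: the helper names `extract_gene_ids` and `hit_to_triples`, the plain-string triples, and the sample hit are assumptions made for illustration; only the prefixes, the regular expression and the predicate URIs come from the code above.

```ruby
# Hypothetical helper: collect the unique gene identifiers mentioned in the
# SPARQL query text, using the same prefixes and regex as sparql_logic.
def extract_gene_ids(query, prefixes = %w[INGMG_ CABG_ ENSG])
  prefixes.flat_map { |prefix| query.scan(/#{prefix}[0-9]+/) }.uniq
end

# Hypothetical helper: turn one hit without 'file_type' into
# [subject, predicate, object] triples, here as plain strings and values
# instead of RDF::URI objects.
def hit_to_triples(hit, base, namespace = "http://genome.db/analysis/")
  subject = "#{base}#{hit['parent']}/#{hit['gene_id']}"
  triples = [[subject, "#{namespace}has_gene_id", hit["gene_id"]]]
  %w[FPKM FPKM_conf_lo FPKM_conf_hi FPKM_status].each do |field|
    next unless hit.key?(field) # only fields present in the document
    triples << [subject, "#{namespace}has_#{field.downcase}", hit[field]]
  end
  triples
end

# Made-up sample data, for illustration only.
query = 'SELECT ?s ?v WHERE { ?s <http://genome.db/analysis/has_gene_id> "ENSG00000139618" ; ' \
        '<http://genome.db/analysis/has_fpkm> ?v }'
hit = { "parent" => "sample_1", "gene_id" => "ENSG00000139618", "FPKM" => 12.5 }

p extract_gene_ids(query)
hit_to_triples(hit, "http://genome.db/ensembl/75/homo_sapiens/").each { |triple| p triple }
```

Because the identifiers are scanned from the raw query string, any identifier that appears in the query, even inside a comment, would trigger a search.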
Sinatra request
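require 'sinatra'
require 'sinatra/sparql'
require 'uri'

post "/query" do
  if params["query"]
    # A value starting with "http:" is read as a file containing the query;
    # anything else is decoded as the query text itself.
    # Note: ::URI.decode was removed in Ruby 3.0.
    query = params["query"].to_s.match(/^http:/) ? RDF::Util::File.open_file(params["query"]) : ::URI.decode(params["query"].to_s)
    sparql_logic(query)
  else
    # No query given: answer with the SPARQL service description.
    settings.sparql_options.merge!(:prefixes => {
      :ssd => "http://www.w3.org/ns/sparql-service-description#",
      :void => "http://rdfs.org/ns/void#"
    })
    # Triples are generated per query, so an empty repository is described.
    repository = RDF::Repository.new
    service_description(:repo => repository)
  end
end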
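
To try the route above from a script, a client only needs to POST a form-encoded `query` parameter. The sketch below uses Ruby's standard `net/http`. The endpoint URL is an assumption (Sinatra listens on port 4567 by default), the helper names are hypothetical, and whether the server honours the `Accept` header depends on how content negotiation is configured.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: build (but do not send) a POST request for /query.
def build_sparql_request(sparql, endpoint = "http://localhost:4567/query")
  uri = URI.parse(endpoint)
  request = Net::HTTP::Post.new(uri)
  request.set_form_data("query" => sparql) # sent as the "query" form field
  request["Accept"] = "application/sparql-results+json"
  [uri, request]
end

# Hypothetical helper: send the request and return the raw response body.
# This needs the Sinatra application to be running.
def run_sparql(sparql, endpoint = "http://localhost:4567/query")
  uri, request = build_sparql_request(sparql, endpoint)
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request).body }
end

uri, request = build_sparql_request("SELECT * WHERE { ?s ?p ?o }")
puts "POST #{uri} with body #{request.body}"
```

Keeping request construction separate from sending makes it possible to inspect the encoded body without a running server.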
I will create a new biogem demo app that includes a Rails Engine to plug a SPARQL endpoint into a Rails web app. The SPARQL endpoint is by definition limited to the domain of its implementation and cannot be considered a central repository.
ToDo:
- Possible integration with BioInterchange?
- Support different NoSQL databases (ElasticSearch, CouchDB, key/value stores)
- Support raw files, such as RNA-Seq or other biological data files, that could be queried but are more convenient to keep in their original format.
- Possible integration with BaseSpace/Illumina?
- Can Ruby's lambda functions be used to emphasize the on-the-fly concept for RDF generation?
- ...
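
On the lambda question in the list above, one possible reading is to express each mapping rule as a lambda and to apply the rules lazily, so that triples are produced only when something consumes them. This is a sketch of that idea and not an existing implementation: `triples_on_the_fly`, the two rules and the sample hits are hypothetical, and triples are plain arrays instead of RDF statements.

```ruby
# Each rule is a lambda that maps one hit (a Hash) to a list of
# [subject, predicate, object] triples. Both rules are illustrative.
id_rule = lambda do |hit|
  [[hit["gene_id"], "http://genome.db/analysis/has_gene_id", hit["gene_id"]]]
end
fpkm_rule = lambda do |hit|
  hit.key?("FPKM") ? [[hit["gene_id"], "http://genome.db/analysis/has_fpkm", hit["FPKM"]]] : []
end

# Hypothetical helper: lazily apply every rule to every hit. Nothing is
# computed until the result is iterated.
def triples_on_the_fly(hits, rules)
  hits.lazy.flat_map { |hit| rules.flat_map { |rule| rule.call(hit) } }
end

# Made-up hits, for illustration only.
hits = [
  { "gene_id" => "ENSG00000139618", "FPKM" => 12.5 },
  { "gene_id" => "ENSG00000141510" }
]

# Only the hits needed to produce the first two triples are processed.
triples_on_the_fly(hits, [id_rule, fpkm_rule]).first(2).each { |triple| p triple }
```

New mapping rules can then be added by appending a lambda to the list, without touching the traversal code.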