RDF SPARQL On The Fly

Raoul J.P. Bonnal edited this page May 20, 2014 · 10 revisions

Exposing raw data or NoSQL databases directly has advantages: it can speed up testing and lower the barrier to RDF, at least for people who are not ontology experts. With a Sinatra web application it is possible to quickly expose a NoSQL database such as ElasticSearch as a SPARQL 1.1 endpoint. This page follows Jerven's advice and the RDF repository howto, and uses Ruby-rdf.

Logic

require 'elasticsearch'
require 'rdf'
require 'sparql'

# A constant rather than a top-level class variable (@@client), which raises
# a RuntimeError on Ruby 3.0 and later.
CLIENT = Elasticsearch::Client.new

# Build an in-memory RDF graph from the Elasticsearch documents that match
# the gene IDs mentioned in the query, then run the query against that graph.
def sparql_logic(query)
  triplette = []
  options = {:ref_db => :ensembl, :ref_db_version => 75, :species => 'homo_sapiens'}
  local_gene = "http://genome.db/#{options[:ref_db]}/#{options[:ref_db_version]}/#{options[:species]}/"
  # Look for gene IDs with any of the known prefixes in the query text.
  %w(INGMG_ CABG_ ENSG).each do |prefix_gene_id|
    query.scan(/#{prefix_gene_id}[0-9]+/).uniq.each do |gene|
      data = CLIENT.search(q: gene, size: 100)
      data["hits"]["hits"].each do |hits|
        hit = hits["_source"]

        if hit.key?('file_type')
          # One set of triples per tag that has a value in this document.
          hit["tags"].each do |tag|
            if hit.key?(tag)
              uri = RDF::URI("#{local_gene}#{hit['parent']}/#{tag}/#{gene}")
              triplette << [ uri, RDF::URI("efo:EFO_0000001"), tag ]
              triplette << [ uri, RDF::URI("http://genome.db/analysis/has_gene_id"), gene ]
              triplette << [ uri, RDF::URI("http://genome.db/analysis/is_a"), hit['file_type'] ]
              triplette << [ uri, RDF::URI("http://genome.db/analysis/has_fpkm"), hit[tag] ]
              # triplette << [ RDF::URI("#{local_gene}#{hit['gene_id']}"), RDF::URI("http://genome.db/analysis/differentially_expressed_in"), tag ]
              # triplette << [ RDF::URI("#{local_gene}#{hit['gene_id']}"), RDF::URI("http://genome.db/analysis/has_differential_value"), hit[tag] ]
            end
          end
        else
          # Plain expression record: emit only the FPKM fields that are present.
          uri = RDF::URI("#{local_gene}#{hit['parent']}/#{hit['gene_id']}")
          triplette << [ uri, RDF::URI("http://genome.db/analysis/has_gene_id"), hit["gene_id"] ]
          triplette << [ uri, RDF::URI("http://genome.db/analysis/has_fpkm"), hit['FPKM'] ] if hit.key?('FPKM')
          triplette << [ uri, RDF::URI("http://genome.db/analysis/has_fpkm_conf_lo"), hit['FPKM_conf_lo'] ] if hit.key?('FPKM_conf_lo')
          triplette << [ uri, RDF::URI("http://genome.db/analysis/has_fpkm_conf_hi"), hit['FPKM_conf_hi'] ] if hit.key?('FPKM_conf_hi')
          triplette << [ uri, RDF::URI("http://genome.db/analysis/has_fpkm_status"), hit['FPKM_status'] ] if hit.key?('FPKM_status')
        end
      end
    end
  end
  # Load the collected triples into a temporary graph and execute the query on it.
  repository = RDF::Graph.new
  triplette.each do |tripletta|
    repository << tripletta
  end
  SPARQL.execute(query, repository)
end
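
The first step of `sparql_logic` is to pull gene identifiers out of the query text, so that only matching documents are fetched from the database. Below is a minimal sketch of that step in isolation, using only the Ruby standard library. The helper name `extract_gene_ids` and the sample query are hypothetical and are not part of the original code.

```ruby
# Prefixes taken from the sparql_logic method above.
GENE_ID_PREFIXES = %w(INGMG_ CABG_ ENSG).freeze

# Hypothetical helper: returns the unique gene identifiers mentioned
# anywhere in a SPARQL query string, grouped in prefix order.
def extract_gene_ids(query)
  GENE_ID_PREFIXES.flat_map do |prefix|
    query.scan(/#{prefix}[0-9]+/).uniq
  end
end

# Illustrative query; the gene ID is only an example value.
sample_query = <<~SPARQL
  SELECT ?s ?fpkm WHERE {
    ?s <http://genome.db/analysis/has_gene_id> "ENSG00000139618" .
    ?s <http://genome.db/analysis/has_fpkm> ?fpkm .
  }
SPARQL

puts extract_gene_ids(sample_query).inspect
```

One consequence of this design is that a query that mentions no gene identifier produces no triples, so it is executed against an empty graph.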

Sinatra request

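require 'sinatra'
require 'sinatra/sparql'
require 'uri'

post "/query" do
  if params["query"]
    # The query is either a URL pointing at a query document or the query text itself.
    # URI.decode was removed in Ruby 3.0; URI::DEFAULT_PARSER.unescape does the same job.
    query = params["query"].to_s.match(/^http:/) ? RDF::Util::File.open_file(params["query"]).read : ::URI::DEFAULT_PARSER.unescape(params["query"].to_s)
    sparql_logic(query)
  else
    settings.sparql_options.merge!(:prefixes => {
      :ssd => "http://www.w3.org/ns/sparql-service-description#",
      :void => "http://rdfs.org/ns/void#"
    })
    # Assumption: no persistent repository exists, so describe an empty one.
    repository = RDF::Repository.new
    service_description(:repo => repository)
  end
end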
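
A client can call this route by posting the SPARQL text in a form field named `query`. The sketch below builds such a request with Ruby's standard `net/http` library. It assumes the app runs locally on Sinatra's default port (4567), and the query and gene ID are illustrative. The request is only built and printed, so the snippet runs without a server.

```ruby
require 'net/http'
require 'uri'

# Assumption: the Sinatra app runs locally on its default port, 4567.
endpoint = URI('http://localhost:4567/query')

# Illustrative query; the gene ID is only an example value.
sparql = 'SELECT ?s ?fpkm WHERE { ' \
         '?s <http://genome.db/analysis/has_gene_id> "ENSG00000139618" . ' \
         '?s <http://genome.db/analysis/has_fpkm> ?fpkm }'

# Build a form-encoded POST whose "query" field carries the SPARQL text.
request = Net::HTTP::Post.new(endpoint)
request.set_form_data('query' => sparql)
request['Accept'] = 'application/sparql-results+json'

puts request.body

# Sending it requires the server to be running:
# response = Net::HTTP.start(endpoint.hostname, endpoint.port) { |http| http.request(request) }
# puts response.body
```

The `Accept` header asks for SPARQL JSON results. Whether the app honors it depends on how content negotiation is configured.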

I will create a new biogem demo app that includes a Rails Engine for plugging a SPARQL endpoint into a Rails web app. The SPARQL endpoint is by definition limited to the domain of its implementation and cannot be considered a central repository.

ToDo:

  • Possible integration with BioInterchange?
  • Support different NoSQL databases (ElasticSearch, CouchDB, key/value stores)
  • Support raw files, such as RNA-Seq or other biological data files, that could be queried but are convenient to keep in their original format.
  • Possible integration with BaseSpace/Illumina?
  • Can Ruby's lambda functions be used to emphasize the on-the-fly concept for RDF generation?
  • ...
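
As a starting point for the lambda question, here is a hypothetical sketch, not an existing implementation. Each mapping rule is a lambda, and triples are produced lazily, only when something consumes them. It uses plain strings instead of `RDF::URI` objects so that it runs with the standard library alone. The names `FIELD_RULES`, `triples_on_the_fly` and `subject_for`, and the sample document, are assumptions made for illustration.

```ruby
BASE = 'http://genome.db/analysis/'

# Each rule is a lambda that turns one document (a Hash, like an
# Elasticsearch "_source") into a [subject, predicate, object] triple,
# or nil when the field is absent.
FIELD_RULES = {
  'gene_id'      => 'has_gene_id',
  'FPKM'         => 'has_fpkm',
  'FPKM_conf_lo' => 'has_fpkm_conf_lo',
  'FPKM_conf_hi' => 'has_fpkm_conf_hi',
  'FPKM_status'  => 'has_fpkm_status'
}.map do |field, predicate|
  ->(subject, doc) { [subject, "#{BASE}#{predicate}", doc[field]] if doc.key?(field) }
end

# Lazily apply every rule to every document; nothing is computed until
# the returned enumerator is consumed.
def triples_on_the_fly(docs, subject_for, rules)
  docs.lazy.flat_map do |doc|
    subject = subject_for.call(doc)
    rules.map { |rule| rule.call(subject, doc) }.compact
  end
end

# Hypothetical sample document and subject-building lambda.
docs = [{ 'parent' => 'sample_1', 'gene_id' => 'ENSG00000139618', 'FPKM' => 12.5 }]
subject_for = lambda do |doc|
  "http://genome.db/ensembl/75/homo_sapiens/#{doc['parent']}/#{doc['gene_id']}"
end

triples_on_the_fly(docs, subject_for, FIELD_RULES).each { |triple| puts triple.inspect }
```

With the sample document this prints two triples, one for `has_gene_id` and one for `has_fpkm`, because the other fields are missing.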