Index/query scripts for IntEnz enzyme dataset

index.py: Index IntEnz XML files; tested with the IntEnz December 2019 release

  $ ./nosqlbiosets/intenz/index.py --help
  usage: index.py [-h] [-infile INFILE] [--index INDEX] [--doctype DOCTYPE]
                  [--host HOST] [--port PORT] [--db DB]
  
  Index IntEnz xml files, with Elasticsearch, MongoDB or Neo4j
  
  optional arguments:
    -h, --help            show this help message and exit
    -infile INFILE, --infile INFILE
                          Input file name (intenz/ASCII/intenz.xml)
    --index INDEX         Name of the Elasticsearch index or MongoDB database
    --doctype DOCTYPE     Document type name for Elasticsearch, collection name
                          for MongoDB
    --host HOST           Elasticsearch, MongoDB or Neo4j server hostname
    --port PORT           Elasticsearch, MongoDB or Neo4j server port
    --db DB               Database: 'Elasticsearch', 'MongoDB' or 'Neo4j'
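The input is a single XML file (intenz.xml). For a large XML input, a streaming parse keeps memory use bounded. The sketch below is not index.py's code. It shows one way to stream entries with `xml.etree.ElementTree.iterparse`, clearing each entry once it has been converted, and then grouping entries into batches as one might for bulk inserts. The element and attribute names (`enzyme`, `ec`, `name`) and the batch size are hypothetical placeholders, not the real IntEnz schema or the script's settings.

```python
import io
import xml.etree.ElementTree as ET

# Tiny inline sample. The tag/attribute names ("enzyme", "ec", "name")
# are hypothetical and do NOT reflect the real IntEnz XML schema.
SAMPLE = b"""<?xml version="1.0"?>
<enzymes>
  <enzyme ec="1.1.1.1"><name>alcohol dehydrogenase</name></enzyme>
  <enzyme ec="1.1.1.2"><name>alcohol dehydrogenase (NADP+)</name></enzyme>
</enzymes>
"""


def iter_entries(source, tag="enzyme"):
    """Yield one dict per <tag> element, clearing each element after use."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield {"ec": elem.get("ec"), "name": elem.findtext("name")}
            elem.clear()  # drop the converted subtree to limit memory growth


def batches(entries, batch_size=500):
    """Group entries into lists; batch_size is an arbitrary example value."""
    batch = []
    for entry in entries:
        batch.append(entry)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


docs = list(iter_entries(io.BytesIO(SAMPLE)))
```

With a real file, `iter_entries` can take a path such as `./data/intenz.xml` directly, since `iterparse` accepts a file name or a binary file object.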

query.py: Query API (naive and not comprehensive); most queries are for MongoDB, a few for Neo4j

  $ ./nosqlbiosets/intenz/query.py --help
  usage: query.py [-h] [--limit LIMIT] qc outfile
  
  Save IntEnz reaction connections as graph files
  
  positional arguments:
    qc             MongoDB query clause to select subsets of IntEnz entries,
                   e.g.: '{"reactions.label.value": "Chemically balanced"}'
    outfile        File name for saving the output graph. Format is selected
                   based on the file extension of the given output file; .xml
                   for GraphML, .gml for GML, .json for Cytoscape.js, or
                   .d3js.json for d3js format
  
  optional arguments:
    -h, --help     show this help message and exit
    --limit LIMIT  Maximum number of enzyme-metabolite connections
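The help text says the output format is chosen from the file extension. Below is a minimal sketch of such a dispatch, assuming plain suffix matching (query.py may implement it differently). Because `.d3js.json` also ends in `.json`, the longer suffix has to be tested first, otherwise d3js output would be written as Cytoscape.js. The names `FORMATS` and `output_format` are hypothetical.

```python
from pathlib import Path

# Extension -> format, as listed in the query.py help text.
# Ordered so that the longer ".d3js.json" suffix wins over ".json".
FORMATS = [
    (".d3js.json", "d3js"),
    (".json", "Cytoscape.js"),
    (".gml", "GML"),
    (".xml", "GraphML"),
]


def output_format(outfile):
    """Return the graph format implied by the output file name."""
    name = Path(outfile).name
    for suffix, fmt in FORMATS:
        if name.endswith(suffix):
            return fmt
    raise ValueError(f"unsupported output file extension: {outfile}")
```

For example, `output_format("balanced-reactions.xml")` gives `"GraphML"`.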

./nosqlbiosets/intenz/query.py '{"reactions.label.value": "Chemically balanced"}'\
  balanced-reactions.xml --limit 800

./nosqlbiosets/intenz/query.py '{"cofactors.#text": "Pyrroloquinoline quinone"}'\
  cofactors.json

./nosqlbiosets/intenz/query.py '{"$text": {"$search": "poly(A)"}}' polyA.json
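The `qc` argument is a JSON-encoded MongoDB filter, which is why the examples above wrap it in single quotes: inside double quotes the shell would expand `$text` as a variable. When calling query.py from Python, building the filter with `json.dumps` and passing an argument list to `subprocess.run` avoids shell quoting altogether. The helpers `query_argv` and `run_query` are hypothetical; only the script path, the positional arguments and `--limit` come from the help text. Note that MongoDB accepts `$text` filters only on collections that have a text index.

```python
import json
import subprocess


def query_argv(filter_doc, outfile, limit=None,
               script="./nosqlbiosets/intenz/query.py"):
    """Build an argv list for query.py. No shell is involved, so "$text",
    parentheses and spaces in the filter need no quoting."""
    argv = [script, json.dumps(filter_doc), outfile]
    if limit is not None:
        argv += ["--limit", str(limit)]
    return argv


def run_query(filter_doc, outfile, limit=None):
    """Run query.py (requires a reachable, already indexed database)."""
    subprocess.run(query_argv(filter_doc, outfile, limit), check=True)


argv = query_argv({"$text": {"$search": "poly(A)"}}, "polyA.json")
```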

test_queries.py: Tests for the query API

Example graph

Example command lines for indexing

Default server connection settings are read from ../../conf/dbservers.json
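Only the location of the defaults file is given here, not its layout. Below is a minimal sketch of the common pattern of JSON defaults overridden by explicit `--host`/`--port` values. It assumes a hypothetical layout keyed by database name and hypothetical helper names (`merge_settings`, `load_server_settings`); the real dbservers.json and the way index.py reads it may differ.

```python
import json
from pathlib import Path

# Hypothetical dbservers.json layout, keyed by database name:
# {"MongoDB": {"host": "localhost", "port": 27017}, "Neo4j": {...}}


def merge_settings(config, db, host=None, port=None):
    """Start from the config entry for `db`; explicit CLI values win."""
    settings = dict(config.get(db, {}))  # copy, so config is not mutated
    if host is not None:
        settings["host"] = host
    if port is not None:
        settings["port"] = port
    return settings


def load_server_settings(conf_path, db, host=None, port=None):
    """Read the JSON config if the file exists, then apply overrides."""
    path = Path(conf_path)
    config = json.loads(path.read_text()) if path.is_file() else {}
    return merge_settings(config, db, host, port)
```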

# Download the IntEnz XML file

wget -nc -P ./data http://ftp.ebi.ac.uk/pub/databases/intenz/xml/ASCII/intenz.xml

# Index with Elasticsearch; takes about 5 to 15 minutes
./nosqlbiosets/intenz/index.py --db Elasticsearch --infile ./data/intenz.xml \
 --index intenz

# Index with MongoDB; takes about 1 minute with a local server, about 12 minutes with MongoDB Atlas
./nosqlbiosets/intenz/index.py --db MongoDB --infile ./data/intenz.xml

# Index with Neo4j; takes about 12 minutes
./nosqlbiosets/intenz/index.py --db Neo4j --infile ./data/intenz.xml
