-
Notifications
You must be signed in to change notification settings - Fork 7
SQLAlchemy
SQLAlchemy makes link between python classes and related database tables. It loads data from DB to classes and when objects are changed, can commit changes back to DB. Also SQLAlchemy glues the classes and makes it possible to navigate between objects.
Lets see a code example:
# open database
db = rcdb.RCDBProvider("sqlite:///example.db")
# get Run object for the run number 1
run = db.get_run(1)
# now we have access to all conditions for that run as
run.conditions
# get all condition names or all condition values
names = [condition.name for condition in run.conditions]
values = [condition.values for condition in run.conditions]
SQLAlchemy makes queries to database if needed. So when you do run = self.db.get_run(1)
, Run.conditions
collection is not yet loaded from DB. It actually isn't loaded even when we do like x=run.conditions. But first time
when a real value is needed, database is queried for all conditions for that run.
Even if overriding of existing values are possible for RCDB, deleting data or editing existing condition types considered to be avoided. But sometimes it is needed. Especially at the development/debugging phase.
To edit or delete things SQLAlchemy '''session''' object can be used.
Edit condition type
# get condition type
condition_type = db.get_condition_type("my_var")
# Change what you need
condition_type.value_type = ConditionType.JSON_FIELD
# Calling session commit will save changes to database
db.session.commit()
Rename condition
# get condition type
condition_type = db.get_condition_type("my_var")
# Change what you need
condition_type.name = "new_var"
# Calling session commit will save changes to database
db.session.commit()
The magic is that all data for all runs are now accessible by '''new_var'''
Deleting objects is done with session.delete function:
# Edit condition type
condition_type = db.get_condition_type("my_var")
# mark the object for deletion
db.session.delete(condition_type)
# Calling session commit will save changes to database
db.session.commit()
More about session and SQLAlchemy objects manipulation with it can be found in [sql alchemy docs](http://docs.sqlalchemy.org/en/rel_0_9/orm/session_basics.html#basics-of-using-a-session SQLAlchemy documentation)
If you ever want to get Run object by run_number here is how:
run = db.get_run(run_number)
print run.number
print run.start_time
print run.end_time
print run.conditions... # but it is written further
How to query runs is shown far below
Lets select all runs with run_number < 100 using SQLAlchemy
# open database
db = rcdb.RCDBProvider("sqlite:///example.db")
# create query
query = db.session.query(Run).filter(Run.number < 100)
# get count of selected runs
print query.count()
# get first run from selected
print query.first()
# get all run that matches the creteria
print query.all()
What happened?
'''db.session''' - gets SQLAlchemy ''session'' object
'''.query(Run)''' - here we say, that we want Run objects to be returned. At the same time we say what table we want to query
'''.filter(Run.number < 100)''' - filtering clause
When we've got query ready, we can actually get objects by query.first()
or query.all()
(there are actually more) or just count number of runs by query.count()
We can use Run.conditions to get conditions for each run. Lets see more advanced example
db = rcdb.RCDBProvider("sqlite:///example.db")
query = db.session.query(Run) .filter(Run.number.between(50,55) .order_by(desc(Run.number))
runs = query.all() for run in runs: event_count, = (condition.value for condition in run.conditions if condition.name=='event_count')
It works and looks easy. But there is one drawback, each selected run will call one SELECT QUERY to DB to get its conditions. If might be OK for many cases.
=== Raw SQLAlchemy queries ===
What if we want to select runs by conditions value?
First, lets say, that if RCDBProvider gives access to SQLAlchemy session, then it is possible to make use of full power of SQLAlchemy queries.
Lets say, we want to get all runs with '''event_count''' > '''100 000'''
# open database db = rcdb.RCDBProvider("sqlite:///example.db")query = db.session.query(Run).join(Run.conditions).join(Condition.type)
.filter(ConditionType.name == "event_count")
.filter(Condition.int_value > 100 000)
.order_by(Run.number)
print query.count()
print query.first()
print query.all()
What happened here.
By first line:
query = db.session.query(Run).join(Run.conditions).join(Condition.type)
we say, that we would like to select Run objects ('''.query(Run)'''), and also that we will use conditions and condition types ('''.join(Run.conditions).join(Condition.type)''').
Then we filter results (.'''filter(...)''') and ask results to by ordered by Run.number ('''.order_by(Run.number)''')
All these functions (join, filter, order_by, ...) returns Query object, that allows to stack them as many as needed.
Finally, to get the results, one of query.count(), query.first(), query.one() or query.all() is called.
But probably you already feel drawbacks of this approach:
- First, you see that you have to use int_value to filter conditions. That by many means worse than using Condition.value property, that handles type automatically.
- Another drawback is that when you add more logic, the query becomes bulky.
Lets imagine next example. We look for run in range 1000 to 2000 with event_count > 10000, some data_value in range 1.2 and 2.4
query = db.session.query(Run).join(Run.conditions).join(Condition.type)\ .filter(Run.number.between(1000, 2000)\ .filter(((ConditionType.name == "event_count") & (Condition.int_value > 10000)) | ((ConditionType.name == "data_value") & (Condition.float_value.between(1.2, 2.4))))\ .order_by(Run.number)print query.all()
Note that instead of common '''&&''' and '''||''', '''&''' and '''|''' is used. SQLAlchemy overloads this operators to use for comparison.
Note also, that such expressions should be in parentheses. It is possible to use '''or_''' and '''and_''' functions instead, but it doesn't improve the readability.
=== Querying using RCDB helpers ===
RCDB ConditionType provide helpful properties to make querying easier.
# get condition type t = db.get_condition_type("event_count")query = t.run_query.filter(t.value_field > 1000)
print query.all()
What happened?
*'''run_query''' - returns query bootstrap that selects Run objects for given type. So it hides this thing from the raw query above:
....query(Run).join(Run.conditions).join(Condition.type) ... .filter(((ConditionType.name == "event_count")*'''value_field''' - returns the right Condition.xxx_value for a given type. When you put '''t.value_field > 1000''' here, ConditionType '''t''' looked at his '''value_type''' and selected the right Condition.int_value to compare
But there is a limitation. Each condition type should has its own query. But queries can be combined by '''union''' or '''intersect''' methods later.
Lets look at the example, where we fill DB with dummy data and then query for runs using the helper properties. The same example can be found in $RCDB_HOME/python/example_conditions_query.py
# create in memory SQLite database db = rcdb.RCDBProvider("sqlite://") rcdb.model.Base.metadata.create_all(db.engine)event_count_type = db.create_condition_type("event_count", ConditionType.INT_FIELD) data_value_type = db.create_condition_type("data_value", ConditionType.FLOAT_FIELD)
for i in range(0, 100): db.create_run(i) db.add_condition(i, event_count_type, i + 950) #event_count in range 950 - 1049 db.add_condition(i, data_value_type, (i/100.0) + 1) #data_value in 1 - 2
""" Demonstrates ConditionType query helpers""" event_count_type = db.get_condition_type("event_count") data_value_type = db.get_condition_type("data_value")
query = event_count_type.run_query.filter(event_count_type.value_field > 1000).filter(Run.number <=53) print query.all()
query2 = data_value_type.run_query
.filter(data_value_type.value_field.between(1.52, 1.7))
.filter(Run.number < 55)
print query2.all()
print "Results intersect:" print query.intersect(query2).all() print "Results union:" print query.union(query2).all()
The output is:
[, , ] [, , ] Results intersect: [, ] Results union: [, , , ]
More on SQLAlchemy queries in [http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/tutorial.html#querying SQLAlchemy querying tutorial] [http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/query.html SQLAlchemy Query API]
The example is available as python $RCDB_HOME/python/example_conditions_query.py (It creates inmemory database so there is no need in creaty_empty_sqlite.py)
Getting started & basic usage:
- Installation
- Select values tutorial (python)
- Query syntax
- Add data (python)
- CLI Basics
RCDB Explained:
- Connection
- DB and APIs structure
- SQL examples
- Creating condition types
- Adding condition values
- Saving files
- SQLAlchemy
- Logging
- Performance
Command line tools:
DAQ: