-
Notifications
You must be signed in to change notification settings - Fork 7
Select runs and get values
Better use select_values. See below why
Contents:
For the most of the use cases the new select_values function is faster and better than select_runs
.
-
faster, while using python
if
statement syntaxis for search queries,select_values
relies on SQL search query as much as possible and do only the final selection steps in python. Moreover, it selects only required values, increasing the performance. Theselect_runs
function contraty to that does a lot of filtering on python side and pulls full runs information. -
beter output,
select_values
selects the resulting table of values for specified runs. The table of runs and values is what is needed in 99% of cases.select_runs
returns a list of Run SQLAlchemy objects. Manipulating over Run ORM objects usually leads to more queries. - better introspection, results of select_values also has performance metrics and some other goodies. Not that it is needed for regular users, but can help to figure out why something is slow.
To experiment with the examples on this page, one can download daily recreated SQLite database: https://halldweb.jlab.org/dist/rcdb.sqlite
Using connection string:
sqlite:///<path to file>/rcdb.sqlite
Or connect to readonly mysql:
mysql://[email protected]/rcdb
Suppose, one wants to get all event_count-s and beam_current-s for production runs:
import rcdb
# Connect to database
db = rcdb.RCDBProvider("mysql://[email protected]/rcdb")
# Select runs and get values
table = db.select_runs("@is_production")\
.get_values(['event_count', 'beam_current'], insert_run_number=True)
print table
As the result one gets something like:
[
[1023, 3984793, 0.145]
[1024, 4569873, 0.230]
...
]
The first column is a run number (we set insert_run_number=True
above). The other two columns are 'event_count' and 'beam_current' as we gave it above.
If run number is not needed insert_run_number
may be skipped:
table = db.select_runs("@is_production")
.get_values(['event_count', 'beam_current'])
A nice way to iterate the values:
for row in table:
event_count, beam_current = tuple(row)
print event_count, beam_current
If one wants to apply a run range, say for a particular run period:
table = db.select_runs("@is_production", 10000, 20000)\
.get_values(['event_count', 'beam_current'], True)
To get values for all runs without filtration a search pattern may be skipped:
table = db.select_runs(run_min=10000, run_max=20000)\
.get_values(['event_count', 'beam_current'], insert_run_number=True)
(note that parameter names are used here, so the python could figure function parameters out)
The table is always sorted by run number. It is just a 'feature' of getting runs DB query (that is under the hood). However, the order in with run numbers are sorted could be changed:
table = db.select_runs(run_min=10000, run_max=20000, sort_desc=True)\
.get_values(['event_count', 'beam_current'], insert_run_number=True)
sort_desc=True
- makes rows to be sorted by descending run_number
select_runs
function returns RunSelectionResult
object that contains all selected runs and some other information about how the runs where selected. The RunSelectionResult implements list
interface returning ```Run`-s. Thus one can do:
import rcdb
db = rcdb.RCDBProvider("mysql://[email protected]/rcdb")
result = db.select_runs("@is_production")
for run in result:
print run.number
As one could guess the selected run numbers are printed as the result.
Run
has the next useful functions:
def get_conditions_by_name(self):
# Get all conditions and returns dictionary of condition.name -> condition
def get_condition(self, condition_name):
# Gets Condition object by name if such name condition exist or None
def get_condition_value(self, condition_name):
# Gets the condition value if such condition exist or None
So one can iterate selected runs and get any desired condition:
import rcdb
db = rcdb.RCDBProvider("mysql://[email protected]/rcdb")
result = db.select_runs("@is_production")
for run in result:
print run.get_condition_value('event_count')
In the performance point of view, the fastest way to get values by using
db.select_runs(...).get_values(...)
Because get_values
makes just a single database call to retrieve all values for selected runs.
In case of iterating:
result = db.select_runs("@is_production")
for run in result:
print run.get_condition_value('event_count')
Database is queried on each get_condition_value
Getting started & basic usage:
- Installation
- Select values tutorial (python)
- Query syntax
- Add data (python)
- CLI Basics
RCDB Explained:
- Connection
- DB and APIs structure
- SQL examples
- Creating condition types
- Adding condition values
- Saving files
- SQLAlchemy
- Logging
- Performance
Command line tools:
DAQ: