Basic Examples
These examples focus on getting you started with simple operations, such as opening one or more datasets through the Session object, reading and writing fields, and managing your datasets when working with ExeTera.
Creating a Session object can be done in multiple ways, but we recommend that you wrap the session in a context manager (with statement). This allows the Session object to automatically manage the datasets that you have opened, closing them all once the with statement is exited.
Opening and closing datasets is very fast. When working in Jupyter notebooks or JupyterLab, feel free to create a new Session object for each cell.
from hystore.core.session import Session
# recommended
with Session() as s:
    ...
# not recommended
s = Session()
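To see why the with form is the safer choice, here is a minimal, self-contained sketch of the pattern. ToySession and ToyDataset are hypothetical stand-ins, not ExeTera classes; they only mimic the behaviour described above, where leaving the with block closes every dataset the session opened, even when an exception escapes the block.

```python
class ToyDataset:
    """Hypothetical stand-in for an opened dataset (not an ExeTera class)."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True


class ToySession:
    """Hypothetical stand-in that tracks and closes the datasets it opens."""

    def __init__(self):
        self.datasets = {}

    def open_dataset(self, name):
        ds = ToyDataset(name)
        self.datasets[name] = ds
        return ds

    def close(self):
        for ds in self.datasets.values():
            ds.close()
        self.datasets.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and when an exception escapes the block.
        self.close()
        return False  # do not suppress the exception


try:
    with ToySession() as s:
        ds = s.open_dataset('ds1')
        raise RuntimeError('something went wrong mid-analysis')
except RuntimeError:
    pass

print(ds.closed)  # True: the dataset was closed despite the error
```

Without the with statement, the same cleanup only happens if your code remembers to call the close method on every path, including error paths.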
Once you have a session, the next step is typically to open a dataset. Datasets can be opened in one of three modes:
- read - the dataset can be read from but not written to
- append - the dataset can be read from and written to
- write - a new dataset is created (and will overwrite an existing dataset with the same name)
with Session() as s:
    ds1 = s.open_dataset('/path/to/my/first/dataset/a_dataset.hdf5', 'r', 'ds1')
    ds2 = s.open_dataset('/path/to/my/second/dataset/another_dataset.hdf5', 'r+', 'ds2')
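The mode strings used above ('r' and 'r+') follow the same convention as Python's built-in open function. As a rough analogy only, the sketch below uses a plain text file, not an ExeTera dataset, to illustrate the three behaviours listed above. That ExeTera's write mode is spelled 'w' is an assumption here; the text above does not state it.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.txt')

    # write: a new file is created (any existing file of that name is replaced)
    with open(path, 'w') as f:
        f.write('first')

    # read: contents can be read, but writing is refused
    with open(path, 'r') as f:
        read_only_contents = f.read()
        try:
            f.write('x')
            write_refused = False
        except OSError:
            write_refused = True

    # append-style ('r+'): existing contents are kept and can be added to
    with open(path, 'r+') as f:
        before = f.read()    # reading leaves the position at the end
        f.write(' second')   # so this write extends the file

    with open(path, 'r') as f:
        after_append = f.read()

    # write again: the previous contents are discarded
    with open(path, 'w') as f:
        f.write('replaced')

    with open(path, 'r') as f:
        after_rewrite = f.read()

print(after_append)   # first second
print(after_rewrite)  # replaced
```

The practical point is the same for datasets: reserve write mode for files you intend to create from scratch, because it does not preserve what was there before.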
Closing a dataset is done through Session.close_dataset, as follows:
with Session() as s:
    ds1 = s.open_dataset('/path/to/dataset.hdf5', 'r', 'ds1')
    # do some work
    ...
    s.close_dataset('ds1')
A typical ExeTera datastore is partitioned into logical tables, each representing a collection of closely related fields. A table can be viewed as being very similar to a SQL data table or a pandas DataFrame.
with Session() as s:
    ds = s.open_dataset('/path/to/dataset.hdf5', 'r', 'ds')
    # list the tables present in this dataset
    for k in ds.keys():
        print(k)
The object passed to session.get must represent a valid field (see Concepts).
r = session.get(dataset['patient_id'])
print(len(r)) # writes out the length of the field without loading the values
r = session.get(dataset['patient_id'])
values = r.data[:]
from datetime import datetime, timezone

patients = dataset['patients']
timestamp = datetime.now(timezone.utc)
isf = session.create_indexed_string(patients, 'foo')
fsf = session.create_fixed_string(patients, 'bar', 10)
csf = session.create_categorical(patients, 'boo', {'no':0, 'maybe':2, 'yes':1})
nsf = session.create_numeric(patients, 'far', 'uint32')
tsf = session.create_timestamp(patients, 'foobar')
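# write the field in chunks, then flush once at the end
for c in chunks_from_somewhere:
    field.write_part(c)
field.flush()
# alternatively, write all of the data in a single call
field.write(generate_data_from_somewhere())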
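The two writing styles can be illustrated with a small self-contained sketch. ToyFieldWriter is a hypothetical stand-in, not an ExeTera class, and it assumes that write behaves like a single write_part followed by flush; treat that as an assumption about the real API rather than a statement of it.

```python
class ToyFieldWriter:
    """Hypothetical stand-in for a field writer (not an ExeTera class)."""

    def __init__(self):
        self._pending = []   # values received but not yet committed
        self.stored = None   # values visible after flush

    def write_part(self, chunk):
        # Buffer one chunk; nothing is committed yet.
        self._pending.extend(chunk)

    def flush(self):
        # Commit everything buffered so far.
        self.stored = list(self._pending)
        self._pending = []

    def write(self, values):
        # Assumed equivalence: one write_part followed by flush.
        self.write_part(values)
        self.flush()


chunks = [[1, 2], [3, 4], [5]]

chunked = ToyFieldWriter()
for c in chunks:
    chunked.write_part(c)
chunked.flush()

single = ToyFieldWriter()
single.write([1, 2, 3, 4, 5])

print(chunked.stored == single.stored)  # True
```

Chunked writing is useful when the full data does not fit comfortably in memory; the single call is simpler when it does.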
Most of the session functions accept various representations of fields. The ones that are a bit more restrictive will be made more flexible in future releases. The following calls to apply_index are equivalent.
index = index_from_somewhere()
raw_foo = a_numpy_array_from_somewhere()
result = session.apply_index(index, src['foo']) # refer to the hdf5 Group that represents the field
result = session.apply_index(index, session.get(src['foo'])) # refer to the field
result = session.apply_index(index, session.get(src['foo']).data[:]) # refer to the data in the field
result = session.apply_index(index, raw_foo) # refer to a numpy array
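The idea of accepting several representations of the same data can be sketched without ExeTera. The code below is a hypothetical illustration: ToyField, as_values and this apply_index are invented for the example, and it assumes that applying an index means selecting the source values at the given positions, much as NumPy integer-array indexing does.

```python
class ToyField:
    """Hypothetical field-like object exposing its values through .data."""

    def __init__(self, values):
        self.data = list(values)


def as_values(source):
    """Return raw values from a field-like object or a plain sequence."""
    if hasattr(source, 'data'):
        return source.data[:]
    return list(source)


def apply_index(index, source):
    """Gather the source values in the order given by index."""
    values = as_values(source)
    return [values[i] for i in index]


index = [2, 0, 1, 0]
foo_field = ToyField(['a', 'b', 'c'])
raw_foo = ['a', 'b', 'c']

from_field = apply_index(index, foo_field)          # a field-like object
from_data = apply_index(index, foo_field.data[:])   # the data in the field
from_raw = apply_index(index, raw_foo)              # a plain sequence

print(from_field)  # ['c', 'a', 'b', 'a']
```

Normalising the input once, at the top of the function, is what lets every call style produce the same result.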