-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split ERT 3 parameters into separate records #1746
Conversation
ert3/engine/_record.py
Outdated
@@ -47,4 +47,5 @@ def sample_record( | |||
record_name=record_name, | |||
ensemble_record=ensrecord, | |||
experiment_name=experiment_name, | |||
is_parameter=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this have to leak out here? Is this not double configuration in the sense that the record is already registered as a parameter upon initialization of experiment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ERT Storage is informed of the record containing parameters only once. So in that sense it's not a double configuration.
The alternative is to ask ERT Storage again, which is adds an unnecessary request, or to store the information in a global variable. We already know it's a parameter at this point, so being specific at this point is fine. Also we avoid a O(log n) to determine if it's a parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with Zohar, but also understand that is_parameter
seems like providing the same information twice. What about making it more general specifying is_compound=True
(my favorite word these days! :) ) or just split=True
, meaning this record becomes split into split into several ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the word "compound" wrt. parameters since it creates this separation between "parameter" and "compound parameter". We don't need this complexity. What you're calling a "compound parameter" I call a "parameter record", ie. a record that contains one or more parameters.
In my naming convention, both ERT2 and ERT3 publish "parameter records". ERT3 contains multiple parameters within the records, while ERT2 only contains 1 per record. ERT Storage only lets you fetch whole "parameter records". If you want to fetch a single parameter, we require you to fetch the entire record and select the column you want.
Using "compound parameter" kind of makes it sound like ERT Storage needs to consider "parameter" records and "compound parameter" records as separate types, which separate logic for each.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two comments here :)
First, there is a discrepancy between the notion of a record and a parameter in ert3 and ert-storage. The responsibility of handling this discrepancy should reside entirely on ert3.storage
. Afterwards, when we have working software, we should align on the language and definition of a record and parameter. But as long as within the ert3
module (besides ert3.storage
), the ert3.data.Record
should serve as the only definition of what a record is. And there there is no "compound" nor "parameter" record present there.
Second, upon initialising the experiment ert3
is passing the list of names of all the parameters (or now a dict also containing the keys). That contains the information needed and since the notion of a parameter is not really well-defined in ert3
I would prefer to keep it as much as possible out of the implementation. In particular, it is only used in the export
code for now... Also, if we move in the direction currently suggested in #1731 this distinction between a "parameter" or not a "parameter" should disappear again very soon. And then it is also nice to keep the current introduction of the concept as local as possible (currently only for initialising experiment and export).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The distinction between what a parameter and non-parameters will still be an issue even with #1731 . In fact, ERT Storage already uses a similar format for matrix data (what you call "numerical data"). This distinction is very relevant for webviz-ert, which needs to have logic for visualising parameters, which necessitates specifying which records contain parameter data (and then also tying them to a parameter prior distribution if it exists).
ert3.storage.init_experiment( | ||
workspace=workspace_root, | ||
experiment_name=experiment_name, | ||
parameters=parameter_names, | ||
parameters=parameters, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we call it parameter_layout
or similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Layout isn't descriptive (ie, the word doesn't mean anything by itself, except in GUIs). parameter_records
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parameter_variables
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to ert3
each element of the structure is providing the name of a record, together with the index of that particular record. That is the currently used notation within ert3
and I think we should stick with that until we agree on something else ™️
record_index
makes sense, parameter_index
can make sense. parameter
or record
followed by structure, layout, blueprint, schema or a similar term might also make sense...
ert3/storage/_storage.py
Outdated
) -> None: | ||
if experiment_name is None: | ||
experiment_name = f"{workspace}.{_ENSEMBLE_RECORDS}" | ||
|
||
_add_numerical_data(workspace, experiment_name, record_name, ensemble_record) | ||
if is_parameter: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to see whether we could unify the code path for parameters and not parameters a bit more. But that can be left as a refactoring exercise when everything is working :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way we upload parameters and non-parameters is big in part due to the concrete issue this PR is solving, and equinor/ert-storage#126 .
return _get_numerical_data(workspace, experiment_name, record_name) | ||
param_names = _get_experiment_parameters(workspace, experiment_name) | ||
if record_name in param_names: | ||
ensemble_records = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would change name here, like for instance parameter_instance
, record_instance
, record_element
in order to be clear that we want to build the record
from elements
0495b10
to
585bb0b
Compare
5f6c64b
to
93ab252
Compare
43d2b7c
to
306e35d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is starting to look good! 👏🏻
ert3/engine/_run.py
Outdated
indices: Set[Union[str, int]] = set() | ||
for record in ensemble_record.records: | ||
assert record.index is not None | ||
indices |= set(record.index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think set.extend
reads nicer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set
has no extend
. The alternative is indices = indices.union(set(record.index))
which I decided against.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I meant update
🤷🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated 👍
ert3/engine/_run.py
Outdated
parameters_config: ert3.config.ParametersConfig, | ||
) -> List[str]: | ||
if record_source[0] == "storage": | ||
assert len(record_source) == 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move source
and name
extraction to the top of the function body instead of indexing record_source
various places?
ert3/engine/_run.py
Outdated
parameters: Dict[str, List[str]] = {} | ||
for input_record in ensemble.input: | ||
record_name = input_record.record | ||
record_source = input_record.source.split(".") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't have to do it in this PR, but I think it is time to create a delimiter variable for this. It is now present in the ert3.storage
module as well. Could you create an issue if you leave it unresolved?
json={ | ||
"parameter_names": list(parameters), | ||
"parameter_names": [ | ||
f"{record}.{param}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ref the delimiter comment
ert3/storage/_storage.py
Outdated
) -> ert3.data.Record: | ||
def _response2records( | ||
response_content: bytes, record_type: ert3.data.RecordType | ||
) -> List[ert3.data.Record]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exactly what the EnsembleRecord
type is to represent, perhaps we should return that instead of just the list?
|
||
return _get_numerical_data(workspace, experiment_name, record_name) | ||
param_names = _get_experiment_parameters(workspace, experiment_name) | ||
if record_name in param_names: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would make sense to separate this entire if
-clause into a separate function...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I separated only the combining part.
|
||
|
||
def get_ensemble_record_names( | ||
*, workspace: Path, experiment_name: Optional[str] = None | ||
*, workspace: Path, experiment_name: Optional[str] = None, _flatten: bool = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure we want to expose this implementation detail only for the purpose of testing? 🤔
422ed10
to
dec2f37
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address my comment on the hanging "."
. Besides that this looks really good 🦾
raise ert3.exceptions.StorageError(response.text) | ||
parameters: MutableMapping[str, List[str]] = {} | ||
for name in response.json(): | ||
key, val = name.split(".") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delimiter?
d26c243
to
dec2f37
Compare
dec2f37
to
aa7119e
Compare
aa7119e
to
bc61f89
Compare
Jenkins test this please |
Issue
Resolves #1720
Todo
EnsembleRecord
contain the sameindex
and contain valid dataCheck that all parameterWe handle this case too now.EnsembleRecord
s are of theMapping[str, Numeric]
type rather thanIterable[Numeric]
, which is another alternative.