Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split ERT 3 parameters into separate records #1746

Merged
merged 1 commit into from
Jun 28, 2021

Conversation

pinkwah
Copy link
Contributor

@pinkwah pinkwah commented Jun 8, 2021

Issue
Resolves #1720

Todo

  • Write test which uploads a set of parameters and then checks that it is correct
  • Check that all realisations inside a EnsembleRecord contain the same index and contain valid data
  • Check that all parameter EnsembleRecords are of the Mapping[str, Numeric] type rather than Iterable[Numeric], which is another alternative. We handle this case too now.

@pinkwah
Copy link
Contributor Author

pinkwah commented Jun 8, 2021

image

ert3/storage/_storage.py Outdated Show resolved Hide resolved
ert3/storage/_storage.py Outdated Show resolved Hide resolved
@pinkwah pinkwah changed the base branch from master to main June 9, 2021 14:09
@@ -47,4 +47,5 @@ def sample_record(
record_name=record_name,
ensemble_record=ensrecord,
experiment_name=experiment_name,
is_parameter=True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this have to leak out here? Is this not double configuration in the sense that the record is already registered as a parameter upon initialization of experiment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ERT Storage is informed of the record containing parameters only once. So in that sense it's not a double configuration.

The alternative is to ask ERT Storage again, which is adds an unnecessary request, or to store the information in a global variable. We already know it's a parameter at this point, so being specific at this point is fine. Also we avoid a O(log n) to determine if it's a parameter.

Copy link
Contributor

@xjules xjules Jun 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Zohar, but also understand that is_parameter seems like providing the same information twice. What about making it more general specifying is_compound=True (my favorite word these days! :) ) or just split=True, meaning this record becomes split into split into several ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the word "compound" wrt. parameters since it creates this separation between "parameter" and "compound parameter". We don't need this complexity. What you're calling a "compound parameter" I call a "parameter record", ie. a record that contains one or more parameters.

In my naming convention, both ERT2 and ERT3 publish "parameter records". ERT3 contains multiple parameters within the records, while ERT2 only contains 1 per record. ERT Storage only lets you fetch whole "parameter records". If you want to fetch a single parameter, we require you to fetch the entire record and select the column you want.

Using "compound parameter" kind of makes it sound like ERT Storage needs to consider "parameter" records and "compound parameter" records as separate types, which separate logic for each.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments here :)

First, there is a discrepancy between the notion of a record and a parameter in ert3 and ert-storage. The responsibility of handling this discrepancy should reside entirely on ert3.storage. Afterwards, when we have working software, we should align on the language and definition of a record and parameter. But as long as within the ert3 module (besides ert3.storage), the ert3.data.Record should serve as the only definition of what a record is. And there there is no "compound" nor "parameter" record present there.

Second, upon initialising the experiment ert3 is passing the list of names of all the parameters (or now a dict also containing the keys). That contains the information needed and since the notion of a parameter is not really well-defined in ert3 I would prefer to keep it as much as possible out of the implementation. In particular, it is only used in the export code for now... Also, if we move in the direction currently suggested in #1731 this distinction between a "parameter" or not a "parameter" should disappear again very soon. And then it is also nice to keep the current introduction of the concept as local as possible (currently only for initialising experiment and export).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The distinction between what a parameter and non-parameters will still be an issue even with #1731 . In fact, ERT Storage already uses a similar format for matrix data (what you call "numerical data"). This distinction is very relevant for webviz-ert, which needs to have logic for visualising parameters, which necessitates specifying which records contain parameter data (and then also tying them to a parameter prior distribution if it exists).

ert3.storage.init_experiment(
workspace=workspace_root,
experiment_name=experiment_name,
parameters=parameter_names,
parameters=parameters,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we call it parameter_layout or similar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Layout isn't descriptive (ie, the word doesn't mean anything by itself, except in GUIs). parameter_records?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parameter_variables?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to ert3 each element of the structure is providing the name of a record, together with the index of that particular record. That is the currently used notation within ert3 and I think we should stick with that until we agree on something else ™️

record_index makes sense, parameter_index can make sense. parameter or record followed by structure, layout, blueprint, schema or a similar term might also make sense...

) -> None:
if experiment_name is None:
experiment_name = f"{workspace}.{_ENSEMBLE_RECORDS}"

_add_numerical_data(workspace, experiment_name, record_name, ensemble_record)
if is_parameter:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to see whether we could unify the code path for parameters and not parameters a bit more. But that can be left as a refactoring exercise when everything is working :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way we upload parameters and non-parameters is big in part due to the concrete issue this PR is solving, and equinor/ert-storage#126 .

return _get_numerical_data(workspace, experiment_name, record_name)
param_names = _get_experiment_parameters(workspace, experiment_name)
if record_name in param_names:
ensemble_records = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would change name here, like for instance parameter_instance, record_instance, record_element in order to be clear that we want to build the record from elements

@pinkwah pinkwah force-pushed the ert3-split-params branch from 0495b10 to 585bb0b Compare June 10, 2021 13:29
@pinkwah pinkwah self-assigned this Jun 11, 2021
@pinkwah pinkwah force-pushed the ert3-split-params branch 4 times, most recently from 5f6c64b to 93ab252 Compare June 15, 2021 11:13
@pinkwah pinkwah marked this pull request as ready for review June 15, 2021 11:14
ert3/storage/_storage.py Outdated Show resolved Hide resolved
ert3/storage/_storage.py Outdated Show resolved Hide resolved
@pinkwah pinkwah force-pushed the ert3-split-params branch 5 times, most recently from 43d2b7c to 306e35d Compare June 18, 2021 12:27
Copy link
Contributor

@markusdregi markusdregi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is starting to look good! 👏🏻

indices: Set[Union[str, int]] = set()
for record in ensemble_record.records:
assert record.index is not None
indices |= set(record.index)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think set.extend reads nicer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set has no extend. The alternative is indices = indices.union(set(record.index)) which I decided against.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I meant update 🤷🏻

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated 👍

parameters_config: ert3.config.ParametersConfig,
) -> List[str]:
if record_source[0] == "storage":
assert len(record_source) == 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move source and name extraction to the top of the function body instead of indexing record_source various places?

parameters: Dict[str, List[str]] = {}
for input_record in ensemble.input:
record_name = input_record.record
record_source = input_record.source.split(".")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't have to do it in this PR, but I think it is time to create a delimiter variable for this. It is now present in the ert3.storage module as well. Could you create an issue if you leave it unresolved?

json={
"parameter_names": list(parameters),
"parameter_names": [
f"{record}.{param}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ref the delimiter comment

) -> ert3.data.Record:
def _response2records(
response_content: bytes, record_type: ert3.data.RecordType
) -> List[ert3.data.Record]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly what the EnsembleRecord type is to represent, perhaps we should return that instead of just the list?


return _get_numerical_data(workspace, experiment_name, record_name)
param_names = _get_experiment_parameters(workspace, experiment_name)
if record_name in param_names:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make sense to separate this entire if-clause into a separate function...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I separated only the combining part.



def get_ensemble_record_names(
*, workspace: Path, experiment_name: Optional[str] = None
*, workspace: Path, experiment_name: Optional[str] = None, _flatten: bool = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure we want to expose this implementation detail only for the purpose of testing? 🤔

@pinkwah pinkwah force-pushed the ert3-split-params branch 4 times, most recently from 422ed10 to dec2f37 Compare June 21, 2021 11:26
Copy link
Contributor

@markusdregi markusdregi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address my comment on the hanging ".". Besides that this looks really good 🦾

raise ert3.exceptions.StorageError(response.text)
parameters: MutableMapping[str, List[str]] = {}
for name in response.json():
key, val = name.split(".")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delimiter?

@markusdregi markusdregi force-pushed the ert3-split-params branch 2 times, most recently from d26c243 to dec2f37 Compare June 24, 2021 07:07
@markusdregi markusdregi self-assigned this Jun 24, 2021
@pinkwah
Copy link
Contributor Author

pinkwah commented Jun 28, 2021

Jenkins test this please

@markusdregi markusdregi merged commit 2683d06 into equinor:main Jun 28, 2021
@pinkwah pinkwah deleted the ert3-split-params branch February 15, 2022 08:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make it possible to visualise ert3 parameters using webviz-ert
3 participants