
Refactor the everest run model #9575

Merged: 12 commits into main, Dec 20, 2024
Conversation

verveerpj (Contributor)

Issue
Refactoring of the everest run model code.


  • PR title captures the intent of the changes, and is fitting for release notes.
  • Added appropriate release note label
  • Commit history is consistent and clean, in line with the contribution guidelines.
  • Make sure unit tests pass locally after every commit (git rebase -i main --exec 'pytest tests/ert/unit_tests -n logical -m "not integration_test"')

When applicable

  • When there are user facing changes: Updated documentation
  • New behavior or changes to existing untested code: Ensured that unit tests are added (See Ground Rules).
  • Large PR: Prepare changes in small commits for more convenient review
  • Bug fix: Add regression test for the bug
  • Bug fix: Create Backport PR to latest release

@verveerpj verveerpj self-assigned this Dec 17, 2024
@verveerpj verveerpj marked this pull request as draft December 17, 2024 13:43
@codecov-commenter commented Dec 17, 2024

Codecov Report

Attention: Patch coverage is 94.81132% with 11 lines in your changes missing coverage. Please review.

Project coverage is 91.88%. Comparing base (2e9c71a) to head (720f79c).
Report is 2 commits behind head on main.

✅ All tests successful. No failed tests found.

Files with missing lines                    Patch %   Lines
src/ert/run_models/everest_run_model.py     95.21%    9 Missing ⚠️
src/everest/detached/jobs/everserver.py     87.50%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9575      +/-   ##
==========================================
- Coverage   91.89%   91.88%   -0.01%     
==========================================
  Files         433      432       -1     
  Lines       26813    26805       -8     
==========================================
- Hits        24639    24629      -10     
- Misses       2174     2176       +2     
Flag Coverage Δ
cli-tests 39.72% <0.00%> (-0.02%) ⬇️
everest-models-test 34.58% <65.09%> (-0.02%) ⬇️
gui-tests 72.09% <0.00%> (-0.07%) ⬇️
integration-test 38.71% <86.79%> (-0.03%) ⬇️
performance-tests 51.91% <0.00%> (-0.03%) ⬇️
test 40.04% <91.98%> (-0.03%) ⬇️
unit-tests 74.16% <21.69%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown.


@verveerpj verveerpj force-pushed the refactor-everest-run-model branch from 06e3cb9 to 6143dd0 Compare December 17, 2024 14:02
codspeed-hq bot commented Dec 17, 2024

CodSpeed Performance Report

Merging #9575 will improve performance by 11.53%

Comparing refactor-everest-run-model (720f79c) with main (ead6423)

Summary

⚡ 1 improvements
✅ 23 untouched benchmarks

Benchmarks breakdown

Benchmark: test_direct_dark_performance_with_storage[gen_x: 20, sum_x: 20 reals: 10-gen_data-get_record_observations]
main: 1.7 ms, refactor-everest-run-model: 1.5 ms, Change: +11.53%

@verveerpj verveerpj force-pushed the refactor-everest-run-model branch 9 times, most recently from 247f315 to 6185ee6 Compare December 18, 2024 09:43
@verveerpj verveerpj marked this pull request as ready for review December 18, 2024 09:43
@verveerpj verveerpj force-pushed the refactor-everest-run-model branch 6 times, most recently from 7142c89 to f989ecf Compare December 18, 2024 13:26
@oyvindeide (Collaborator) left a comment:

I think the changes look good, but I think we should split them into separate commits; I had a hard time following some of the changes. For example, using pattern matching could be a separate commit. I also added some other examples.

config.queue_config,
status_queue,
queue.SimpleQueue(),
Collaborator:

I think in the next step we want to use this to forward the events from the base run model, so it might be better to leave it as it was.

tests/everest/test_yaml_parser.py (resolved)
tests/everest/test_everest_output.py (resolved)
src/ert/run_models/everest_run_model.py (resolved)
@verveerpj (Contributor Author) commented Dec 18, 2024:

I will try to split it up.
Done.

@verveerpj verveerpj force-pushed the refactor-everest-run-model branch 2 times, most recently from 2fbbcac to e18edbf Compare December 18, 2024 15:05
@verveerpj verveerpj marked this pull request as draft December 18, 2024 15:34
@verveerpj verveerpj force-pushed the refactor-everest-run-model branch 4 times, most recently from fab8972 to 4a7a667 Compare December 18, 2024 18:14
Path(everest_config.log_dir).mkdir(parents=True, exist_ok=True)
Path(everest_config.optimization_output_dir).mkdir(parents=True, exist_ok=True)

assert everest_config.environment is not None
logging.getLogger(EVEREST).info(
Contributor:

Should this logging maybe also be moved to the environment config, since it relates to the seed? Semi-related, but I think the two logging statements could be merged into one, if that seems sensible to do in this PR.

Contributor Author:

I think the logging is appropriate here, since it is basically announcing that we are going to run an optimization with a given seed. If we moved it to the config, it would be more like logging that we modified the configuration to fill in a defaulted value, which we generally don't do.

I will merge them.


@@ -703,3 +716,42 @@ def _handle_errors(
self._fm_errors[error_hash]["ids"].append(fm_id)
error_id = self._fm_errors[error_hash]["error_id"]
fm_logger.error(err_msg.format(error_id, ""))


class SimulatorCache:
Contributor:

Just a general question, maybe not related to this PR: do we still need a simulator cache? It stores some controls and the resulting constraints/objectives. I guess once we have Everest storage we could get it from there instead?

Contributor Author:

Yes, we do need it; some optimizers may repeatedly ask for the same values. It would be a good idea to get it somehow from storage. Let's look into that once the storage has stabilized.

Contributor:

Hmm okay, this is probably scoped for another issue/PR, but:

Q1: It seems kind of general; could an lru_cache around a single method do the same thing as this?

Q2: Also, we use the control vector to do a lookup; I'm guessing there is some kind of ID that would let us just pass that instead of having to look up on controls? The kind of realization that goes in is also interesting: if we need to have a set of controls, it implies we are searching through perturbations of one realization/geoid that has multiple perturbations, which in ERT are just more realizations.

Contributor Author:

The controls are vectors of float values, so they are not hashable; hence lru_cache is not straightforward to use. They also do not have any kind of hashable ID; I don't think numpy arrays have that.

For each control vector we will have multiple evaluations, i.e. the different realizations, hence the additional use of the realization names in the cache.
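The scheme described above could be sketched roughly as follows (a toy illustration with hypothetical names and signatures, not the actual SimulatorCache in this PR): since numpy arrays are not hashable, the cache keys on the realization together with the raw bytes of the control vector.

```python
from __future__ import annotations

import numpy as np
from numpy.typing import NDArray


class SimulatorCache:
    """Toy sketch: cache evaluations per (realization, control vector).

    numpy arrays are not hashable, so the control vector is keyed on
    its raw bytes instead (assumes a fixed dtype and memory layout).
    """

    def __init__(self) -> None:
        self._data: dict[tuple[int, bytes], float] = {}

    def add(
        self, realization: int, controls: NDArray[np.float64], objective: float
    ) -> None:
        # One control vector may be evaluated for several realizations,
        # so the realization is part of the key.
        self._data[(realization, controls.tobytes())] = objective

    def get(
        self, realization: int, controls: NDArray[np.float64]
    ) -> float | None:
        return self._data.get((realization, controls.tobytes()))


cache = SimulatorCache()
x = np.array([0.1, 0.2, 0.3])
cache.add(0, x, objective=1.5)
print(cache.get(0, np.array([0.1, 0.2, 0.3])))  # equal values, same dtype -> 1.5
print(cache.get(1, x))  # different realization -> None (cache miss)
```

Keying on `tobytes()` sidesteps the hashability problem, at the cost of treating arrays with different dtypes or layouts as distinct even when their values match.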


def add(
self,
realization_id: int,
Contributor:

What kind of realization is realization_id? I would add a comment, or use a more explicit name if possible.

Contributor Author:

Agreed: realization_id -> realization_name

@yngve-sk (Contributor) commented Dec 19, 2024:

In the ERT codebase it is otherwise mostly called realization; maybe we shouldn't add yet another synonym and should just use what is mostly used already? There is also iens, but imo that is a bit cryptic 😅 Unless this is specific to Everest somehow?

Contributor Author:

The reason I used name is that they are strings, and I also sometimes use indices to realizations. I will rename things, making sure indices are named appropriately, and use realization for the names.

@verveerpj verveerpj force-pushed the refactor-everest-run-model branch 5 times, most recently from 8fc0602 to 3cde5a4 Compare December 19, 2024 17:21
@verveerpj verveerpj force-pushed the refactor-everest-run-model branch 2 times, most recently from fcc8ac8 to d483532 Compare December 20, 2024 10:56
@verveerpj verveerpj force-pushed the refactor-everest-run-model branch from d483532 to 7318071 Compare December 20, 2024 10:56
@@ -775,11 +781,9 @@ def add(
)

def get(
self, realization_id: int, controls: NDArray[np.float64]
self, realization: int, controls: NDArray[np.float64]
Collaborator:

Because there was some confusion regarding this, would it perhaps help to add a docstring describing what realization means here?

Contributor:

^agree

Contributor Author:

Done
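The requested docstring might look roughly like this sketch (the wording and the surrounding stub class are hypothetical, not the actual change merged in the PR):

```python
from __future__ import annotations

import numpy as np
from numpy.typing import NDArray


class SimulatorCache:
    """Hypothetical stub; only the docstring on get() is the point here."""

    def get(
        self, realization: int, controls: NDArray[np.float64]
    ) -> NDArray[np.float64] | None:
        """Return the cached results for `controls`, or None on a miss.

        `realization` is the index of the model realization for which
        the control vector was evaluated; a single control vector may be
        evaluated for multiple realizations, so both together form the
        cache key.
        """
        return None  # stub body; the real method performs the lookup


print("realization" in SimulatorCache.get.__doc__)  # -> True
```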

@oyvindeide (Collaborator):

I had one final small comment, but other than that it looks good from my side 👍 Thanks for splitting it up; that made it much easier to read through, for me at least. Just keep the history as is and don't squash on merge.

@verveerpj verveerpj force-pushed the refactor-everest-run-model branch from 7318071 to 720f79c Compare December 20, 2024 12:39
@yngve-sk (Contributor) left a comment:

LGTM 💯

@verveerpj verveerpj merged commit 722e0fc into main Dec 20, 2024
42 checks passed
@verveerpj verveerpj deleted the refactor-everest-run-model branch December 20, 2024 13:27