Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix/use status log #38

Merged
merged 13 commits into from
May 3, 2016
Merged

Bugfix/use status log #38

merged 13 commits into from
May 3, 2016

Conversation

mnlevy1981
Copy link
Collaborator

This branch resolves issue #23 -- marbl_interface_types.F90 now provides output to the MARBL status log rather than writing to stdout (allowing errors to be trapped and returned to the GCM as well).

Ran the aux_pop_MARBL test suite, and all 9 tests were bit-for-bit with MARBL0.0.0

Working towards solution to issue #23 -- errors returned from diagnostic init
are now added to the status log instead of printed to stdout.
When adding forcing fields, errors are logged properly rather than printed to
stdout. Also cleaned up some of the diagnostic logging from the prior commit.

This should let us close issue #23
@klindsay28
Copy link
Collaborator

For src/marbl_diagnostics_mod.F90, github states

1,805 additions, 227 deletions not shown because the diff is too large. Please use a local Git client to view these changes.

I don't know how to do this.

@klindsay28
Copy link
Collaborator

The string "error code returned" confuses me. When I see that, I'm expecting to see a code, but there is none. Perhaps I'm not understanding the semanatics of marbl_logging.F90 or how it is to be understood. Unfortunately, examining the source code of marbl_logging.F90 is helping me out. It would be useful if there were documentation at the top of marbl_logging.F90 describing usage.

@mnlevy1981
Copy link
Collaborator Author

@klindsay28 -- to see all the diffs locally, do the following:

  1. Add my fork as a remote to your local clone of MARBL

    $ git remote add mlevy [email protected]:mnlevy1981/MARBL.git

  2. Make all the branches / tags from the mlevy remote available locally (this will only update your .git/ directory, it will not change any of the files outside that directory; i.e. it will not alter any of the files in your src/ directory)

    $ git fetch mlevy

  3. Compare the revision I branched from (bce81dc) to my mlevy/bugfix/use_status_log branch

    $ git diff bce81dc mlevy/bugfix/use_status_log


Instead of "error code returned ..." I can change the text to "error reported from [module]". I wasn't sure of the best way to word the messages, but consider the situation where the GCM calls MARBL_A(), which calls MARBL_B(), which calls MARBL_C(), which reports an error.

Ideally, the message log would show the following (in this order):

MARBL_C: [description of error]
MARBL_B: [tell user error came from call to MARBL_C()]
MARBL_A: [tell user error came from call to MARBL_B()]
GCM: [tell user error came from call to MARBL_A()]

The GCM would write the log to stdout or a file and then call MPI_Abort(). Like I said, I wasn't sure of the best way to actually word the error messages and am happy to change the text.

I'm happy to add documentation to marbl_logging.F90 describing this usage.

@klindsay28
Copy link
Collaborator

I'm concerned about using a module variable in marbl_logging.F90 as a temporary for message creation. It smells like a potential thread race condition waiting to bite us. Our calls to marbl_init, we are calling it for each thread, are in an iblock loop in ecosys_driver_init. If someone were to put on openmp do directive around the loop, say to speed things up, I think we'd intermittently be in trouble.

Would it be better for users of marbl_logging to either use their own temp string, or to move the temp string into one of the marbl_logging data types?

@mnlevy1981
Copy link
Collaborator Author

I didn't think about the threading / race issues, but that's a great point. The safest fix would be to make error_msg and status_msg local variables in every routine that uses the status log. I'll probably just use one variable (log_msg or log_message) and pass it to both log_error() and log_noerror().

Resolved one conflict -- was using marbl_logging for both this branch and the
surface_forcing_output branch (which was recently merged), but had slightly
different formatting between the two.
As Keith pointed out, having these module variables creates the potential for a
race condition if multiple OpenMP threads in the GCM are writing log messages
at the same time. Every routine that calls log_error() or log_noerror() now has
a local variable named "log_message" that we write messages into before calling
one of the two aforementioned routines.
I created a new subroutine to standardize the error traceback in MARBL. When
first encountering an error, call LOG%log_error() and return; every subsequent
call should be to LOG%log_error_trace, which takes the name of the routine that
produced the error as an argument rather than a log message.

Note that this is just a wrapper script - log_error_trace creates the log
message

"Error reported from [routine name]"

and then calls LOG%log_error with that message.

I say "first pass" because I still need to adapt the errors from
add_diagnostics() and add_forcing_field(), where I want to include the name of
the diagnostic [or forcing] field that produced the error. I was originally
thinking that I should add an optional argument for field name, but instead I
might treat add_forcing_field([FIELDNAME]) as the routine name.
I created a wrapper routine in marbl_diagnostics_mod that calls
LOG%log_error_trace when labort_marbl is detecting from an add_diags call.
Something similar should be used in marbl_mod for the forcing fields.
@mnlevy1981
Copy link
Collaborator Author

This is what the log looks like when I make num_forcing_diags too small:

 MARBL ERROR (marbl_interface_types:marbl_diagnostics_add): not enough memory al
 located for this number of diagnostics!
 MARBL ERROR (marbl_diagnostics_mod:marbl_diagnostics_init): Error reported from
  diags%add_diagnostic(O2SAT)
 MARBL ERROR (marbl_interface:marbl_init): Error reported from marbl_diagnostics
 _init()
 MARBL ERROR (ecosys_driver:ecosys_driver_init): error code returned from marbl(
 1)%init()

Obviously I need to update the "error code returned" message in the driver, but otherwise I think this is a pretty informative set of messages.

This commit only affects the comments in .F90 files, but there is now a header
in marbl_logging.F90 that explains how marbl_instance%StatusLog works and how
the GCM should interact with it. Also, there was a "TODO" comment in
marbl_interface_types.F90 that had already been done, so I deleted it.
Much like the diagnostics in e257a8c there is now a routine in marbl_mod to
handle the log_error_trace calls when forcing_fields%add_forcing_field()
returns an error.
@mnlevy1981
Copy link
Collaborator Author

Add when I make num_surface_forcing_fields too small:

 MARBL ERROR (marbl_interface_types:marbl_forcing_fields_add): not enough memory
  allocated for number of forcing fields!
 MARBL ERROR (marbl_mod:marbl_sflux_forcing_fields_init): Error reported from fo
 rcing_fields%add(xco2_alt_co2)
 MARBL ERROR (marbl_interface:marbl_init): Error reported from marbl_init_surfac
 e_forcing_fields()
 MARBL ERROR (ecosys_driver:ecosys_driver_init): error code returned from marbl(
 1)%init()

Again, I need to clean up the error message coming from the driver.

@klindsay28 -- at this point, I think I've addressed everything you've brought up today. I also brought this branch up to date with master (which was updated when I brought in the surface_forcing_output branch), so to see the full set of diffs you need to run

$ git fetch mlevy
$ git diff mlevy/master mlevy/bugfix/use_status_log

If there's more work to be done on this branch, I'll put it off until after I finish the tracer index clean-up... but I think this is pretty close.

I was trying to write a message combining strings and integers using the "(4A)"
formatting string, which needed to be changed to "(A,I0,A,I0)"
@klindsay28
Copy link
Collaborator

subname in subroutines of marbl_co2calc_mod.F90 are not consistent with the full module name and/or the subroutine name, or how many colons are used to separate the module name from the subroutine name. Suggested changes are:

  • character(*), parameter :: subname = 'marbl_co2calc:marbl_co2calc_surf'
  • character(*), parameter :: subname = 'marbl_co2calc_mod:marbl_co2calc_surf'
  • character(*), parameter :: subname = 'marbl_co2calc:marbl_comp_CO3terms'
  • character(*), parameter :: subname = 'marbl_co2calc_mod:marbl_comp_CO3terms'
  • character(*), parameter :: subname = 'marbl_co2calc:marbl_comp_htotal'
  • character(*), parameter :: subname = 'marbl_co2calc_mod:comp_htotal'
  • character(*), parameter :: subname = 'marbl_co2calc::drtsafe'
  • character(*), parameter :: subname = 'marbl_co2calc_mod:drtsafe'

I now see that this is the case in numerous other instances across MARBL source code. Running the command
grep -Ei 'subname =|^ *subroutine|^ *function' src/.F90 | grep -B1 'subname *='
picks up problems in marbl_restore_mod.F90, marbl_mod.F90, marbl_diagnostics_mod.F90, and marbl_ciso_mod.F90

Perhaps fixing these mismatches in files not already changed on this branch could be postponed to another tag/checkin. Perhaps a bug should be filed for this.

@@ -1,5 +1,51 @@
module marbl_logging
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes from a conversation with @klindsay28

  1. Documentation at top of routine would be more readable in a bullet-point format
  • Routines used to write log entries
  • Routines used by the driver
    -- Include psuedo-code for marl_log_print!
  1. Documentation in marbl_log_[noerror|error|error_trace] should define CodeLoc and whatnot
    "CodeLoc is the name of the subroutine that has called marbl_log_error_trace, RoutineName is the name of the routine that was called by CodeLoc and now labort_marbl is true"

  2. horizontal rule between subroutines (!====================)

  3. Need better documentation on OutputOptions (mention that it is included in the code as a hook for future development), remove namelist reference from comment at top of marbl_log_error

@klindsay28
Copy link
Collaborator

everything else looks fine to me
FYI, I learned from google that adding ?w=1 to the URL when examining files changed will show the diff with whitespace ignored (it mostly works in my experience)

The parameter subname appears in many subroutines, and should be declared as

character(*), parmameter :: subname = '[module name]:[subroutine name]'

This was not being done consistently throughout the code.
Per the comments in Pull Request #38, I did the following:

1. Used more stream-lined approach to documenting module use
2. Included basic print_marbl_log code
3. Defined the various strings that are intent(in) to the different marbl_log_*
   routines
4. Added a horizontal rule between subroutines
5. Added a brief blurb about the OutputOptions variable, mentioning that it is
   a place-holder for future development.
@mnlevy1981
Copy link
Collaborator Author

Note that this update also requires an update to ecosys_driver.F90 (which currently uses two variables from marbl_logging that have been removed from the module); I realize that I haven't committed those changes to a branch of POP yet, so I need to do that and then I'll post a link with the POP changes + test results.

@mnlevy1981
Copy link
Collaborator Author

The POP branch needed to run with this MARBL tag is

https://svn-ccsm-models.cgd.ucar.edu/pop2/branch_tags/marbl_dev_status_log_tags/marbl_dev_status_log_n01_marbl_dev_n03_cesm_pop_2_1_20160407/

on yellowstone_intel, all aux_pop and aux_pop_MARBL passed and also passed baseline comparisons (using cime 4.4.8 and cice 20160309; baseline POP was the March 17 trunk tag)

Added visual spacer between subroutines and made whitespace consistent
with previous commit.
@mnlevy1981 mnlevy1981 merged commit 29dc48a into marbl-ecosys:master May 3, 2016
@mnlevy1981 mnlevy1981 deleted the bugfix/use_status_log branch May 3, 2016 18:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants