
Consider allowing checkout_externals to run even if in-sync components are modified #112

Open
billsacks opened this issue Jul 23, 2018 · 11 comments


@billsacks
Member

Currently, checkout_externals refuses to do any updates if any components are modified ('M' in the second column) – even components that are in-sync. A number of people have questioned the (overly?) conservative nature of this check, and I have seen at least one instance where this check has caused trouble for CESM users (they had some cime changes in their sandbox, but wanted to use checkout_externals to bring in a new version of cism, along with the new version of cism's sub-external, source_cism).

I feel it would be reasonable to allow checkout_externals to proceed in the presence of modifications ('M' state), as long as the out-of-sync externals are unmodified. However, perhaps a warning should be issued in this case?
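As a sketch of that proposal (illustrative names and data structures, not the actual manage_externals internals): abort only when some external is both out of sync and locally modified, so in-sync-but-modified components no longer block the checkout.

```python
# Hypothetical sketch of the proposed policy; the dict keys and the
# should_abort() helper are illustrative, not the manage_externals API.

def should_abort(externals):
    """externals: list of dicts with 'name', 'in_sync', 'modified' keys.
    Returns the names of externals that block the checkout."""
    return [e['name'] for e in externals
            if not e['in_sync'] and e['modified']]

# An in-sync-but-modified component (e.g. local cime changes) would no
# longer block checking out an out-of-sync-but-clean external like cism:
sandbox = [
    {'name': 'cime', 'in_sync': True,  'modified': True},
    {'name': 'cism', 'in_sync': False, 'modified': False},
]
print(should_abort(sandbox))  # -> [] (proceed, perhaps with a warning)
```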

Some aspects of this might need careful thought, though, particularly where sub-externals are involved:

  1. What if a component is in-sync but has modified its sub-externals cfg file? Do we read from the modified sub-externals cfg file or abort in this case?

  2. What if a component, 'foo', is out-of-sync, and one of its sub-externals, 'bar', is in-sync but modified? Bringing 'foo' into sync could involve updating its version of 'bar', at which point we'd have an external ('bar') that is both modified and out-of-sync. The safe thing to do would be to abort if, for a given external that is out-of-sync, either that external itself or any of its sub-externals are modified; I'm not sure whether there is an easy way to check that condition currently.
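The "safe" rule in point 2 could be sketched as a recursive check (again illustrative, not actual manage_externals code): an out-of-sync external blocks the checkout if it, or any sub-external beneath it, is modified.

```python
# Hypothetical sketch of the recursive safety check from point 2.
# The dict layout ('name', 'in_sync', 'modified', 'subs') is assumed
# for illustration only.

def blocks_checkout(ext):
    """True if this out-of-sync external, or any sub-external under it,
    is locally modified."""
    def any_modified(e):
        return e['modified'] or any(any_modified(s) for s in e['subs'])
    return (not ext['in_sync']) and any_modified(ext)

# 'foo' is out of sync; its sub-external 'bar' is in sync but modified.
# Syncing 'foo' could leave 'bar' both modified and out of sync, so the
# safe behavior is to abort:
foo = {'name': 'foo', 'in_sync': False, 'modified': False,
       'subs': [{'name': 'bar', 'in_sync': True, 'modified': True,
                 'subs': []}]}
print(blocks_checkout(foo))  # -> True (abort)
```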

@billsacks
Member Author

This desire, or something like it, has come up again in the context of ESCOMP/CESM#139. To satisfy the needs in that discussion, regarding this question:

What if a component is in-sync but has modified its sub-externals cfg file? Do we read from the modified sub-externals cfg file or abort in this case?

I believe we would want to read from the modified sub-externals cfg file (NOT abort).

@mnlevy1981
Contributor

A comment I made in ESCOMP/CESM#139 (comment) might be better suited for this issue ticket:

Basically, I'd like a .manage_externals_ignore file that signals to manage_externals that it's okay if a specific file has been modified, but I think it may only work in the very narrow scope where the current checkout / detached head matches where manage_externals wants it to be.

So if Externals_POP.cfg is listed in POP's .manage_externals_ignore, and you rerun checkout_externals at the CESM level while POP is already at the right SHA and Externals_POP.cfg is the only modified file in the checkout, then POP's externals would be updated to whatever is in the modified version of the file (provided the externals themselves haven't been modified in a way that triggers an abort).
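That proposal could be sketched as follows (a hypothetical helper, since no such ignore mechanism exists yet): a modified file is tolerated only if the external is already at the right SHA and the file is explicitly listed in the ignore file.

```python
# Hypothetical sketch of the .manage_externals_ignore semantics; the
# checkout_allowed() helper and its arguments are illustrative only.

def checkout_allowed(at_right_sha, modified_files, ignored):
    if not at_right_sha:
        # Out of sync: any local modification triggers an abort.
        return not modified_files
    # In sync: every modified file must be explicitly ignored.
    return all(f in ignored for f in modified_files)

ignored = {'Externals_POP.cfg'}
print(checkout_allowed(True, ['Externals_POP.cfg'], ignored))  # -> True
print(checkout_allowed(True, ['pop_source.F90'], ignored))     # -> False
```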

@billsacks
Member Author

@mnlevy1981 note that, in this issue, I'm suggesting something broader that I think also encompasses your suggestion: I'm suggesting that we should consider allowing any files to be modified in an external, as long as that external is already at the right SHA. i.e., manage_externals would only abort if there is a given external that both (a) is at the wrong SHA and (b) has modified files. I'm not tied to that idea, though, and if others feel that is too permissive then I'd be happy with a fallback like the one you suggest. But I'm curious what you think of that broader, more permissive idea.

@ekluzek

ekluzek commented Mar 19, 2020

We had a discussion today with about six CSEG folks. We talked about @mnlevy1981's idea of an ignore file and we like it. It seems to be the best way to implement this without being too permissive. We thought it shouldn't be an "invisible" file, though, and I think it should probably only be invoked with an option where you give the file, so something like...

./manage_externals/checkout_externals --ignore manage_externals_ignore

The contents of the file would just be a list of files that can be modified without manage_externals failing.

Having an explicit file like this means you have thought through any ramifications of having that file allowed to be modified (or at least you can only blame yourself). This means the two special cases of consideration in the opening text are handled by the user and we don't have to decide on particular behavior for the tool here.
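A minimal sketch of how such a file might look and be parsed (the one-path-per-line format with '#' comments is an assumption; nothing like this is implemented yet):

```python
# Hypothetical ignore-file format for the proposed --ignore option:
# one path per line, blank lines and '#' comments skipped.

SAMPLE = """\
# files that may be locally modified without manage_externals failing
Externals_POP.cfg
components/cism/Externals_CISM.cfg
"""

def read_ignore_file(text):
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith('#'):
            entries.append(line)
    return entries

print(read_ignore_file(SAMPLE))
# -> ['Externals_POP.cfg', 'components/cism/Externals_CISM.cfg']
```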

@billsacks
Member Author

Sorry I wasn't able to make it to the meeting yesterday. I'm curious: were there specific concerns with always letting manage_externals run if an in-sync component has been modified? I'm having trouble seeing situations where that would cause problems, and as mentioned in my initial comment, there seem to be legitimate times when this extra permissiveness would lead to less frustration for users. So I'm just wondering if there are downsides to this that others see.

@gold2718
Collaborator

@ekluzek,

I am curious about your proposal to require an explicit --ignore flag in order to have checkout_externals know to not worry about the status of some files. What if the files to ignore are in a sub-external or a sub-sub-external? Are you supposed to collect and keep track of all such files at all depths of your project?

@mnlevy1981
Contributor

@billsacks

@mnlevy1981 note that, in this issue, I'm suggesting something broader that I think also encompasses your suggestion: I'm suggesting that we should consider allowing any files to be modified in an external, as long as that external is already at the right SHA. i.e., manage_externals would only abort if there is a given external that both (a) is at the wrong SHA and (b) has modified files. I'm not tied to that idea, though, and if others feel that is too permissive then I'd be happy with a fallback like the one you suggest. But I'm curious what you think of that broader, more permissive idea.

My only concern is that allowing something that used to cause an error is a big change to the design of the software, so I think we should take whatever measures we can to minimize that change. I wasn't involved in the original design meeting that resulted in the current behavior, so the best summary of my opinion is: "I don't know why it throws an error if it's in a dirty state but at the right SHA, but I wanted to suggest a fix that maintains that behavior as much as possible."

As for the behavior itself, I don't have any issue with the idea of only throwing an error if the repo is dirty AND it's at the wrong SHA. It seems like you, @gold2718, and perhaps @jedwards4b are the three people running this repo so if you all think the previous design was a poor choice I'll support your decision to change it.

@ekluzek

ekluzek commented Mar 20, 2020

I think in the production environment, I'd want manage_externals to abort if it found any of the externals were "dirty". In that case an external would only be dirty if something happened by accident -- so I'd want it to abort and CIME to fail so that I know about it.

In the development environment I still might want to be warned about an external being dirty -- because it was changed accidentally -- which is part of why I think this should be an option rather than the default behavior.

@billsacks
Member Author

I want to explain my thoughts a little more, but I also want to be clear: I don't feel strongly about this so I'm not going to push too hard for this if others feel it's a bad idea.

@mnlevy1981 thanks for the explanation. I appreciate your concern for maintaining old behavior. If I remember correctly, Ben Andre (who did much of the initial coding for manage_externals) felt strongly that the state of your sandbox after running manage_externals should always be predictable, and the end state should never depend on the history of what you have done previously in that sandbox (unless manage_externals aborts with an error). I see that point to some extent and agreed that it made sense, until I started seeing that it can cause significant inconvenience in practice... so I have personally started to feel that a slightly less rigid approach is justified in certain cases. (There is more discussion of some related philosophical / high-level design issues in #51 . In particular, see option (5) in #51 (comment) , which considered the issue under discussion here; at the time we came down against this, but that seemed to be based more on gut feelings than on any particular well-thought-through considerations.)

@ekluzek you make good points about the importance of being conservative in a production environment. However, I don't feel that manage_externals is the right place to do these checks. Instead, if we feel we should have checks for a clean sandbox, these checks should be done by cime at some point in the case creation, build, or submission process (though it can use manage_externals to help determine the sandbox's full status). If we rely on manage_externals to do this, then it doesn't prevent someone from (1) checking out all externals, followed by (2) modifying some files, then (3) creating a case (without running manage_externals again between (2) and (3)). I think this would be a much more common source of issues. IMHO, having this check in manage_externals can give a false sense of security while still causing annoyances to users.

@ekluzek

ekluzek commented Mar 20, 2020

@billsacks note that manage_externals is ALREADY being run every time you run a case, to do provenance documentation. And as you know, CTSM testing runs it before running a test suite. Right now it just documents the sandbox with the "--status" option, so you can have dirty externals. But it does run it.

If you don't think it should be there, we'd need to remove it from CIME, and then come up with some different type of checker. Personally I think the way it works now makes sense with CIME relying on manage_externals to document a case.

@billsacks
Member Author

@ekluzek sorry if I was unclear. I was supporting the use of manage_externals in this --status mode, providing information about the status of the sandbox that other tools can then use as they see fit.

What I was arguing against was relying on manage_externals in non-status mode to be the one to provide these checks. If your desire is, "Production cases should not be done from sandboxes with uncommitted changes", then I don't feel it's appropriate to have this check done when running manage_externals without the status option: this won't catch many of the problems you want to catch, but will catch many "problems" that you're not intending to catch. What COULD be appropriate is having cime run manage_externals --status (or possibly using the Python interface of manage_externals directly) and checking to ensure that all externals are up-to-date and have no uncommitted changes.

To summarize my feelings in a different way: I feel we should simultaneously:

  1. Make manage_externals a bit more permissive to accommodate the many legitimate complaints of people trying to do active development on externals or sub-externals.

  2. Add some additional checking in cime to prevent running production cases from a dirty sandbox; this can make use of manage_externals in status mode. (This will need some thought, though, to decide whether the default should be development or production mode.)

It seems to me like these two things together achieve the protections people want while also making manage_externals more usable... but maybe I'm missing something.
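The cime-side guard in point (2) could be sketched roughly as follows (a hypothetical helper, not existing cime code). The only format detail grounded in this thread is that 'M' in the second status column means "modified"; treating any non-blank first column as "out of sync" is an assumption for illustration.

```python
# Hypothetical cime-side check: run `checkout_externals --status` and
# refuse to run a production case from a dirty sandbox. Assumes the
# output contains only status lines, each starting with a two-character
# status prefix (second column 'M' = locally modified; a non-blank
# first column is assumed to mean out of sync or missing).

import subprocess

def production_sandbox_ok(status_output):
    for line in status_output.splitlines():
        if len(line) >= 2 and (line[0] != ' ' or line[1] == 'M'):
            return False
    return True

def check_case(caseroot):
    out = subprocess.run(
        ['./manage_externals/checkout_externals', '--status'],
        cwd=caseroot, capture_output=True, text=True, check=True).stdout
    if not production_sandbox_ok(out):
        raise RuntimeError('refusing to run a production case '
                           'from a dirty sandbox')
```

Whether this check defaults to a "development" or "production" mode would still need the thought mentioned above.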
