Just reporting some statistics here - there is a lot of code duplication with the current method for storing MadGraph files for later running. MG5 is bad but MG4 is even worse, with some files being copied over 40 times. The tar-ball below has full listings of all the files and their md5sums, as well as a sorted list of unique md5sums with a corresponding example file for each. This does not handle duplicate code within files but it is a start.
@tomeichlersmith After looking at this a bit, what is your conclusion about how feasible it is for us to change how we are already doing things?
From what I remember, hps-mc copies some portion of the entire source tree into the run directory with the MG components. My main concern would be that we are possibly copying in a lot of extra files that we don't need, but I don't know whether this is the case or not.
What is the difference between how MG4 and MG5 handle all this? Is MG5 better in some way with less file duplication?
MG5 is indeed better than MG4 in terms of avoiding file copying - that was one of the major updates that led to the major version increase.
possibly copying in a lot of extra files that we don't need
We are almost certainly doing that. One of the issues is that some MG source files are used in one model and not used in another, so we'd need to check all of the different models we wish to support when attempting to delete any files.
feasibility
It's definitely feasible. There are several avenues of improvement but the big issue is time. Does anyone have time to do these things? Probably not...
One thing I've done in the past is run the program and then check which files were accessed/read. This at least eliminates files that aren't even opened by MG/ME during running.
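A rough sketch of how that check could look, assuming strace is available and the run is launched from a wrapper script (the script name in the usage comment is hypothetical). The helper opened_files is an illustrative name, not something from hps-mc or MadGraph; it only filters an strace log down to the paths that were opened successfully.

```shell
# Read an strace log on stdin and print each successfully opened path once.
# Assumes the usual strace line shape, e.g.
#   1234 openat(AT_FDCWD, "/some/file", O_RDONLY) = 3
opened_files() {
  grep -E 'open(at)?\(' \
    | grep -v ' = -1 ' \
    | sed -n 's/^[^"]*"\([^"]*\)".*$/\1/p' \
    | sort -u
}

# Usage sketch (run_madevent.sh is a hypothetical wrapper name):
#   strace -f -e trace=%file -o trace.log ./run_madevent.sh
#   opened_files < trace.log > accessed.list
```

Anything in the stored tree that never shows up in accessed.list for any supported model is a candidate for deletion. Calls that strace splits into "unfinished" and "resumed" lines are not handled carefully here, so treat the list as a first pass.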
Another avenue of improvement would be to abandon MG4 in favor of MG5. MG5 has better support for what I call "MadEvent workspaces", i.e. once you define a process you want to study, you can dump that process into a "MadEvent workspace" which can be run on its own. We would then only need to store the set of these ME workspaces, which would isolate all the models into their own subdirectories. (This is already what idm and simp do.)
mg-unique-file-listing.tar.gz
How
Calculate the md5sum of each file [1]
Get the unique files sorted by number of copies
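The exact commands used for these two steps did not survive in the text above, so here is a minimal sketch of both, built on the find form given in the footnote rather than fd. The function names md5_listing and count_copies are made up for illustration, and the awk step assumes plain md5sum output (a 32-character hash, two separator characters, then the path).

```shell
# Step 1: print one "<md5>  <path>" line per regular file under directory $1.
md5_listing() {
  find "$1" -type f -exec md5sum {} ';' | sort
}

# Step 2: read such a listing on stdin and print
# "<copies> <md5> <example path>", most-duplicated content first.
count_copies() {
  awk '{
    hash = $1
    path = substr($0, 35)   # skip the 32 hex chars plus 2 separator chars
    count[hash]++
    if (!(hash in example)) example[hash] = path
  } END {
    for (hash in count) print count[hash], hash, example[hash]
  }' | sort -k1,1nr -k2,2
}

# Usage sketch: md5_listing path/to/stored/madgraph | count_copies > uniq.list
```

Writing the listing to stdout instead of a file inside the scanned tree avoids hashing the output file itself.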
Footnotes
[1] Using fd instead of find here since it's faster. The find equivalent is:
find -type f -exec md5sum {} ';' | sort > md5sum.list