Replies: 3 comments 1 reply
-
(going through some old notes...) The UUID is needed because we are not only submitting to HTCondor or ARC CEs. For DIRAC CEs the UUID needs to be created, and the same holds for "pilot scripts" started by "pilot wrappers" started "in the vacuum" (i.e. VMs, but not only).
So, I think that the UUID should be set as early as possible, i.e. in the PilotWrapper or in the VM contextualization. For HTCondor and ARC CEs we could use the pilotReference as the UUID (it is unique and should be present), but not for the other computing resources.
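Setting the UUID "as early as possible" in the wrapper could look like the sketch below. This is only an illustration: the `PILOT_REFERENCE` environment variable name is an assumption (standing in for whatever mechanism the CE uses to expose the pilot reference), not an existing DIRAC interface.

```python
import os
import uuid


def pilot_uuid():
    """Pick the pilot UUID as early as possible (pilot wrapper / VM
    contextualization).

    If the CE already exposes a unique pilot reference (HTCondor, ARC),
    reuse it; otherwise (DIRAC CEs, "vacuum" resources) generate a fresh
    UUID. PILOT_REFERENCE is a hypothetical variable name used here
    purely for illustration.
    """
    ref = os.environ.get("PILOT_REFERENCE")
    return ref if ref else str(uuid.uuid4())
```

On HTCondor/ARC the returned value would then match what is stored server-side, while elsewhere it is a locally generated identifier that would still need to be reported back to DIRAC.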
-
After discussing with Janusz, I suggest using PilotStamps as the unique pilot identifiers for creating pilot logging artefacts, e.g. file names. The PilotStamps are already stored in the PilotAgentsDB and have a one-to-one correspondence with the pilot references, so they are good pilot UUIDs that can be indexed for better searches. The PilotStamps are generated before pilot submission by almost all types of CEs (though not all of them yet).

How can they be passed to the running pilots at the very beginning of their execution? The most straightforward mechanism to make the PilotStamp available in the pilot execution environment is to pass it as an environment variable, e.g. DIRAC_PILOT_STAMP, to the running pilot. This looks possible by defining an array of PilotStamps in the HTCondor submission file or in the ARC XRSL file, as an argument to the remote command in the SSHComputingElement, or by adding it to the CloudComputingElement user data template, e.g. passing it to the dirac-pilot.py wrapper via its --userEnvVariables argument. The PilotStamp will then be available in the pilot from the very beginning.

I do not know how this is done exactly in the no-SiteDirector case, e.g. in the Vac/VCycle or Online farm. But in this case the PilotStamp can be generated right in the pilot for logging, I think, and then reported back to DIRAC in the first pilot-Matcher communication (I guess this is how the pilot reference itself is communicated in this case as well). If we agree in principle, the code of the Computing Elements will have to be modified, but that is rather trivial for our main CEs.
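On the pilot side, the proposal above could be a few lines: read DIRAC_PILOT_STAMP if the CE injected it, otherwise generate a stamp locally for the no-SiteDirector case. A minimal sketch, assuming DIRAC_PILOT_STAMP as the variable name (as suggested above) and a plain hex UUID as the locally generated fallback format:

```python
import os
import uuid


def get_pilot_stamp():
    """Return the PilotStamp from the environment if the CE injected it,
    otherwise generate one locally (the "vacuum" / no-SiteDirector case).

    A locally generated stamp would still have to be reported back to
    DIRAC in the first pilot-Matcher communication.
    """
    stamp = os.environ.get("DIRAC_PILOT_STAMP")
    if not stamp:
        stamp = uuid.uuid4().hex
    return stamp


# Logging artefacts can then embed the stamp, e.g.:
log_name = "pilot-%s.log" % get_pilot_stamp()
```

The key property is that, when the CE sets the variable, the stamp in every log artefact matches the one already stored in the PilotAgentsDB.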
-
Maybe I am mixing things, but in the Pilot code (https://github.com/DIRACGrid/Pilot/blob/master/Pilot/pilotCommands.py#L715) there is a […], and these lines https://github.com/DIRACGrid/Pilot/blob/master/Pilot/pilotCommands.py#L818-L819 assume to possibly find a […]. For the "vacuum" (no-SiteDirector) case we either impose the use of a […]
-
Hi,
In my work on remote pilot logging I identified the following problem:
In the original extended (i.e. remote) logger, which is now being removed, pilots were identified by a UUID. The UUID was created by ExtendedLogger and saved to a file, so it could be accessed easily. I used the UUID to identify pilot files shipped to Tornado. According to an old description by the original author, this was supposed to be a reliable way to identify pilots. But even then, the UUID generated this way had nothing to do with the pilot reference stored in the PilotDB. In the current setup the pilot reference is known only when the ConfigureSite command is run, so quite late. This creates a problem when a pilot dies early: we have no way to identify it. On the other hand, even at an early stage the pilot reference is displayed as a frame title in the WebApp, so there must be a way to get it early. One possibility would be moving the code of the __setFlavour method from the ConfigureSite command to the SiteDirector and using it there, but that looks like a cumbersome solution. Or should we create a thin command which is run first and just gets the required info from the CE?

Thanks, JM
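For reference, the "UUID created once and saved to a file" mechanism described above can be sketched as follows. This is a simplified illustration of the idea, not the actual ExtendedLogger code; the `PilotUUID` file name is an assumption.

```python
import os
import uuid


def load_or_create_uuid(path="PilotUUID"):
    """Create the pilot UUID once and persist it to a file, so every
    pilot command (and the remote logger) can pick up the same
    identifier, even if the pilot dies before ConfigureSite learns the
    real pilot reference.

    The file name "PilotUUID" is a hypothetical placeholder.
    """
    if os.path.exists(path):
        with open(path) as handle:
            return handle.read().strip()
    new_uuid = str(uuid.uuid4())
    with open(path, "w") as handle:
        handle.write(new_uuid)
    return new_uuid
```

The limitation discussed in this thread remains: an identifier generated this way is stable within the pilot, but has no connection to the pilot reference stored server-side.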