Replies: 3 comments 1 reply
-
(going through some old notes...) The UUID is needed because we are not only submitting to HTCondor or ARC CEs. For DIRAC CEs the UUID needs to be created, and the same holds for "pilot scripts" started by "pilot wrappers" started "in the vacuum" (i.e. VMs, but not only).
So, I think that the UUID should be set as early as possible, i.e. in the PilotWrapper or in the VM contextualization. For HTCondor and ARC CEs we could use the pilotReference as the UUID (it is unique and should be present), but not for the other computing resources.
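Setting the UUID "as early as possible" in the wrapper could look like the sketch below. This is only an illustration: the `PILOT_REFERENCE` environment variable name is an assumption (standing in for whatever mechanism the CE uses to expose the pilot reference), not an existing DIRAC interface.

```python
import os
import uuid


def pilot_uuid():
    """Pick the pilot UUID as early as possible (pilot wrapper / VM
    contextualization).

    If the CE already exposes a unique pilot reference (HTCondor, ARC),
    reuse it; otherwise (DIRAC CEs, "vacuum" resources) generate a fresh
    UUID. PILOT_REFERENCE is a hypothetical variable name used here
    purely for illustration.
    """
    ref = os.environ.get("PILOT_REFERENCE")
    return ref if ref else str(uuid.uuid4())
```

On HTCondor/ARC the returned value would then match what is stored server-side, while elsewhere it is a locally generated identifier that would still need to be reported back to DIRAC.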
-
After discussing with Janusz, I suggest using PilotStamps as the unique pilot identifiers for creating pilot logging artefacts, e.g. file names. The PilotStamps are already stored in the PilotAgentsDB and have a one-to-one correspondence with the pilot references, so they are good pilot UUIDs that can be indexed for better searches. The PilotStamps are generated before pilot submission by almost all types of CEs (though not all of them yet).

How can they be passed to the running pilots at the very beginning of their execution? The most straightforward mechanism to make the PilotStamp available in the pilot execution environment is to pass it as an environment variable, e.g. DIRAC_PILOT_STAMP, to the running pilot. This looks possible by defining an array of PilotStamps in the HTCondor submission file or in the ARC XRSL file, as an argument to the remote command in the SSHComputingElement, or by adding it to the CloudComputingElement user data template, e.g. passing it to the dirac-pilot.py wrapper via its --userEnvVariables argument. The PilotStamp will then be available in the pilot from the very beginning.

I do not know how this is done exactly in the no-SiteDirector case, e.g. in the Vac/VCycle or Online farm. But in this case the PilotStamp can be generated right in the pilot for logging, I think, and then reported back to DIRAC in the first pilot-Matcher communication (I guess this is how the pilot reference itself is communicated in this case as well). If we agree in principle, the code of the Computing Elements will have to be modified, but that is rather trivial for our main CEs.
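On the pilot side, the proposal above could be a few lines: read DIRAC_PILOT_STAMP if the CE injected it, otherwise generate a stamp locally for the no-SiteDirector case. A minimal sketch, assuming DIRAC_PILOT_STAMP as the variable name (as suggested above) and a plain hex UUID as the locally generated fallback format:

```python
import os
import uuid


def get_pilot_stamp():
    """Return the PilotStamp from the environment if the CE injected it,
    otherwise generate one locally (the "vacuum" / no-SiteDirector case).

    A locally generated stamp would still have to be reported back to
    DIRAC in the first pilot-Matcher communication.
    """
    stamp = os.environ.get("DIRAC_PILOT_STAMP")
    if not stamp:
        stamp = uuid.uuid4().hex
    return stamp


# Logging artefacts can then embed the stamp, e.g.:
log_name = "pilot-%s.log" % get_pilot_stamp()
```

The key property is that, when the CE sets the variable, the stamp in every log artefact matches the one already stored in the PilotAgentsDB.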
-
Maybe I am mixing things, but in the Pilot code (https://github.com/DIRACGrid/Pilot/blob/master/Pilot/pilotCommands.py#L715) there is a […], and these lines https://github.com/DIRACGrid/Pilot/blob/master/Pilot/pilotCommands.py#L818-L819 assume to possibly find a […]. For the "vacuum" (no-SiteDirector) case we either impose the use of a […]
-
Hi,
In my work on remote pilot logging I identified the following problem:
In the original extended (i.e. remote) logger, which is now being removed, pilots were identified by a UUID. The UUID was created by ExtendedLogger and saved to a file, so it could be accessed easily. I used the UUID to identify pilot files shipped to Tornado. According to an old description by the original author, this was supposed to be a reliable way to identify pilots. But even then, the UUID generated this way had nothing to do with the pilot reference stored in the PilotDB. In the current setup the pilot reference is known only when the ConfigureSite command is run, so quite late. This creates a problem when a pilot dies early: we have no way to identify it. On the other hand, even at an early stage the pilot reference is displayed as a frame title in the WebApp, so there must be a way to get it early. One possibility would be moving the code of the __setFlavour method from the ConfigureSite command to the SiteDirector and using it there, but that looks like a cumbersome solution. Or should we create a thin command which is run first and just gets the required info from the CE?

Thanks, JM
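For reference, the "UUID created once and saved to a file" mechanism described above can be sketched as follows. This is a simplified illustration of the idea, not the actual ExtendedLogger code; the `PilotUUID` file name is an assumption.

```python
import os
import uuid


def load_or_create_uuid(path="PilotUUID"):
    """Create the pilot UUID once and persist it to a file, so every
    pilot command (and the remote logger) can pick up the same
    identifier, even if the pilot dies before ConfigureSite learns the
    real pilot reference.

    The file name "PilotUUID" is a hypothetical placeholder.
    """
    if os.path.exists(path):
        with open(path) as handle:
            return handle.read().strip()
    new_uuid = str(uuid.uuid4())
    with open(path, "w") as handle:
        handle.write(new_uuid)
    return new_uuid
```

The limitation discussed in this thread remains: an identifier generated this way is stable within the pilot, but has no connection to the pilot reference stored server-side.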