
Allow to pass required files to a CU via input_data CUD attribute. #174

Open
pradeepmantha opened this issue Feb 9, 2014 · 4 comments

@pradeepmantha
Contributor

Having something like the following would be great. Currently we need to segregate files into separate DUs so that each DU holds exactly the files required by a CU. This could be optimized to avoid unnecessary DU creation.

    """ Parsing input data field of job description:
        {
          ...
          "input_data": [
              {
                input_data_unit.get_url(): ["file1", "file2"]
              }
          ]

        or

          "input_data": [
              input_data_unit.get_url()
          ]
        }
    """
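
For illustration, here is a minimal sketch of how the two proposed forms could be normalized into one structure before staging. It is not BigJob code: `normalize_input_data`, the `InputEntry` type, and the `du://...` URLs are hypothetical names, and treating `None` as "stage the whole DU" is an assumption made for this example.

```python
from typing import Dict, List, Optional, Union

# Hypothetical type: an entry is either a DU URL (whole DU)
# or a mapping from a DU URL to the file names wanted from it.
InputEntry = Union[str, Dict[str, List[str]]]


def normalize_input_data(entries: List[InputEntry]) -> Dict[str, Optional[List[str]]]:
    """Map each DU URL to the files requested from it.

    A value of None means "stage the whole DU" (assumption for this sketch).
    """
    selection: Dict[str, Optional[List[str]]] = {}
    for entry in entries:
        if isinstance(entry, str):
            # Plain URL: the whole DU is requested, which supersedes any file list.
            selection[entry] = None
        elif isinstance(entry, dict):
            for du_url, files in entry.items():
                if du_url in selection and selection[du_url] is None:
                    continue  # the whole DU was already requested
                selection.setdefault(du_url, []).extend(files)
        else:
            raise TypeError("unsupported input_data entry: %r" % (entry,))
    return selection


# Placeholder URLs standing in for input_data_unit.get_url().
selection = normalize_input_data(["du://a", {"du://b": ["file1", "file2"]}])
```

With this shape, the staging code would only need to check whether the value for a DU is `None` or a list of names.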
@marksantcroos
Member

Hi Pradeep,

Can you elaborate? I don’t really understand what you mean.

Thanks

Gr,

Mark


@pradeepmantha
Contributor Author

Consider the example below. I have a task that creates 1000 files, and for each file I want to create a task that takes that file as input. With the current 'input_data' CUD attribute, I can only pass the entire contents of a DU. So I either need to create 1000 intermediate DUs, one per file, and pass each DU as input to its task, or pass all 1000 files to every CU without creating intermediate DUs. Allowing the required input files to be specified as below would avoid creating intermediate DUs and fetch only the required files from the DU.

    "input_data": [
        {
            input_data_unit.get_url(): ["file1", "file2"]
        }
    ]

Again, this would be an optional flexibility that the user/application can choose to use.
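
As a rough sketch of the 1000-file case, the snippet below builds one CU description per file, all pointing at the same DU. The helper `per_file_cu_descriptions`, the `du://example` URL, and the `/bin/cat` executable are hypothetical, and the `executable` and `arguments` keys are assumed here for illustration; only the `input_data` form comes from this issue.

```python
from typing import Any, Dict, List


def per_file_cu_descriptions(du_url: str, filenames: List[str],
                             executable: str) -> List[Dict[str, Any]]:
    """Build one CU description per file, each selecting one file from the same DU."""
    return [
        {
            "executable": executable,   # assumed CUD key
            "arguments": [name],        # assumed CUD key
            # Proposed form from this issue: DU URL mapped to the files the CU needs.
            "input_data": [{du_url: [name]}],
        }
        for name in filenames
    ]


# Placeholder file names and DU URL; a real URL would come from input_data_unit.get_url().
files = ["file%d" % i for i in range(1000)]
cuds = per_file_cu_descriptions("du://example", files, "/bin/cat")
```

This yields 1000 descriptions that share a single DU, rather than 1000 intermediate DUs.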

@marksantcroos
Member

Hi Pradeep,

On 19 Feb 2014, at 19:26 , pradeepmantha [email protected] wrote:

Consider below example - I have a task which created 1000 files, where for each file, I wanna create a task, which takes the file itself as input.

Ok, clear.

With current 'input_data' CUD attribute, I can only pass, all the contents of DU.

Correct, DUs are atomic units for good reasons.

So, I either need to create 1000 intermediate DUS, one for each file, and pass the DU as input to the task,

Agreed, what’s the problem with that? Isn’t that exactly what you want in this situation?

or pass the 1000 files for each CU without creating intermediate DUS.

That’s obviously not what you want.

Allowing to Specify required input files as below, will help to avoid intermediate creation of DUS and just get the required files from the DUS.

    "input_data": [
        {
            input_data_unit.get_url(): ["file1", "file2"]
        }
    ]

What does the “just get the required files” actually mean here? What are the exact semantics of that?

In general, I believe I see what you want to do, but as far as I can tell this can be expressed perfectly with the current model, without breaking the actual semantics.

Moreover, for this specific pattern, it makes sense to add a layer on top of PD, which is exactly what you did, right?

Gr,

Mark

@pradeepmantha
Contributor Author

Hi,

On Wed, Feb 19, 2014 at 11:36 AM, Mark Santcroos
[email protected]:

Hi Pradeep,

On 19 Feb 2014, at 19:26 , pradeepmantha [email protected]
wrote:

Consider below example - I have a task which created 1000 files, where
for each file, I wanna create a task, which takes the file itself as input.

Ok, clear.

With current 'input_data' CUD attribute, I can only pass, all the
contents of DU.

Correct, DU's are atomic units for good reasons.

So, I either need to create 1000 intermediate DUS, one for each file,
and pass the DU as input to the task,

Agreed, whats the problem with that? Isn't that exactly what you want in
this situation?

- It works, but I need to create 1000 intermediate DUs. I just want to
  avoid that for performance reasons.

or pass the 1000 files for each CU without creating intermediate DUS.

Thats obviously not what you want.

Allowing to Specify required input files as below, will help to avoid
intermediate creation of DUS and just get the required files from the DUS.

    "input_data": [
        {
            input_data_unit.get_url(): ["file1", "file2"]
        }
    ]

What does the "just get the required files" actually mean here? What are
the exact semantics of that?

  • The semantics are the same as how the "output_data" CUD
    attribute currently behaves.

In general, I believe I see what you want to do, but as far as I can tell
this can be expressed perfectly with the current model, without breaking
the actual semantics.

  • Yes, it's just an implementation matter. I actually implemented it in
    the Pradeep branch of BigJob.

More over, for this specific pattern, it makes sense to add a layer on top
of PD, which is exactly what do you did, right?

- Yes.

Gr,

Mark

