Data-driven output file names #93

Open
GoogleCodeExporter opened this issue Feb 14, 2016 · 0 comments

Some jobs need to create multiple files whose names are derived from the 
data.  Something like this:

... -> transform [$.filepart, $.data]
    -> write( dataDrivenFd(baseFd('/foo/*.dat')) )

where baseFd produces a FileOutputFormat, and dataDrivenFd replaces the '*' 
with $[0] and writes the data from $[1] of each element.  There are some tricky 
issues: the number of open files; the same file being written on different 
nodes (map/reduce output would require ./part-##### files); the OutputCommitter 
would need to be special; and so on.  We should treat the case of partitioned 
and grouped fileparts specially, so that we open only one file at a time and 
don't require the part-##### files.
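
To make the intended semantics concrete, here is a minimal single-process sketch in Python. It is not Jaql and does not use the Hadoop API. The names resolve_path and write_grouped are hypothetical, and the sketch assumes the records arrive already grouped by filepart, that the pattern contains exactly one '*', and that each data value is written as one line of text. It only illustrates the "one open file at a time" case and ignores the multi-node and OutputCommitter issues.

```python
from itertools import groupby
from typing import Iterable, List, Tuple

# (filepart, data), mirroring the [$.filepart, $.data] pairs in the example above.
Record = Tuple[str, str]


def resolve_path(pattern: str, filepart: str) -> str:
    """Replace the single '*' in the pattern with the data-derived file part."""
    if pattern.count("*") != 1:
        raise ValueError("pattern must contain exactly one '*'")
    # Assumption: a file part is a plain name, never a path.
    if not filepart or "/" in filepart or filepart in (".", ".."):
        raise ValueError(f"unsafe file part: {filepart!r}")
    return pattern.replace("*", filepart)


def write_grouped(records: Iterable[Record], pattern: str) -> List[str]:
    """Write records that are already grouped by filepart.

    Only one output file is open at any time. Returns the paths in the
    order they were written.
    """
    written: List[str] = []
    seen = set()
    for filepart, group in groupby(records, key=lambda r: r[0]):
        # A file part that shows up again means the input was not grouped;
        # reopening the file here would silently overwrite earlier data.
        if filepart in seen:
            raise ValueError(f"input not grouped: {filepart!r} reappeared")
        seen.add(filepart)
        path = resolve_path(pattern, filepart)
        with open(path, "w", encoding="utf-8") as out:
            for _, data in group:
                out.write(data + "\n")
        written.append(path)
    return written
```

Calling write_grouped([("a", "1"), ("a", "2"), ("b", "3")], "/foo/*.dat") would write /foo/a.dat and then /foo/b.dat, provided /foo exists.  The ungrouped case is rejected rather than handled, because handling it is exactly where the open-file count and part-##### problems come from.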


Original issue reported on code.google.com by [email protected] on 7 Jul 2010 at 12:51
