for DTR Fails If Output Directory Already Exists #60

Open
GoogleCodeExporter opened this issue Mar 28, 2015 · 1 comment

Version: trunk@464

If a job has a series of DTRs and the input data to the first step changes, a 
downstream "for" DTR will fail unless its output directory is manually cleared 
before the job is run.

This behavior is (I think) new with revision r358.  It prevents the dataflow 
from working as I think it was intended.

The example hamake job can be used to reproduce this issue.

1. build trunk 
2. cd dist/examples/class-size-median
3. export HADOOP_HOME=<whatever>
4. run the job using the script
    bin/run.sh working
5. add any jar to the data directory
    hadoop fs -put hamake-2.0b-4.jar working/data
6. export RUN_FOLDER=working
7. manually run the job:
    hadoop jar hamake-2.0b-4.jar -f file:///${PWD}/hamakefiles/class-size.xml

The new jar is processed by the first two "foreach" DTRs, but then the 
histogram "for" DTR fails:

12/10/26 15:59:39 ERROR security.UserGroupInformation: PriviledgedActionException as:jlent cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:8020/user/jlent/working/result/class-size-histogram already exists
12/10/26 15:59:39 ERROR task.MapReduce: Failed to execute Hadoop command hdfs://localhost:8020/user/jlent/working/hamake-examples-2.0b-4.jar/com.codeminders.hamake.examples.ClassSizeHistogram
java.lang.Exception: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:8020/user/jlent/working/result/class-size-histogram already exists
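
For illustration only, the manual workaround (clearing the output directory before rerunning) can be sketched as below. This is a local-filesystem analogy using java.nio, not hamake or Hadoop code: the class name ClearOutputDir and the method clearIfExists are hypothetical, and on HDFS the equivalent step would presumably go through Hadoop's FileSystem API rather than java.nio.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: removes a stale output directory so that a job which
// refuses to overwrite existing output can be rerun.
public class ClearOutputDir {

    /** Deletes outputDir recursively. Returns true if something was removed. */
    static boolean clearIfExists(Path outputDir) throws IOException {
        if (!Files.exists(outputDir)) {
            return false; // nothing to clear
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(outputDir)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for result/class-size-histogram, created in a temp location.
        Path out = Files.createTempDirectory("class-size-histogram");
        Files.createFile(out.resolve("part-00000"));

        System.out.println("cleared: " + clearIfExists(out));
        System.out.println("still exists: " + Files.exists(out));
    }
}
```

Returning a boolean rather than throwing when the directory is absent keeps the helper safe to call on a first run, when there is no earlier output to remove.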

One additional problem: if the job is run without first adding a jar to 
the working directory, the initial DTR fails on the line:

Foreach.java (line 310):

        LOG.info(getName() + ": Completed " + fetcher.getCounter() + " tasks, " + fetcher.getErrors() + " tasks with errors, average run time: " + fetcher.getTotalRunTime() / (fetcher.getCounter() + fetcher.getErrors()) + " ms");

because the denominator is zero.  This is easy to fix. I just made it:

        if (fetcher.getCounter() + fetcher.getErrors() > 0) {
            LOG.info(getName() + ": Completed " + fetcher.getCounter() + " tasks, "
                    + fetcher.getErrors() + " tasks with errors, average run time: "
                    + fetcher.getTotalRunTime() / (fetcher.getCounter() + fetcher.getErrors())
                    + " ms");
        } else {
            LOG.info("Output of " + getName() + " is already present and fresh");
        }
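
As a standalone illustration of why the guard matters: in Java, integer division by zero throws ArithmeticException, so the average has to be computed only when at least one task ran. The sketch below assumes the counters are integer types (long total, int counts); the class AverageRunTime and its methods are hypothetical and not part of hamake.

```java
// Hypothetical, self-contained version of the guarded log message.
public class AverageRunTime {

    /** Mean run time in ms, or -1 when no tasks ran (sentinel chosen for this sketch). */
    static long averageMillis(long totalRunTimeMillis, int completed, int errors) {
        int tasks = completed + errors;
        // Guard first: totalRunTimeMillis / 0 would throw ArithmeticException.
        return tasks > 0 ? totalRunTimeMillis / tasks : -1;
    }

    /** Builds the message that would be logged for a DTR called name. */
    static String summary(String name, long totalRunTimeMillis, int completed, int errors) {
        long average = averageMillis(totalRunTimeMillis, completed, errors);
        if (average < 0) {
            return "Output of " + name + " is already present and fresh";
        }
        return name + ": Completed " + completed + " tasks, " + errors
                + " tasks with errors, average run time: " + average + " ms";
    }

    public static void main(String[] args) {
        System.out.println(summary("histogram", 0L, 0, 0));
        System.out.println(summary("histogram", 900L, 2, 1));
    }
}
```

Computing the task count once and branching on it keeps the division and its guard in one place, so the two cannot drift apart.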

Original issue reported on code.google.com by [email protected] on 26 Oct 2012 at 8:12

@GoogleCodeExporter

In the title and in several places in the description I mistakenly refer to the 
"Fold" DTR as the "For" DTR.  My apologies.

Original comment by [email protected] on 30 Oct 2012 at 2:46
