
HowTo configure defaultFS for hadoop on singlenode/yarn setup #66

Open
blogmaniak opened this issue Sep 30, 2015 · 1 comment

Comments

@blogmaniak

Hi - I have built a GCE cluster using ./bdutil deploy --bucket anintelclustergen1-m-disk -n 2 -P anintelcluster -e extensions/spark/spark_on_yarn_env.sh.

In the bucket parameters, both on the command line and in bdutil_env.sh, I have specified a non-boot bucket.
The core-site.xml (under hadoop/etc) on the master shows the correct bucket value under defaultFS.
However, the Hadoop console (port 50070) does not show the non-boot bucket attached; it shows the boot disk attached on the NameNode.

| Node | Last contact | Admin State | Capacity | Used | Non DFS Used | Remaining | Blocks | Block pool used | Failed Volumes | Version |
|------|--------------|-------------|----------|------|--------------|-----------|--------|-----------------|----------------|---------|
| anintelcluster.c.anintelcluster.internal:50010 (10.240.0.2:50010) | 0 | In Service | 98.4 GB | 28 KB | 6.49 GB | 91.91 GB | 0 | 28 KB (0%) | 0 | 2.7.1 |

Is it possible to specify a non-boot bucket with the singlenode setup?
If not, what needs to be done to specify the non-boot disk so that it both gets attached to the instance as read/write and is also used by Hadoop for storage?
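
For reference, the defaultFS entry in core-site.xml on the master looks roughly like the sketch below (the bucket name here is just a placeholder, not my actual bucket):

```xml
<!-- core-site.xml (sketch): default filesystem pointed at a GCS bucket
     through the GCS connector. "my-nonboot-bucket" is a placeholder. -->
<property>
  <name>fs.defaultFS</name>
  <value>gs://my-nonboot-bucket/</value>
</property>
```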

@dennishuo
Contributor

So, the GCS connector actually isn't able to be mounted as a local filesystem; it simply plugs into Hadoop at Hadoop's FileSystem.java layer. This means it gets used as the FileSystem for Hadoop-specific jobs, but it doesn't change the way the local filesystem uses a real disk as a block device.

The GCS connector also lives independently alongside Hadoop's HDFS. So when you're looking at 50070, you're seeing the actual HDFS setup, which writes blocks out to the local disk rather than to GCS, and which would be accessible as an "hdfs:///" path for Hadoop jobs. In general, if you've configured defaultFS to use a GCS path, you can just ignore whatever the NameNode on 50070 is reporting, since in that case your typical Hadoop jobs simply won't interact with the HDFS setup at all.
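
A quick way to see the split is something like the following (the bucket name and NameNode host are placeholders): with defaultFS set to a gs:// path, unqualified paths resolve against the bucket, while the local HDFS reported on 50070 is only reached through an explicit hdfs:// URI.

```sh
# Unqualified paths resolve against fs.defaultFS, i.e. the GCS bucket
# ("my-nonboot-bucket" is a placeholder):
hadoop fs -ls gs://my-nonboot-bucket/
hadoop fs -ls /    # same listing, since defaultFS points at the gs:// bucket

# The HDFS instance that the 50070 UI reports on still exists, just unused
# by default; it can be addressed explicitly (host/port depend on your cluster):
hadoop fs -ls hdfs://<namenode-host>:8020/
```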
