Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the same shared FileSystem instance across calls in HadoopTableOperations #4

Closed
wants to merge 1 commit into from

Conversation

mccheah
Copy link
Contributor

@mccheah mccheah commented Nov 26, 2018

@danielcweeks
Copy link
Contributor

@mccheah I'm not clear on the value of this change. The file system instance will actually get reused because it's cached at the FileSystem level (https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L156).

I really don't have a strong opinion for either implementation. @rdblue thoughts?

@mccheah
Copy link
Contributor Author

mccheah commented Nov 26, 2018

The rationale is explained in the issue that was in the previous repository. See Netflix/iceberg#92 and Netflix/iceberg#91.

@mccheah
Copy link
Contributor Author

mccheah commented Nov 26, 2018

I'm also very much averse to JVM-wide caching, as FileSystem.get does.

@rdblue
Copy link
Contributor

rdblue commented Nov 26, 2018

Sorry, I reviewed the copy of this in the Netflix repo: Netflix/iceberg#108

Overall, the change is okay. I think it is fine to expose the factory method as long as it is package-private and only used by HadoopTableOperations. But that really limits the utility of it, while going against best practices for using the Hadoop FS API.

I think the best option for the use case is to subclass HadoopTables & HadoopTableOperations and replace the current behavior if you need to.

@mccheah
Copy link
Contributor Author

mccheah commented Nov 27, 2018

A compromise could be to have HadoopTableOperations have an overridable method like so:

FileSystem getFileSystem(Path path, Configuration conf);

The default implementation can just call Util.getFs, but overriding this can run their own caching layer.

@mccheah
Copy link
Contributor Author

mccheah commented Nov 27, 2018

Suggested the above in #13. If we prefer that, we can close this.

@mccheah
Copy link
Contributor Author

mccheah commented Nov 27, 2018

Closing in favor of #15 .

@mccheah mccheah closed this Nov 27, 2018
yifeih pushed a commit to yifeih/incubator-iceberg that referenced this pull request Apr 16, 2019
Need to start somewhere, so will merge this - am expecting a test failure due to a bug in Parquet at the moment which will be fixed in palantir/parquet-mr#39.
rdsr added a commit to rdsr/incubator-iceberg that referenced this pull request Oct 26, 2019
prodeezy referenced this pull request in rominparekh/incubator-iceberg Dec 17, 2019
# This is the 1st commit message:

Issue-629: Cherrypick Id

# This is the commit message #2:

Removed redundant methods and changed method name

# This is the commit message #3:

Fix Imports

# This is the commit message #4:

Fix Operation Check

# This is the commit message apache#5:

Fix Error Message

# This is the commit message apache#6:

Cherry picking operation to apply changes from incoming snapshot on current snapshot

# This is the commit message apache#7:

Initial working version of cherry-pick operation which applies appends only
rdsr added a commit to rdsr/incubator-iceberg that referenced this pull request Mar 13, 2020
guilload pushed a commit to guilload/iceberg that referenced this pull request Jul 9, 2020
moulimukherjee referenced this pull request in moulimukherjee/iceberg Jul 24, 2020
Mouli/bump to 0.7 incubating release
HotSushi pushed a commit to HotSushi/iceberg that referenced this pull request Jul 31, 2020
bkahloon pushed a commit to bkahloon/iceberg that referenced this pull request Feb 27, 2021
pavibhai added a commit to pavibhai/iceberg that referenced this pull request Sep 16, 2022
pavibhai added a commit to pavibhai/iceberg that referenced this pull request Oct 31, 2022
adamyasharma2797 pushed a commit to adamyasharma2797/iceberg that referenced this pull request Jul 19, 2024
LAMBERT-102 : Added SPC for IcebergMultiTableFilesCommitter
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants