-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use the same shared FileSystem instance across calls in HadoopTableOperations #4
Conversation
@mccheah I'm not clear on the value of this change. The file system instance will actually get reused because it's cached at the FileSystem level (https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L156). I really don't have a strong opinion for either implementation. @rdblue thoughts? |
The rationale is explained in the issue that was in the previous repository. See Netflix/iceberg#92 and Netflix/iceberg#91. |
I'm also very much averse to JVM-wide caching, as |
Sorry, I reviewed the copy of this in the Netflix repo: Netflix/iceberg#108 Overall, the change is okay. I think it is fine to expose the factory method as long as it is package-private and only used by HadoopTableOperations. But that really limits the utility of it, while going against best practices for using the Hadoop FS API. I think the best option for the use case is to subclass HadoopTables & HadoopTableOperations and replace the current behavior if you need to. |
A compromise could be to have
The default implementation can just call |
Suggested the above in #13. If we prefer that, we can close this. |
Closing in favor of #15 . |
Need to start somewhere, so will merge this - am expecting a test failure due to a bug in Parquet at the moment which will be fixed in palantir/parquet-mr#39.
# This is the 1st commit message: Issue-629: Cherrypick Id # This is the commit message #2: Removed redundant methods and changed method name # This is the commit message #3: Fix Imports # This is the commit message #4: Fix Operation Check # This is the commit message apache#5: Fix Error Message # This is the commit message apache#6: Cherry picking operation to apply changes from incoming snapshot on current snapshot # This is the commit message apache#7: Initial working version of cherry-pick operation which applies appends only
…-wip-pr Prepare for upstream WIP PR
Mouli/bump to 0.7 incubating release
…ning the written file for the offsets
LAMBERT-102 : Added SPC for IcebergMultiTableFilesCommitter
Closes Netflix/iceberg#92.