Python: Create HadoopFileSystem from netloc #8596

Merged
merged 1 commit into from
Sep 22, 2023

@frankliee frankliee commented Sep 20, 2023

The current pyarrow module creates FS from hdfs.host and hdfs.port in the catalog configuration.

catalog:
  prod:
    hdfs.host: xxxx
    hdfs.port: xxx

However, when the actual HDFS path has a different netloc, it throws a Wrong FS exception.

java.lang.IllegalArgumentException:
        Wrong FS: hdfs://xxxx expected: hdfs://yyyy

This PR makes two changes:

  1. Create FS from uri when netloc is not empty.
  2. Cache FS by the key of <scheme, netloc>
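
The two changes above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `StubFileSystem`, `get_filesystem`, `fs_for_location`, and the port value `8020` are all hypothetical stand-ins, and a real implementation would construct a `pyarrow.fs.HadoopFileSystem` where the stub is created.

```python
from functools import lru_cache
from typing import Dict, Optional
from urllib.parse import urlparse

# Hypothetical catalog properties mirroring the config above (port is assumed).
CATALOG_PROPERTIES: Dict[str, str] = {"hdfs.host": "xxxx", "hdfs.port": "8020"}


class StubFileSystem:
    """Stand-in for a real HadoopFileSystem so the sketch runs without HDFS."""

    def __init__(self, host: str, port: Optional[int]) -> None:
        self.host = host
        self.port = port


@lru_cache(maxsize=None)
def get_filesystem(scheme: str, netloc: str) -> StubFileSystem:
    """Return one filesystem per (scheme, netloc) pair; lru_cache is the cache."""
    if netloc:
        # Change 1: a non-empty netloc in the URI wins over the catalog config.
        host, _, port = netloc.partition(":")
        return StubFileSystem(host, int(port) if port else None)
    # Empty netloc: fall back to hdfs.host / hdfs.port from the catalog.
    port = CATALOG_PROPERTIES.get("hdfs.port")
    return StubFileSystem(CATALOG_PROPERTIES["hdfs.host"], int(port) if port else None)


def fs_for_location(location: str) -> StubFileSystem:
    """Change 2: look up the filesystem by the URI's scheme and netloc."""
    uri = urlparse(location)
    return get_filesystem(uri.scheme, uri.netloc)
```

With this shape, two paths on the same cluster share one filesystem object, while a path without a netloc still resolves through the catalog settings. The simple `partition(":")` split assumes a plain `host:port` netloc and would need adjusting for IPv6 literals or user info.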

@frankliee frankliee changed the title Python: Create HadoopFileSystem from netloc (merge request !1060) Python: Create HadoopFileSystem from netloc Sep 20, 2023
PyIceberg supports access across HDFS clusters, for reading production Iceberg tables

The existing pyarrow module initializes the HadoopFileSystem from the catalog configuration:

```
catalog:
  ice:
    hdfs.host: xxxx
    hdfs.port: xxx
```

If the cluster actually being accessed is not the cluster configured in the catalog, a Wrong FS exception is raised:

java.lang.IllegalArgumentException:
        Wrong FS: hdfs://xxxx expected: hdfs://yyyy

Changed to initialize the filesystem from the specified URI first.

Tested in production.

TAPD: --story=887319139
@Fokko Fokko left a comment

This makes sense to me, thanks for fixing this @frankliee

@Fokko Fokko merged commit 4e99c19 into apache:master Sep 22, 2023
7 checks passed