Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix listing of files in AlluxioFileSystem #23815

Merged
merged 1 commit into from
Jan 1, 2025

Conversation

awkwardd
Copy link
Contributor

@awkwardd awkwardd commented Oct 17, 2024

Description

When use alluxio as hive fs,cant get data from alluxio.
I finaly found that the AlluxioIterator.java didn't generate correct file location as alluxio://192...../dir/filename but alluxio:///dir/filename

Additional context and related issues

Fixes #23778

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Hive
* Fix reading from tables when using alluxio filesystem. ({issue}`23815`)

Copy link

cla-bot bot commented Oct 17, 2024

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: tanyajun.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@raunaqmorarka
Copy link
Member

Could you add a test for this problem ?

@raunaqmorarka raunaqmorarka requested a review from apc999 October 23, 2024 14:38
Copy link
Contributor

@JiamingMai JiamingMai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@awkwardd Add the following extractSchema method to AlluxioUtils, and update the next() method in AlluxioFileIterator can pass the test successfully. Please help update the code in this PR.

    public static Location extractSchema(Location location)
    {
        String path = location.toString();
        if (path.startsWith("alluxio://")) {
            int index = path.indexOf("/", "alluxio://".length());
            if (index != -1) {
                String schema = path.substring(0, index);
                return Location.of(schema + "/");
            }
            else {
                return location;
            }
        }
        return location;
    }
   public FileEntry next()
            throws IOException
    {
        if (!hasNext()) {
            return null;
        }
        URIStatus fileStatus = files.next();
        String path = fileStatus.getPath();
        if (path.startsWith("/")) {
            path = path.substring(1);
        }
        Location schema = extractSchema(dirLocation);
        Location fileLocation = schema.appendSuffix(path);
        return new FileEntry(
                fileLocation,
                fileStatus.getLength(),
                Instant.ofEpochMilli(fileStatus.getLastModificationTimeMs()),
                Optional.empty());
    }

@JiamingMai JiamingMai added the bug Something isn't working label Oct 30, 2024
Copy link

cla-bot bot commented Oct 31, 2024

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: tanyajun.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@JiamingMai
Copy link
Contributor

@awkwardd you may also need to sign the CLA https://trino.io/development/process#contribution-process

@awkwardd
Copy link
Contributor Author

awkwardd commented Nov 1, 2024

@awkwardd you may also need to sign the CLA https://trino.io/development/process#contribution-process

I have signed the cla file and snet it last night.

@ebyhr
Copy link
Member

ebyhr commented Nov 3, 2024

@cla-bot check

Copy link

cla-bot bot commented Nov 3, 2024

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: tanyajun.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

Copy link

cla-bot bot commented Nov 3, 2024

The cla-bot has been summoned, and re-checked this pull request!

@ebyhr
Copy link
Member

ebyhr commented Nov 3, 2024

@awkwardd Please take a look at #23815 (comment)

Copy link
Contributor

@JiamingMai JiamingMai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks good to me. @raunaqmorarka Could you help merge this PR?

@raunaqmorarka raunaqmorarka changed the title fix the data pull problem when use alluxio as fs. Fix listing of files in AlluxioFileSystem Nov 19, 2024
@raunaqmorarka
Copy link
Member

This change looks good to me. @raunaqmorarka Could you help merge this PR?

Can we add a test here ? The existing tests didn't catch this right ?

Copy link

This pull request has gone a while without any activity. Tagging for triage help: @mosabua

@raunaqmorarka raunaqmorarka merged commit 5972dc3 into trinodb:master Jan 1, 2025
58 checks passed
@github-actions github-actions bot added this to the 469 milestone Jan 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cla-signed stale
Development

Successfully merging this pull request may close these issues.

4 participants