Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Azure: Support vended credentials refresh in ADLSFileIO. #11577

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

ChaladiMohanVamsi
Copy link

Proposed Change

Add support to refresh and consume vended storage credentials for ADLSFileIO.

New Azure properties

Requirements of credential config

  • Required to have Sas token expiration time as part of credential config along with Sas token.
  • Expected prefix for the expiration time is adls.sas-token-expire-at-ms. similar to existing sas token prefix adls.sas-token.
  • Expected to have only one ADLS credential per storage-account prefix.

Similar PRs for other FileIOs

@ChaladiMohanVamsi ChaladiMohanVamsi marked this pull request as draft November 18, 2024 11:26
@ChaladiMohanVamsi ChaladiMohanVamsi marked this pull request as ready for review November 18, 2024 11:47
@ChaladiMohanVamsi
Copy link
Author

cc// @nastra @jackye1995 @amogh-jahagirdar @munendrasn Can you please help with the review.

@amogh-jahagirdar amogh-jahagirdar self-requested a review November 19, 2024 13:24
private LoadCredentialsResponse fetchCredentials() {
Map<String, String> headers =
RESTUtil.merge(
configHeaders(properties),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you elaborate why this is needed here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to implement the similar behaviour present in RESTSessionCatalog, where catalog can be configured to pass explicit headers to server by setting the configuration with header. prefix.

private static final long MIN_REFRESH_WAIT_MILLIS = 10;

public AzureSasCredentialRefresher(
Supplier<Pair<String, Long>> sasTokenWithExpirationSupplier,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this actually need to be a supplier given that it's being immediately used in L40?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the supplier is being used during initialization and also during each scheduled refresh to fetch the new credentials. Supplier holds the logic to fetch new credentials from the API endpoint, since we are going to use it multiple times I modelled it as supplier instead of single method call. Please suggest if there is a cleaner way to achieve the same.

.isEqualTo(credential.config().get(ADLS_SAS_TOKEN_PREFIX + STORAGE_ACCOUNT));

Thread.sleep(20);
// Since expiration time past to current time, the refresh will fall back at minimum 10ms
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this comment is trying to say. Also what happens if you remove the sleep time above?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Since the credential refresh implementation is based on scheduling a new refresh using the server provided expiration time,
  • this test is covering the scenario where server responds with already expired token (expires-at-ms < current-time).
  • The scheduling delay for consequent refresh is minimum delay of 10ms as per the logic in refreshDelayMillis()
  • If we remove the sleep time, the credentials will be fetched from API only once. Depending on the test case execution time (> 10ms) there can be second credential fetch. To keep the test case behaviour consistent I used thread sleep.

@nastra
Copy link
Contributor

nastra commented Dec 3, 2024

@ChaladiMohanVamsi thanks for working on this. Do you have a way of actually testing this PR with an ADLS environment and see whether the refreshes work?

@ChaladiMohanVamsi
Copy link
Author

@nastra @amogh-jahagirdar Thanks for the review and suggestions. I have addressed the review comments.

I was able to test the credentials refresh logic in ADLS environment, since I don't have the new credentials API endpoint spec implementation in my env I used loadTable endpoint for testing by tweaking few parts of this PR.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class VendedAdlsCredentialProvider implements Serializable, AutoCloseable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this really need to be Serializable?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VendedAdlsCredentialProvider is an instance variable in AzureProperties (serializable) which is again an instance variable in ADLSFileIO (serializable). In distributed engines like Spark we serialize ADLSFileIO instance from driver to executor, to ensure each FileIO instance refresh the credentials independently we might need this class to be serializable.

@nastra
Copy link
Contributor

nastra commented Dec 12, 2024

@danielcweeks could you also take a look at this PR please?

@nastra nastra requested a review from danielcweeks December 12, 2024 07:48
@ShyamalaGowri
Copy link

@ChaladiMohanVamsi Can you help to understand how you would initialize ADLSFileIO with these changes. Priorly we can call initialize with the SAS token or account key and initialize the credentials, can u share the sample code as to how you had initialized ADLSFileIO and tried testing with these changes ?

@ChaladiMohanVamsi
Copy link
Author

@ChaladiMohanVamsi Can you help to understand how you would initialize ADLSFileIO with these changes. Priorly we can call initialize with the SAS token or account key and initialize the credentials, can u share the sample code as to how you had initialized ADLSFileIO and tried testing with these changes ?

@ShyamalaGowri There is no change in the way we initialize the FileIO. The initialization remains same (ref). This change is backward compatible, we can also initialize with the SAS token (existing way). This change introduces a new capability for ADLSFileIO to dynamically fetch sas token from the credential endpoint provided in the config during initialization.

private LoadCredentialsResponse fetchCredentials() {
Map<String, String> headers =
RESTUtil.merge(
configHeaders(properties),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you elaborate why we need the config headers here? I believe the auth headers should be enough

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to implement the similar behaviour present in RESTSessionCatalog for loadTable(), where catalog can be configured to pass explicit headers to server by setting the configuration with header. prefix.

While I was testing these changes with vended credentials from loadTable response (workaround to mimic new credential endpoint), I saw the possibility that REST server implementations may use headers as feature rollout gates/ additional metadata to decide credential expiration time. Similarly new credential endpoint may also support accepting headers from client as additional metadata.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for now let's leave the config headers out, because we also don't have them for vended S3/GCS credential providers

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will remove them.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the config headers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants