Add s3fs to requirements #56

Open
tuliocasagrande opened this issue Sep 2, 2020 · 2 comments

@tuliocasagrande

Hey guys,

In case you want to read a dataframe directly from s3, for example:

df = pd.read_csv('s3://my_bucket/my_object')

You will need an optional dependency called s3fs:

ImportError: Missing optional dependency 's3fs'. The s3fs package is required to handle s3 files. Use pip or conda to install s3fs.

@edwardjkim
Contributor

Hello, s3fs is an optional dependency and is not installed by default in the scikit-learn container, because the recommended way to use the container with the Python SDK is the Estimator.fit() method. Please see this related issue: aws/sagemaker-python-sdk#1496. Will the approach described there work in your case?

@tuliocasagrande
Author

Hello @edwardjkim, thank you for the quick answer.

My workload doesn't have an estimator.
I'm using the scikit-learn container with SageMaker Processing, and I want to read an auxiliary dataset directly from S3. Since this dataset is chosen at runtime, I cannot send it to the container in advance through the ProcessingInputs argument.
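
For reference, one possible workaround that avoids s3fs is to fetch the object with boto3 and hand the bytes to pandas. This is only a minimal sketch and not something confirmed in this thread: it assumes boto3 is importable inside the container and that the job's role is allowed to read the object. The helper names `split_s3_uri` and `read_csv_from_s3` are hypothetical.

```python
import io


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # Everything up to the first '/' is the bucket; the rest is the key.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI must include a bucket and a key: {uri!r}")
    return bucket, key


def read_csv_from_s3(uri, **read_csv_kwargs):
    """Read a CSV object from S3 into a DataFrame without s3fs."""
    # Assumption: boto3 and pandas are installed in the container.
    # Imported lazily so split_s3_uri works without them.
    import boto3
    import pandas as pd

    bucket, key = split_s3_uri(uri)
    response = boto3.client("s3").get_object(Bucket=bucket, Key=key)
    # Load the whole object into memory, then let pandas parse it.
    return pd.read_csv(io.BytesIO(response["Body"].read()), **read_csv_kwargs)
```

With this in place, the call from the first comment would become `df = read_csv_from_s3('s3://my_bucket/my_object')`. The object is read fully into memory, so this suits small auxiliary datasets better than large ones.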
