Add s3fs to requirements #56

tuliocasagrande · 2020-09-02T14:10:39Z

Hey guys,

If you want to read a DataFrame directly from S3, for example:

df = pd.read_csv('s3://my_bucket/my_object')

you need the optional dependency s3fs. Without it, pandas raises:

ImportError: Missing optional dependency 's3fs'. The s3fs package is required to handle s3 files. Use pip or conda to install s3fs.


edwardjkim · 2020-09-03T04:50:15Z

Hello, s3fs is an optional dependency, and it's not installed by default in the scikit-learn container because the recommended way of using the container with the Python SDK is the Estimator.fit() method. Please see this related issue: aws/sagemaker-python-sdk#1496. Will the approach described in that issue work in your case?

tuliocasagrande · 2020-09-05T17:22:46Z

Hello @edwardjkim, thank you for the quick answer.

My workload doesn't have an estimator.
I'm using the scikit-learn container with SageMaker Processing, and I want to read an auxiliary dataset directly from S3. Since this dataset is chosen at runtime, I cannot send it to the container in advance through the ProcessingInputs argument.
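
For anyone hitting the same ImportError, one possible workaround that avoids s3fs is to download the object with boto3 and hand the bytes to pandas. This is only a minimal sketch: it assumes boto3 is importable in the container and that the job's role can read the object, neither of which is confirmed in this thread. The helper names (split_s3_uri, fetch_s3_bytes, read_csv_from_s3) are hypothetical and not part of any library.

```python
import io


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI needs both a bucket and a key: {uri!r}")
    return bucket, key


def fetch_s3_bytes(uri, client=None):
    """Download one S3 object and return its raw bytes."""
    if client is None:
        # Assumption: boto3 is installed in the container.
        import boto3
        client = boto3.client("s3")
    bucket, key = split_s3_uri(uri)
    response = client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()


def read_csv_from_s3(uri, client=None, **read_csv_kwargs):
    """Read a CSV stored in S3 into a DataFrame without s3fs."""
    import pandas as pd  # imported lazily so the helpers above work without it
    data = fetch_s3_bytes(uri, client=client)
    return pd.read_csv(io.BytesIO(data), **read_csv_kwargs)
```

With this in the processing script, the call from the first comment becomes df = read_csv_from_s3('s3://my_bucket/my_object'). The whole object is loaded into memory before parsing, so this fits auxiliary datasets of moderate size better than very large files.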
