Feat: ai from s3 #1215

Status: Open, wants to merge 8 commits into master

rauerhans (Collaborator)

Summary

Once the Dagster ETL pipeline for the Plural AI vector store index has been validated in production, this should be merged so that the index data is read from the pipeline's sink in S3. I've left the old scraper in place for now; we can remove it once everything works as intended.

The storage context is now loaded from S3, so the main.py script needs to know where to find it and how to authenticate.

  • Auth:
    IRSA should work; otherwise, set the standard AWS environment variables:
    • AWS_ACCESS_KEY_ID
    • AWS_SECRET_ACCESS_KEY
  • Path:
    The script expects the S3 path in PLURAL_AI_INDEX_S3_PATH, in the format <bucket-name>/<path>.
    It defaults to plural-assets/dagster/plural-ai/vector_store_index.
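A minimal sketch of how the path setting could be resolved, assuming main.py reads it from the environment and treats everything before the first "/" as the bucket name. The helper name resolve_index_location is hypothetical and not taken from the PR; only the variable name, the <bucket-name>/<path> format, and the default value come from the description above.

```python
import os
from typing import Mapping, Optional, Tuple

# Default value as stated in the PR description.
DEFAULT_INDEX_S3_PATH = "plural-assets/dagster/plural-ai/vector_store_index"


def resolve_index_location(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Hypothetical helper: return (bucket, key_prefix) for the vector store index.

    Reads PLURAL_AI_INDEX_S3_PATH, expected as <bucket-name>/<path>, and falls
    back to the documented default when the variable is unset or empty.
    """
    env = os.environ if env is None else env
    raw = env.get("PLURAL_AI_INDEX_S3_PATH") or DEFAULT_INDEX_S3_PATH
    # Split on the first "/" only: the bucket name cannot contain "/",
    # but the key prefix can.
    bucket, _, prefix = raw.partition("/")
    if not bucket or not prefix:
        raise ValueError(f"expected <bucket-name>/<path>, got {raw!r}")
    return bucket, prefix
```

With no override, resolve_index_location({}) yields the bucket plural-assets and the prefix dagster/plural-ai/vector_store_index. Passing the mapping explicitly keeps the helper testable without touching the real process environment.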

To be safe, AWS_DEFAULT_REGION should also be set to the bucket's region.
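A rough pre-flight check for the settings described above might look like the sketch below. It assumes IRSA is detected through the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE variables that EKS injects into pods using a service account role; the function name missing_aws_settings is hypothetical. It inspects environment variables only, whereas the AWS SDKs also consult shared config files and instance metadata, so treat its output as a hint rather than a verdict.

```python
import os
from typing import List, Mapping, Optional


def missing_aws_settings(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical check: describe which env-based AWS settings look missing."""
    env = os.environ if env is None else env
    problems: List[str] = []

    # IRSA: EKS injects a role ARN and a path to a projected web identity token.
    has_irsa = bool(env.get("AWS_ROLE_ARN")) and bool(env.get("AWS_WEB_IDENTITY_TOKEN_FILE"))
    # Fallback from the PR description: static access keys.
    has_static_keys = bool(env.get("AWS_ACCESS_KEY_ID")) and bool(env.get("AWS_SECRET_ACCESS_KEY"))

    if not (has_irsa or has_static_keys):
        problems.append(
            "credentials: expected IRSA (AWS_ROLE_ARN + AWS_WEB_IDENTITY_TOKEN_FILE) "
            "or AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY"
        )
    if not env.get("AWS_DEFAULT_REGION"):
        problems.append("region: AWS_DEFAULT_REGION is not set; set it to the bucket's region")
    return problems
```

Logging these messages at startup, rather than raising, keeps the check advisory, which matches the "to be safe" wording for the region setting.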

Labels

Test Plan

Checklist

  • If required, I have updated the Plural documentation accordingly.
  • I have added tests to cover my changes.
  • I have added a meaningful title and summary to convey the impact of this PR to a user.
  • I have added relevant labels to this PR to help with categorization for release notes.

Review thread on ai/main.py (outdated, resolved)
stoat-app bot commented Aug 25, 2023


Static Hosting (all artifacts built from commit e7bd4f2):

  • Coverage: api-coverage, rtc-coverage, core-coverage, cron-coverage, email-coverage, worker-coverage, graphql-coverage
  • Test results: api-test-results, rtc-test-results, core-test-results, cron-test-results, email-test-results, worker-test-results, graphql-test-results

Job Runtime: [chart not reproduced]

rauerhans added the enhancement (New feature or request) label on Aug 25, 2023
swoodward90 added the roadmap (On the engineering roadmap for the quarter.) label on Sep 21, 2023
Labels: enhancement, roadmap
Projects: None yet
3 participants