Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testing: Weaviate impl #88

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

emekaokoli19
Copy link

@emekaokoli19 emekaokoli19 commented May 2, 2024

/claim #74

@dhruv-anand-aintech Please take a look


🚀 This description was created by Ellipsis for commit 5e2a592

Summary:

Enhances Weaviate export and import functionalities with support for multiple connection types, batch processing, and integration of OpenAI API key, with import functionality needing completion.

Key points:

  • Added support for local and cloud connections in ExportWeaviate and ImportWeaviate.
  • Integrated OpenAI API key handling in Weaviate connections.
  • Implemented batch processing for exporting data to Parquet files in ExportWeaviate.
  • Prepared structure for importing data from Parquet files in ImportWeaviate, but actual data upsert code is commented out.
  • Environment variables are used for connection details, and progress is visualized using tqdm.

Generated with ❤️ by ellipsis.dev

@algora-pbc algora-pbc bot mentioned this pull request May 2, 2024
13 tasks
@emekaokoli19 emekaokoli19 marked this pull request as draft May 2, 2024 08:26
Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested.

  • Reviewed the entire pull request up to 5e2a592
  • Looked at 296 lines of code in 2 files
  • Took 1 minute and 19 seconds to review
More info
  • Skipped 0 files when reviewing.
  • Skipped posting 0 additional comments because they didn't meet confidence threshold of 50%.

Workflow ID: wflow_Zv66JEmFQhuhtLEi


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with review rules, user-specific overrides, quiet mode, and more. See docs.

21 hours left in your free trial, upgrade for $20/seat/month or contact us.

@@ -23,6 +27,15 @@ def make_parser(cls, subparsers):

parser_weaviate.add_argument("--url", type=str, help="URL of Weaviate instance")
parser_weaviate.add_argument("--api_key", type=str, help="Weaviate API key")
parser_weaviate.add_argument("--openai_api_key", type=str, help="Openai API key")
parser_weaviate.add_arguments(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method add_arguments is incorrect and will cause a runtime error as it does not exist in argparse. It should be add_argument.

Suggested change
parser_weaviate.add_arguments(
parser_weaviate.add_argument(

weaviate_import.upsert_data()
return weaviate_import

def __init__(self, args):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The connection_type argument is used here but it is not defined in the argument parser for the import script. This will cause an error when trying to access self.args["connection_type"].

To fix this, add the connection_type argument to the parser in the make_parser method:

Suggested change
def __init__(self, args):
parser_weaviate.add_argument(
"--connection-type", type=str, choices=["local", "cloud"], default="cloud",
help="Type of connection to Weaviate (local or cloud)"
)

@emekaokoli19 emekaokoli19 marked this pull request as ready for review May 6, 2024 04:56
Copy link

ellipsis-dev bot commented May 6, 2024

Your free trial has expired. To continue using Ellipsis, sign up at https://app.ellipsis.dev for $20/seat/month. If you have any questions, reach us at [email protected]

@dhruv-anand-aintech
Copy link
Member

@emekaokoli19 did you test the newest version of the code with a weaviate instance?

@emekaokoli19
Copy link
Author

@emekaokoli19 did you test the newest version of the code with a weaviate instance?

@dhruv-anand-aintech I am sorry for the late reply, I was out for a few days but I am back now. Yes the code was tested with a weaviate instance and it works

@dhruv-anand-aintech
Copy link
Member

@emekaokoli19 you need to make sure that it works for any import command.
eg:

import_vdf \
	--max_num_rows 1000 \
	--hf_dataset aintech/vdf_prefix-cache \
    weaviate

Here it asks me for index name, which it should be using the name from the VDF_META.json file (see other DBs' implementation).
Screenshot 2024-05-15 at 5 00 42 PM

It also fails because openai_api_key is not set. This is not a required API key for importing datasets. You need to make sure the code works without it being set (using .get("openai_api_key", "") syntax.

Screenshot 2024-05-15 at 5 00 51 PM

Let's have a pair programming session to finish off this task. Please message me on discord to set a time

@dhruv-anand-aintech dhruv-anand-aintech changed the title testing testing: Weaviate impl May 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants