[Feature]: Import data (e.g. - vectors in parquet files) into milvus standalone with local storageType #36445

Open
1 task done
liorf95 opened this issue Sep 23, 2024 · 7 comments
Assignees
Labels
kind/feature Issues related to feature request from users

Comments

@liorf95

liorf95 commented Sep 23, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Is your feature request related to a problem? Please describe.

Currently, it is not possible to import data (e.g., vectors in Parquet files) into Milvus standalone when it is configured with the local storageType; a remote MinIO instance is required.

Describe the solution you'd like.

Support importing data (e.g., vectors in Parquet files) into Milvus standalone with the local storageType, without requiring a remote MinIO instance.

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

@liorf95 liorf95 added the kind/feature Issues related to feature request from users label Sep 23, 2024
@xiaofan-luan
Collaborator

/assign @bigsheeper
Please help with this.

@bigsheeper
Contributor

Hi @liorf95 ,

To simplify the local import process, we recommend using our bulkwriter tool. It is specifically designed to handle bulk imports efficiently. You can find the detailed instructions for setting up and using bulkwriter in our documentation here: https://milvus.io/api-reference/pymilvus/v2.4.x/DataImport/LocalBulkWriter/LocalBulkWriter.md.

Please let us know if you need any assistance with the setup or have any questions regarding the tool.

@bigsheeper
Contributor

/assign @liorf95

@liorf95
Author

liorf95 commented Oct 9, 2024

> Hi @liorf95 ,
>
> To simplify the local import process, we recommend using our bulkwriter tool. It is specifically designed to handle bulk imports efficiently. You can find the detailed instructions for setting up and using bulkwriter in our documentation here: https://milvus.io/api-reference/pymilvus/v2.4.x/DataImport/LocalBulkWriter/LocalBulkWriter.md.
>
> Please let us know if you need any assistance with the setup or have any questions regarding the tool.

This is exactly what I did, but LocalBulkWriter does not work for this, as described in bug #35530 (only RemoteBulkWriter works).

@bigsheeper
Contributor

bigsheeper commented Oct 10, 2024

Hi @liorf95 ,

My apologies for the confusion earlier. I’d like to clarify the usage of the tools:

  1. A LocalBulkWriter instance rewrites your raw data locally into a format that Milvus understands. It’s useful if you want to preprocess the data before uploading it to Milvus.
  2. If you want to directly import your data into Milvus, we recommend using RemoteBulkWriter instead, which handles data ingestion remotely and can simplify the import process.
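
To make the difference between the two writers concrete, here is a minimal sketch, assuming the pymilvus 2.4.x `bulk_writer` module from the documentation linked earlier in this thread. The field names, the vector dimension, the output directory, and the `MinioSettings` helper are hypothetical. The endpoint, credentials, and bucket shown are the values commonly shipped with a stock Milvus standalone deployment and should be checked against your own `milvus.yaml`. The pymilvus imports sit inside the functions so the pure-Python helpers can be used without pymilvus installed.

```python
import random
from dataclasses import dataclass

DIM = 8  # hypothetical vector dimension used only for this sketch


@dataclass
class MinioSettings:
    """Hypothetical holder for object-storage settings (assumed defaults, verify locally)."""
    endpoint: str = "localhost:9000"
    access_key: str = "minioadmin"
    secret_key: str = "minioadmin"
    bucket_name: str = "a-bucket"


def make_rows(n, dim=DIM, seed=0):
    """Generate n illustrative rows with an integer id and a random float vector."""
    rng = random.Random(seed)
    return [{"id": i, "vector": [rng.random() for _ in range(dim)]} for i in range(n)]


def build_schema(dim=DIM):
    """Build a two-field schema matching the rows above (requires pymilvus)."""
    from pymilvus import CollectionSchema, DataType, FieldSchema
    return CollectionSchema([
        FieldSchema("id", DataType.INT64, is_primary=True),
        FieldSchema("vector", DataType.FLOAT_VECTOR, dim=dim),
    ])


def write_local(rows, out_dir="./bulk_out"):
    """Rewrite rows into Parquet files on local disk only; nothing is uploaded."""
    from pymilvus.bulk_writer import BulkFileType, LocalBulkWriter
    writer = LocalBulkWriter(
        schema=build_schema(), local_path=out_dir, file_type=BulkFileType.PARQUET
    )
    for row in rows:
        writer.append_row(row)
    writer.commit()
    return writer.batch_files  # local file paths, grouped per batch


def write_remote(rows, settings, remote_path="/"):
    """Rewrite rows and upload the files under remote_path inside the bucket."""
    from pymilvus.bulk_writer import BulkFileType, RemoteBulkWriter
    conn = RemoteBulkWriter.S3ConnectParam(
        endpoint=settings.endpoint,
        access_key=settings.access_key,
        secret_key=settings.secret_key,
        bucket_name=settings.bucket_name,
        secure=False,
    )
    writer = RemoteBulkWriter(
        schema=build_schema(),
        remote_path=remote_path,
        connect_param=conn,
        file_type=BulkFileType.PARQUET,
    )
    for row in rows:
        writer.append_row(row)
    writer.commit()
    return writer.batch_files  # object paths inside the bucket, grouped per batch
```

With a running Milvus and MinIO, each batch returned by `write_remote(make_rows(1000), MinioSettings())` could then be handed to `utility.do_bulk_insert(collection_name=..., files=batch)` after connecting with `connections.connect(...)`. The files written by `write_local` stay on local disk and are not imported by this sketch.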

@msj121

msj121 commented Jan 1, 2025

@bigsheeper

The documentation is a little unclear to me. RemoteBulkWriter appears to need a remote path; how do I find out what this remote path should be? Also, is the default bucket "a-bucket" okay to use, or should I create a new one? I'd prefer to keep things simple for now.

Thanks for any help. I have been working on this for quite a few hours and am very confused.

Similarly, if I have a sparse vector field populated by a BM25 function, I assume upserting will build this index. Will RemoteBulkWriter also do this, or do I need to populate this field manually?

@thebirdgr

thebirdgr commented Jan 2, 2025

@bigsheeper

Hello! I am working on another project and would also like to import local Parquet files without needing a remote connection.
