In the lab tests, data sets from SIFT1B are provided.
If you already have data for the tests, we recommend preparing the vectors as .npy files, each containing no more than 100,000 vectors.
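One way to get files of at most 100,000 vectors each is to slice a large array into fixed-size blocks and save each block with `numpy.save`. The sketch below assumes the vectors are already in a 2-dimensional NumPy array. The function name `save_in_chunks` and the `vectors_00000.npy` naming scheme are hypothetical choices, not a Milvus convention.

```python
import os
import tempfile
from typing import List

import numpy as np


def save_in_chunks(vectors: np.ndarray, out_dir: str, max_per_file: int = 100_000) -> List[str]:
    """Write `vectors` to numbered .npy files holding at most `max_per_file` rows each."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index, start in enumerate(range(0, len(vectors), max_per_file)):
        # Slice out the next block of rows; the last block may be shorter.
        chunk = vectors[start:start + max_per_file]
        # Hypothetical naming scheme: vectors_00000.npy, vectors_00001.npy, ...
        path = os.path.join(out_dir, f"vectors_{index:05d}.npy")
        np.save(path, chunk)
        paths.append(path)
    return paths


# Demo with a small limit so it runs quickly: 250 vectors -> files of 100, 100 and 50 rows.
demo_vectors = np.random.rand(250, 8).astype(np.float32)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_paths = save_in_chunks(demo_vectors, tmp_dir, max_per_file=100)
    demo_rows = [np.load(p).shape[0] for p in demo_paths]
    demo_restored = np.concatenate([np.load(p) for p in demo_paths])
print(demo_rows)  # [100, 100, 50]
```

For real data, keep the default `max_per_file=100_000`.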
Saving data in .npy files can greatly reduce file size. Take single-precision, 512-dimensional vectors as an example: saving 100,000 such vectors in a CSV file takes about 800 MB, while saving them in a .npy file takes less than 400 MB.
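As a rough check on the .npy figure: single-precision floats take 4 bytes each, so 100,000 × 512 × 4 bytes is about 205 MB of raw data. A .npy file stores this raw binary plus a short header. The sketch below computes the payload and measures the header overhead on a small in-memory array. The helper names `raw_float32_bytes` and `npy_size_in_memory` are hypothetical.

```python
import io

import numpy as np


def raw_float32_bytes(num_vectors: int, dim: int) -> int:
    """Bytes taken by the raw single-precision payload, excluding any file header."""
    return num_vectors * dim * np.dtype(np.float32).itemsize


def npy_size_in_memory(array: np.ndarray) -> int:
    """Serialize `array` in .npy format into memory and return the byte count."""
    buffer = io.BytesIO()
    np.save(buffer, array)
    return len(buffer.getvalue())


payload = raw_float32_bytes(100_000, 512)
print(payload)  # 204800000 bytes, i.e. about 205 MB

# A smaller array shows that the .npy header adds only a small, fixed overhead.
sample = np.zeros((1_000, 512), dtype=np.float32)
overhead = npy_size_in_memory(sample) - sample.nbytes
print(overhead)
```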
If you only have CSV files, follow the steps below to convert them into binary .npy files:
- Read the CSV file through `pandas.read_csv` and generate a `pandas.DataFrame` data structure.
- Convert the `pandas.DataFrame` to a `numpy.array` data structure through `numpy.array`.
- Save the array to a binary .npy file through `numpy.save`.
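The three conversion steps above can be sketched in Python as follows. This minimal sketch assumes the CSV file has no header row and stores one vector per row, with one dimension per column. `csv_to_npy` and the file names are hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def csv_to_npy(csv_path: str, npy_path: str) -> np.ndarray:
    """Convert a CSV file of vectors to a single-precision .npy file."""
    # Step 1: read the CSV file into a pandas.DataFrame.
    # Assumption: no header row, one vector per row, one dimension per column.
    df = pd.read_csv(csv_path, header=None)
    # Step 2: convert the DataFrame into a numpy.array of 32-bit floats.
    vectors = np.array(df, dtype=np.float32)
    # Step 3: save the array to a binary .npy file.
    np.save(npy_path, vectors)
    return vectors


# Demo: two 3-dimensional vectors written to a temporary CSV file and converted.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_csv = os.path.join(tmp_dir, "vectors.csv")
    with open(demo_csv, "w") as f:
        f.write("0.5,1.0,1.5\n2.0,2.5,3.0\n")
    demo_npy = os.path.join(tmp_dir, "vectors.npy")
    csv_to_npy(demo_csv, demo_npy)
    demo_loaded = np.load(demo_npy)
print(demo_loaded.shape, demo_loaded.dtype)  # (2, 3) float32
```

If the target path passed to `numpy.save` does not already end in `.npy`, NumPy appends that extension.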
Currently, Milvus provides a Python SDK. Follow the steps below to import vector data through Python scripts.

To import vectors from .npy files:
- Read the .npy file through `numpy.load` and generate a `numpy.array` data structure.
- Convert the array to a 2-dimensional list (in the form of `[[],[]...[]]`) through `numpy.ndarray.tolist`.
- Import the 2-dimensional list into Milvus through the Python scripts. A list of vector IDs is returned immediately.
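The .npy import steps above can be sketched as shown here. The Milvus insert call differs between Python SDK versions, so this sketch passes it in as a callable rather than naming a specific SDK method. `insert_vectors` and the `fake_insert` stub are hypothetical placeholders. The stub's IDs are not what Milvus would actually return.

```python
import os
import tempfile
from typing import Callable, List

import numpy as np


def npy_to_list(npy_path: str) -> List[List[float]]:
    """Load a .npy file and return its vectors as a 2-dimensional Python list."""
    # Step 1: read the .npy file into a numpy.array.
    vectors = np.load(npy_path)
    # Step 2: convert the array into a list of lists, [[...], [...], ...].
    return vectors.tolist()


def import_npy(npy_path: str, insert_vectors: Callable[[List[List[float]]], List[int]]) -> List[int]:
    """Step 3: pass the 2-dimensional list to an insert function and return its IDs."""
    # `insert_vectors` is a hypothetical stand-in for the insert call of your
    # Milvus Python SDK version; it is expected to return the new vector IDs.
    return insert_vectors(npy_to_list(npy_path))


def fake_insert(records: List[List[float]]) -> List[int]:
    """Hypothetical stub used instead of a real Milvus server: one ID per vector."""
    return list(range(len(records)))


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "vectors.npy")
    np.save(demo_path, np.array([[0.5, 1.0], [1.5, 2.0], [2.5, 3.0]], dtype=np.float32))
    demo_records = npy_to_list(demo_path)
    demo_ids = import_npy(demo_path, fake_insert)
print(demo_records)  # [[0.5, 1.0], [1.5, 2.0], [2.5, 3.0]]
print(demo_ids)  # [0, 1, 2]
```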
To import vectors from CSV files:

- Read the CSV file through `pandas.read_csv` and generate a `pandas.DataFrame` data structure.
- Convert the `pandas.DataFrame` to a `numpy.array` data structure through `numpy.array`.
- Convert the array to a 2-dimensional list (in the form of `[[],[]...[]]`) through `numpy.ndarray.tolist`.
- Import the 2-dimensional list into Milvus through the Python scripts. A list of vector IDs is returned immediately.
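The CSV import steps above follow the same pattern, with `pandas.read_csv` in front. This minimal sketch assumes a CSV file with no header row and one vector per row. As before, the Milvus insert call is passed in as a callable because its exact form depends on the SDK version. `insert_vectors`, `fake_insert`, and the helper names are hypothetical.

```python
import os
import tempfile
from typing import Callable, List

import numpy as np
import pandas as pd


def csv_to_list(csv_path: str) -> List[List[float]]:
    """Read a CSV file of vectors and return them as a 2-dimensional Python list."""
    # Step 1: read the CSV file into a pandas.DataFrame (assumed: no header row).
    df = pd.read_csv(csv_path, header=None)
    # Step 2: convert the DataFrame into a single-precision numpy.array.
    vectors = np.array(df, dtype=np.float32)
    # Step 3: convert the array into a list of lists, [[...], [...], ...].
    return vectors.tolist()


def import_csv(csv_path: str, insert_vectors: Callable[[List[List[float]]], List[int]]) -> List[int]:
    """Step 4: pass the 2-dimensional list to an insert function and return its IDs."""
    # `insert_vectors` is a hypothetical stand-in for the Milvus SDK insert call.
    return insert_vectors(csv_to_list(csv_path))


def fake_insert(records: List[List[float]]) -> List[int]:
    """Hypothetical stub used instead of a real Milvus server: one ID per vector."""
    return list(range(len(records)))


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_csv = os.path.join(tmp_dir, "vectors.csv")
    with open(demo_csv, "w") as f:
        f.write("0.5,1.0\n1.5,2.0\n")
    demo_records = csv_to_list(demo_csv)
    demo_ids = import_csv(demo_csv, fake_insert)
print(demo_records)  # [[0.5, 1.0], [1.5, 2.0]]
print(demo_ids)  # [0, 1]
```

Converting to `float32` before calling `tolist` keeps the values consistent with the single-precision vectors discussed earlier. `pandas.read_csv` would otherwise parse them as 64-bit floats.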