Getting started
Bennu edited this page Oct 15, 2021 · 8 revisions
For the following, we assume Milvus is installed. We provide code examples in Python and Node.js; each snippet can be run by copying and pasting it.
Getting some data
To insert and search vectors in Milvus, we need two matrices:
- xb for the database: it contains the vectors that are inserted into the Milvus collection and that we search. Its size is nb-by-d.
- xq for the query vectors, whose nearest neighbors we need to find. Its size is nq-by-d. If we have a single query vector, nq=1.
In the following examples we work with vectors drawn from a uniform distribution in d=128 dimensions.
In Python
import numpy as np
d = 128 # dimension
nb = 100000 # database size
nq = 1000 # number of queries
np.random.seed(1234) # make reproducible
xb = np.random.random((nb, d)).astype('float32').tolist()
xq = np.random.random((nq, d)).astype('float32').tolist()
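In Node
const d = 128; // dimension
const nb = 100000; // database size
const nq = 1000; // number of queries
// Database vectors (xb): one object per entity, keyed by the vector field name used in the schema below
const entities = Array.from({ length: nb }, () => ({
  vector: Array.from({ length: d }, () => Math.random()),
}));
// Query vectors (xq): nq vectors of dimension d
const xq = Array.from({ length: nq }, () =>
  Array.from({ length: d }, () => Math.random())
);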
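Milvus performs the nearest-neighbor search for you, but it can help to see what "finding the nearest neighbors of xq in xb" means when done exhaustively. The following is a minimal NumPy sketch, assuming Euclidean (L2) distance; `brute_force_knn` is a hypothetical helper, not part of Milvus, and Milvus also supports other metrics and approximate indexes. Exact results like these can serve as ground truth when checking search results.

```python
from typing import Tuple

import numpy as np


def brute_force_knn(xb, xq, k: int) -> Tuple[np.ndarray, np.ndarray]:
    """Exact k-nearest-neighbor search using squared Euclidean (L2) distance.

    xb: (nb, d) database vectors; xq: (nq, d) query vectors.
    Returns (distances, indices), each of shape (nq, k), nearest first.
    Hypothetical helper for illustration; not a Milvus API.
    """
    xb = np.asarray(xb, dtype=np.float32)
    xq = np.asarray(xq, dtype=np.float32)
    if not 1 <= k <= xb.shape[0]:
        raise ValueError("k must be between 1 and the number of database vectors")
    # ||q - b||^2 = ||q||^2 - 2 q.b + ||b||^2, evaluated for every (query, base) pair
    dists = (
        (xq ** 2).sum(axis=1, keepdims=True)
        - 2.0 * (xq @ xb.T)
        + (xb ** 2).sum(axis=1)
    )
    # Select the k smallest distances per query (unordered), then sort those k
    idx = np.argpartition(dists, k - 1, axis=1)[:, :k]
    part = np.take_along_axis(dists, idx, axis=1)
    order = np.argsort(part, axis=1)
    return np.take_along_axis(part, order, axis=1), np.take_along_axis(idx, order, axis=1)
```

On the full data (nq=1000, nb=100000) the distance matrix alone holds 10^8 float32 values (about 400 MB), so try a slice first, e.g. `D, I = brute_force_knn(xb[:10000], xq[:10], k=5)`.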
To use Milvus, you first need to connect to the Milvus server.
In Python
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection
connections.connect(host='localhost', port='19530')
In Node
import { MilvusClient } from "@zilliz/milvus2-sdk-node";
const milvusClient = new MilvusClient("localhost:19530");
Before inserting data into Milvus, you need to create a collection. The following Milvus terms are useful:
- collection: A collection in Milvus is equivalent to a table in a relational database management system (RDBMS). In Milvus, collections are used to store and manage entities.
- entity: An entity consists of a group of fields that represent a real-world object. Each entity in Milvus is identified by a unique row ID.
You can customize row IDs. If you do not set them manually, Milvus assigns row IDs to entities automatically. If you choose to use your own customized row IDs, note that Milvus does not currently de-duplicate row IDs, so the same collection can contain duplicate row IDs.
- field: Fields are the units that make up entities. Fields can hold structured data (e.g., numbers, strings) or vectors.
In Python
collection_name = "hello_milvus"
# Schema: an auto-generated INT64 primary key and a d-dimensional float vector field
default_fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="vector", dtype=DataType.FLOAT_VECTOR, dim=d)
]
default_schema = CollectionSchema(fields=default_fields, description="test collection")
print("\nCreate collection...")
collection = Collection(name=collection_name, schema=default_schema)
print("\nInsert data...")
# auto_id=True, so only the vector column is passed
mr = collection.insert([xb])
# flush the inserted data (Collection.flush() is available in recent pymilvus 2.x releases)
collection.flush()
# show the number of entities inserted into Milvus
print(collection.num_entities)
# view the IDs that Milvus generated automatically
print(mr.primary_keys)
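The Python `collection.insert([xb])` call passes column-oriented data: a list with one entry per field in schema order, leaving out the `id` field because it uses `auto_id=True`. The Node example instead passes one object per entity in `fields_data`. If your data arrives as rows, a small conversion bridges the two layouts. `rows_to_columns` is a hypothetical helper, and the column layout it produces is inferred from the `insert([xb])` call above rather than taken from the pymilvus documentation.

```python
from typing import Any, Dict, List, Sequence


def rows_to_columns(rows: Sequence[Dict[str, Any]], field_names: Sequence[str]) -> List[List[Any]]:
    """Convert row-oriented entities (one dict per entity) into column-oriented
    data (one list per field, in the given schema order).

    Hypothetical helper for illustration; not a pymilvus API.
    """
    columns: List[List[Any]] = [[] for _ in field_names]
    for i, row in enumerate(rows):
        # Fail early if a row lacks one of the requested fields
        missing = [name for name in field_names if name not in row]
        if missing:
            raise KeyError(f"row {i} is missing field(s): {missing}")
        for column, name in zip(columns, field_names):
            column.append(row[name])
    return columns
```

For the schema above, `collection.insert(rows_to_columns([{"vector": v} for v in xb], ["vector"]))` would pass the same data as `collection.insert([xb])`.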
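In Node
// DataType is used below and must also be imported from the SDK (the import path depends on the SDK version)
const collection_name = "hello_milvus";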
const params = {
collection_name: collection_name,
fields: [
{
name: "vector",
description: "vector field",
data_type: DataType.FloatVector,
type_params: {
dim: d,
},
},
{
name: "id",
data_type: DataType.Int64,
autoID: true,
is_primary_key: true,
description: "",
},
],
};
await milvusClient.collectionManager.createCollection(params);
await milvusClient.dataManager.insert({
collection_name: collection_name,
fields_data: entities,
});
await milvusClient.dataManager.flush({ collection_names: [collection_name] });
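The glossary notes that Milvus does not de-duplicate customized row IDs. If you switch the `id` field to `auto_id=False` and supply your own IDs, you can at least check each batch on the client side before inserting it. `find_duplicate_ids` is a hypothetical helper; it only sees the IDs you pass to it, not IDs already stored in the collection.

```python
from collections import Counter
from typing import Iterable, List


def find_duplicate_ids(ids: Iterable[int]) -> List[int]:
    """Return the IDs that occur more than once, in ascending order.

    Hypothetical client-side check; Milvus itself does not reject duplicates.
    """
    counts = Counter(ids)
    return sorted(i for i, c in counts.items() if c > 1)


ids = [10, 11, 12, 11]
duplicates = find_duplicate_ids(ids)
if duplicates:
    print(f"duplicate row IDs in this batch: {duplicates}")
```

With a schema whose `id` field uses `auto_id=False`, the insert data would presumably carry the ID column first, in schema order, e.g. `collection.insert([ids, xb])`.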