feat: Add Text Embedding Function #36366

Open
wants to merge 18 commits into
base: master
from

Conversation

junjiejiangjjj
@junjiejiangjjj junjiejiangjjj commented Sep 19, 2024

@sre-ci-robot sre-ci-robot added the do-not-merge/work-in-progress Don't merge even CI passed. label Sep 19, 2024
@sre-ci-robot
Contributor

Welcome @junjiejiangjjj! It looks like this is your first PR to milvus-io/milvus 🎉

@sre-ci-robot sre-ci-robot added the size/L Denotes a PR that changes 100-499 lines. label Sep 19, 2024
@mergify mergify bot added dco-passed DCO check passed. kind/feature Issues related to feature request from users labels Sep 19, 2024
Contributor

mergify bot commented Sep 19, 2024

@junjiejiangjjj Please associate the related issue to the body of your Pull Request. (eg. “issue: #”)

@junjiejiangjjj junjiejiangjjj force-pushed the embedding branch 3 times, most recently from fd701cd to 5407f22 Compare September 20, 2024 03:03
@sre-ci-robot sre-ci-robot added size/XL Denotes a PR that changes 500-999 lines. and removed size/L Denotes a PR that changes 100-499 lines. labels Sep 23, 2024
@junjiejiangjjj junjiejiangjjj force-pushed the embedding branch 6 times, most recently from 200bc3b to ed7d332 Compare September 26, 2024 07:25
@sre-ci-robot sre-ci-robot added size/XXL Denotes a PR that changes 1000+ lines. and removed size/XL Denotes a PR that changes 500-999 lines. labels Sep 26, 2024
@junjiejiangjjj junjiejiangjjj force-pushed the embedding branch 2 times, most recently from 38243dd to 0d0286e Compare October 8, 2024 08:26
@sre-ci-robot sre-ci-robot added the area/dependency Pull requests that update a dependency file label Oct 8, 2024
@junjiejiangjjj junjiejiangjjj force-pushed the embedding branch 3 times, most recently from 0f30960 to 7950316 Compare October 11, 2024 03:53
Signed-off-by: junjie.jiang <[email protected]>
Signed-off-by: junjiejiangjjj <[email protected]>
Contributor

mergify bot commented Dec 26, 2024

@junjiejiangjjj go-sdk check failed, comment rerun go-sdk can trigger the job again.

Contributor

mergify bot commented Dec 26, 2024

@junjiejiangjjj cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

Collaborator

@zhengbuqian zhengbuqian left a comment

I also see several comments from the last review that were neither addressed nor replied to; please take a look. Thanks!

apiKey: apiKey,
url: url,
},
apiVersion: "2024-06-01",
Collaborator

Does it make sense to pass in this apiVersion instead of hard-coding it?

Based on https://learn.microsoft.com/en-us/azure/ai-services/openai/api-version-deprecation, it seems that new versions introduce few breaking changes to the API, mostly new features.

Author

New features in a newer API version generally require code changes anyway, so exposing apiVersion as a configuration option is not very meaningful.
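
For readers following this thread, the two positions can be combined: keep a pinned default and accept an optional override. The following is only a minimal sketch of that idea, not the PR's code. `azureClient`, `newAzureClient`, and `defaultAPIVersion` are hypothetical names, and the override value is a placeholder; only the value "2024-06-01" comes from the diff above.

```go
package main

import "fmt"

// defaultAPIVersion mirrors the value hard-coded in the diff above.
const defaultAPIVersion = "2024-06-01"

// azureClient is a hypothetical stand-in for the client type in the PR.
type azureClient struct {
	apiKey     string
	url        string
	apiVersion string
}

// newAzureClient uses the pinned default unless the caller passes a
// non-empty apiVersion.
func newAzureClient(apiKey, url, apiVersion string) *azureClient {
	if apiVersion == "" {
		apiVersion = defaultAPIVersion
	}
	return &azureClient{apiKey: apiKey, url: url, apiVersion: apiVersion}
}

func main() {
	pinned := newAzureClient("key", "https://example.invalid", "")
	custom := newAzureClient("key", "https://example.invalid", "placeholder-version")
	fmt.Println(pinned.apiVersion)
	fmt.Println(custom.apiVersion)
}
```

Whether the override is worth exposing depends on the author's point above: if newer versions only matter once the code is adapted to them, the pinned default alone may be enough.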

internal/proxy/task_search.go Outdated

// TODO: unify the function implementation, storage/utils.go & proxy/util.go
func IsBM25FunctionOutputField(field *schemapb.FieldSchema) bool {
return field.GetIsFunctionOutput() && field.GetDataType() == schemapb.DataType_SparseFloatVector
Collaborator

Is this check correct? Other models can also produce sparse float vector output.

Author

This is not a problem right now. Once the BM25 and model function implementations are unified, this code will no longer be needed.
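
To illustrate the reviewer's concern, one way to separate BM25 output from other sparse-vector output is to check the type of the function that produces the field, not the field's data type alone. The sketch below is an assumption-laden illustration: every type and name in it is a simplified, hypothetical stand-in and not the real `schemapb` definitions or the approach taken in this PR.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for schema types (not schemapb).
type dataType int
type functionType int

const (
	sparseFloatVector dataType = iota
	floatVector
)

const (
	bm25 functionType = iota
	textEmbedding
)

type fieldSchema struct {
	name             string
	dataType         dataType
	isFunctionOutput bool
}

type functionSchema struct {
	typ              functionType
	outputFieldNames []string
}

// isBM25OutputField reports whether field is an output of a BM25 function.
// It looks at the producing function's type, so a sparse vector produced
// by another model is not misclassified.
func isBM25OutputField(field fieldSchema, functions []functionSchema) bool {
	if !field.isFunctionOutput {
		return false
	}
	for _, fn := range functions {
		if fn.typ != bm25 {
			continue
		}
		for _, out := range fn.outputFieldNames {
			if out == field.name {
				return true
			}
		}
	}
	return false
}

func main() {
	functions := []functionSchema{
		{typ: bm25, outputFieldNames: []string{"sparse_bm25"}},
		{typ: textEmbedding, outputFieldNames: []string{"sparse_model"}},
	}
	bm25Field := fieldSchema{name: "sparse_bm25", dataType: sparseFloatVector, isFunctionOutput: true}
	modelField := fieldSchema{name: "sparse_model", dataType: sparseFloatVector, isFunctionOutput: true}
	fmt.Println(isBM25OutputField(bm25Field, functions))  // true
	fmt.Println(isBM25OutputField(modelField, functions)) // false
}
```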

internal/models/utils/embedding_util.go Outdated
OutputType string `json:"output_type,omitempty"`
}

type EmbeddingRequest struct {
Collaborator

Can EmbeddingRequest be private, by lowercasing the first letter (embeddingRequest)? I did a quick search and found no usage of this struct outside this file; correct me if I'm wrong. This comment applies to many other structs in this and other files.

Author

It is only used in test files under milvus/internal/util/function/.
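
As background for this exchange: in Go, lowercasing a type name does not change how it is JSON-encoded, because `encoding/json` only requires exported fields. It does, however, hide the type from other packages, including tests that live in a different package, which is the constraint the author describes. A minimal sketch follows; the `Model` and `Input` fields and the `encode` helper are hypothetical, since the struct body is not shown in the diff.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// embeddingRequest is unexported, so only code in this package can name it.
// The fields are hypothetical; they stay exported so encoding/json sees them.
type embeddingRequest struct {
	Model string   `json:"model"`
	Input []string `json:"input"`
}

// encode marshals a request to its JSON wire form.
func encode(req embeddingRequest) (string, error) {
	b, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	s, err := encode(embeddingRequest{Model: "example-model", Input: []string{"hello"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // {"model":"example-model","input":["hello"]}
}
```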

internal/util/function/text_embedding_function.go Outdated
internal/util/function/text_embedding_function.go Outdated
internal/util/function/openai_embedding_provider.go Outdated
internal/models/openai/openai_embedding.go Outdated
internal/models/openai/openai_embedding.go Outdated
@junjiejiangjjj junjiejiangjjj changed the title feat: Add openai embedding feat: Add Text Embedding Function Dec 31, 2024
Contributor

mergify bot commented Dec 31, 2024

@junjiejiangjjj go-sdk check failed, comment rerun go-sdk can trigger the job again.

@junjiejiangjjj
Author

rerun go-sdk

@junjiejiangjjj junjiejiangjjj force-pushed the embedding branch 2 times, most recently from 13b8037 to ab53008 Compare January 2, 2025 06:55
Contributor

mergify bot commented Jan 2, 2025

@junjiejiangjjj go-sdk check failed, comment rerun go-sdk can trigger the job again.

Contributor

mergify bot commented Jan 2, 2025

@junjiejiangjjj go-sdk check failed, comment rerun go-sdk can trigger the job again.

2 similar comments
Contributor

mergify bot commented Jan 2, 2025

@junjiejiangjjj go-sdk check failed, comment rerun go-sdk can trigger the job again.

Contributor

mergify bot commented Jan 3, 2025

@junjiejiangjjj go-sdk check failed, comment rerun go-sdk can trigger the job again.

Signed-off-by: junjie.jiang <[email protected]>
Contributor

mergify bot commented Jan 3, 2025

@junjiejiangjjj go-sdk check failed, comment rerun go-sdk can trigger the job again.

Contributor

mergify bot commented Jan 3, 2025

@junjiejiangjjj E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Labels
area/dependency Pull requests that update a dependency file dco-passed DCO check passed. kind/feature Issues related to feature request from users size/XXL Denotes a PR that changes 1000+ lines.

5 participants