Add CSV file import function #27149

Merged: 1 commit, merged Oct 31, 2023

KumaJie
Contributor

@KumaJie KumaJie commented Sep 16, 2023

@sre-ci-robot sre-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines. label Sep 16, 2023
@mergify mergify bot added the dco-passed DCO check passed. label Sep 16, 2023
@mergify
Contributor

mergify bot commented Sep 20, 2023

@KumaJie E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@KumaJie KumaJie force-pushed the import branch 2 times, most recently from b699a71 to dcbf29a on September 20, 2023, 02:42
@mergify
Contributor

mergify bot commented Sep 20, 2023

@KumaJie E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@wayblink
Contributor

@KumaJie
Hi, thanks for your contribution.

I tried your code. One problem I found: a CSV file that starts with a byte order mark (BOM, \ufeff) fails with the error message: the field '\ufeffid' is not defined in collection schema. I think we need to handle this case.

Some discussion about BOM: https://stackoverflow.com/questions/21371673/reading-files-with-a-bom-in-go
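The failure is easy to reproduce with the standard library alone: encoding/csv does not strip a leading BOM, so the first header field keeps the U+FEFF prefix and no longer matches any schema field. A minimal sketch, assuming a hypothetical schema that defines only id and vector; firstHeaderField is an illustrative helper, not Milvus code.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// firstHeaderField parses the first record of a CSV document and returns its first field.
func firstHeaderField(data string) (string, error) {
	r := csv.NewReader(strings.NewReader(data))
	header, err := r.Read()
	if err != nil {
		return "", err
	}
	return header[0], nil
}

func main() {
	// Hypothetical collection schema: only these field names are defined.
	schema := map[string]bool{"id": true, "vector": true}

	// Input that begins with a UTF-8 BOM (U+FEFF), as written by some editors and exporters.
	data := "\ufeffid,vector\n1,\"[0.1,0.2]\"\n"
	field, err := firstHeaderField(data)
	if err != nil {
		panic(err)
	}
	// The BOM is still attached to the first column name, so the lookup fails.
	fmt.Printf("%q defined in schema: %v\n", field, schema[field])
}
```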

@KumaJie
Contributor Author

KumaJie commented Sep 25, 2023

Thanks for your feedback; I will fix that. Your suggestion is very helpful 😊

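// csv parser
// Wrap the file in a bufio.Reader so the first rune can be inspected.
reader := bufio.NewReader(file)
// Discard the UTF-8 BOM (U+FEFF) if the file starts with one.
r, _, err := reader.ReadRune()
if err != nil {
    return err
}
if r != '\ufeff' {
    // Not a BOM: push the rune back so the CSV reader sees the full header.
    if err := reader.UnreadRune(); err != nil {
        return err
    }
}
// ...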

@xiaofan-luan
Collaborator

rerun the ut

@mergify
Contributor

mergify bot commented Oct 4, 2023

@KumaJie ut workflow job failed, comment rerun ut can trigger the job again.

@codecov

codecov bot commented Oct 4, 2023

Codecov Report

Merging #27149 (42154f1) into master (bf7f32b) will increase coverage by 0.03%.
Report is 12 commits behind head on master.
The diff coverage is 86.67%.


@@            Coverage Diff             @@
##           master   #27149      +/-   ##
==========================================
+ Coverage   81.74%   81.78%   +0.03%     
==========================================
  Files         821      823       +2     
  Lines      116328   116911     +583     
==========================================
+ Hits        95098    95614     +516     
- Misses      18071    18116      +45     
- Partials     3159     3181      +22     
| File | Coverage Δ |
| --- | --- |
| internal/rootcoord/import_manager.go | 75.70% <100.00%> (ø) |
| internal/util/importutil/import_util.go | 97.77% <ø> (ø) |
| internal/util/importutil/json_parser.go | 96.77% <100.00%> (+0.03%) ⬆️ |
| internal/util/importutil/import_wrapper.go | 91.14% <76.00%> (-2.07%) ⬇️ |
| internal/util/importutil/csv_handler.go | 91.90% <91.90%> (ø) |
| internal/util/importutil/csv_parser.go | 81.41% <81.41%> (ø) |

... and 11 files with indirect coverage changes

oldPercent := int64(0)
updateProgress := func() {
    if p.updateProgressFunc != nil && reader.fileSize > 0 {
        percent := (r.InputOffset() * ProgressValueForPersist) / reader.fileSize
Contributor

csv.Reader.InputOffset() is only available since Go 1.19, so using it would break Milvus's requirement of supporting Go >= 1.18. @xiaofan-luan, is that acceptable, or do we have to work around it?

@wayblink
Contributor

@KumaJie Generally LGTM. I'd prefer to put the BOM-handling logic into CSVParser. Could you please adjust that, and also fix the code checks that failed in https://github.com/milvus-io/milvus/actions/runs/6298875222/job/17382624374?pr=27149.

@KumaJie
Contributor Author

KumaJie commented Oct 26, 2023

rerun the ut

@KumaJie
Contributor Author

KumaJie commented Oct 26, 2023

rerun ut

@github-actions
Contributor

Hello KumaJie, you are not in the organization, so you do not have the permission to rerun the workflow, please contact @milvus-io/milvus-maintainers for help.

@xiaofan-luan
Collaborator

rerun it

@czs007
Collaborator

czs007 commented Oct 31, 2023

/approve
/lgtm

@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: czs007, KumaJie

@mergify mergify bot removed the ci-passed label Oct 31, 2023
@czs007 czs007 merged commit e88212b into milvus-io:master Oct 31, 2023
11 of 12 checks passed
sre-ci-robot pushed a commit that referenced this pull request Nov 15, 2023
issue: #27148

from pr: #27149

Signed-off-by: kuma <[email protected]>
Co-authored-by: kuma <[email protected]>
Labels
approved, dco-passed (DCO check passed), lgtm, size/XXL (denotes a PR that changes 1000+ lines)
5 participants