From b72ca4c274dd16e59e6c901f012a14f43e4b8504 Mon Sep 17 00:00:00 2001
From: David Gardner <96306125+dagardner-nv@users.noreply.github.com>
Date: Wed, 17 Jul 2024 12:36:05 -0700
Subject: [PATCH] Update documentation for `vdb_upload` to use realistic source data with the `--file_source` flag (#1800)

* Replace "./morpheus/data/*" as a data source for `vdb_upload` which is not a valid data source.
* Add new serialized dataframes to `examples/data/vdb_upload`
  * `doca_guides.jsonlines`: Serialized Dataframe from data in `examples/doca/vdb_realtime/sender/dataset`
  * `nvidia_blogs.jsonlines`: Serialized Dataframe from two Nvidia blog posts:
    - https://blogs.nvidia.com/blog/mlperf-training-benchmarks/
    - https://blogs.nvidia.com/blog/ai-security-steps/

Closes #1790

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/nv-morpheus/Morpheus/blob/main/docs/source/developer_guide/contributing.md).
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - David Gardner (https://github.com/dagardner-nv)

Approvers:
  - Michael Demoret (https://github.com/mdemoret-nv)

URL: https://github.com/nv-morpheus/Morpheus/pull/1800
---
 examples/data/vdb_upload/doca_guides.jsonlines  | 3 +++
 examples/data/vdb_upload/nvidia_blogs.jsonlines | 3 +++
 examples/llm/vdb_upload/README.md               | 4 ++--
 3 files changed, 8 insertions(+), 2 deletions(-)
 create mode 100644 examples/data/vdb_upload/doca_guides.jsonlines
 create mode 100644 examples/data/vdb_upload/nvidia_blogs.jsonlines

diff --git a/examples/data/vdb_upload/doca_guides.jsonlines b/examples/data/vdb_upload/doca_guides.jsonlines
new file mode 100644
index 0000000000..9bb90be880
--- /dev/null
+++ b/examples/data/vdb_upload/doca_guides.jsonlines
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:23ff28cce2fde3b3b625057d1c68d454b28026f0476dd8b5f8addf4137e7c00e
+size 28661
diff --git a/examples/data/vdb_upload/nvidia_blogs.jsonlines b/examples/data/vdb_upload/nvidia_blogs.jsonlines
new file mode 100644
index 0000000000..2efd5dd00c
--- /dev/null
+++ b/examples/data/vdb_upload/nvidia_blogs.jsonlines
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:42c8fadcee174402e7fd50a060516a6f07a20f2ac0750210997442030c7982ad
+size 10260
diff --git a/examples/llm/vdb_upload/README.md b/examples/llm/vdb_upload/README.md
index b8a3ef35e5..de6c18a81d 100644
--- a/examples/llm/vdb_upload/README.md
+++ b/examples/llm/vdb_upload/README.md
@@ -214,7 +214,7 @@ python examples/llm/main.py vdb_upload pipeline \
 ```bash
 python examples/llm/main.py vdb_upload pipeline \
   --source_type filesystem \
-  --file_source "./morpheus/data/*" \
+  --file_source="./examples/data/vdb_upload/*.jsonlines" \
   --enable_monitors \
   --embedding_model_name all-MiniLM-L6-v2
 ```
@@ -224,7 +224,7 @@ python examples/llm/main.py vdb_upload pipeline \
 ```bash
 python examples/llm/main.py vdb_upload pipeline \
   --source_type rss --source_type filesystem \
-  --file_source "./morpheus/data/*" \
+  --file_source="./examples/data/vdb_upload/*.jsonlines" \
   --interval_secs 600 \
   --enable_cache \
   --enable_monitors \