Merge branch 'branch-24.10' into david-multi-port-module-953
dagardner-nv authored Aug 1, 2024
2 parents 964bd98 + 9fae209 commit 860af47
Showing 106 changed files with 1,133 additions and 1,005 deletions.
12 changes: 6 additions & 6 deletions .devcontainer/README.md
Original file line number Diff line number Diff line change
@@ -17,20 +17,20 @@ limitations under the License.

# Morpheus Devcontainer

The Morpheus devcontainer is provided as a quick-to-set-up development and exploration environment for use with [Visual Studio Code](https://code.visualstudio.com) (Code). The devcontainer is a lightweight container which mounts-in a conda environment with cached packages, alleviating long conda download times on subsequent launches. It provides a simple framework for adding developer-centric [scripts](#development-scripts), and incorperates some helpful Code plugins, such as clangd and cmake support.
The Morpheus devcontainer is provided as a quick-to-set-up development and exploration environment for use with [Visual Studio Code](https://code.visualstudio.com) (Code). The devcontainer is a lightweight container which mounts-in a Conda environment with cached packages, alleviating long Conda download times on subsequent launches. It provides a simple framework for adding developer-centric [scripts](#development-scripts), and incorporates some helpful Code plugins, such as clangd and CMake support.

More information about devcontainers can be found at [containers.dev](https://containers.dev/).
More information about devcontainers can be found at [`containers.dev`](https://containers.dev/).

## Getting Started

To get started, simply open the morpheus repository root folder within Code. A window should appear at the bottom-right corner of the editor asking if you would like to reopen the workspace inside of the dev container. After clicking the confirmation dialog, the container will first build, then launch, then remote-attach.
To get started, simply open the Morpheus repository root folder within Code. A window should appear at the bottom-right corner of the editor asking if you would like to reopen the workspace inside of the dev container. After clicking the confirmation dialog, the container will first build, then launch, then remote-attach.

If the window does not appear, or you would like to rebuild the container, press `Ctrl+Shift+P` and search for `Dev Containers: Rebuild and Reopen in Container`. Press Enter, and the container will first build, then launch, then remote-attach.

Once remoted in to the devcontainer within code, the `setup-morpheus-env` script will begin to run and solve a morpheus conda environment (this conda environment is local to the morpheus repository and dev container and will not override any host environments). You should see the script executing in one of Code's integrated terminal. Once the script has completed, we're ready to start development or exploration of Morpheus. By default, each _new_ integrated terminal will automatically conda activate the morpheus environment.
Once connected to the devcontainer within Code, the `setup-morpheus-env` script will begin to run and solve a Morpheus Conda environment (this Conda environment is local to the Morpheus repository and dev container and will not override any host environments). You should see the script executing in one of Code's integrated terminals. Once the script has completed, we're ready to start development or exploration of Morpheus. By default, each _new_ integrated terminal will automatically activate the Morpheus Conda environment.

## Development Scripts
Several convienient scripts are available in the devcontainer's `PATH` (`.devcontainer/bin`) for starting, stopping, and interacting with Triton and Kafka. More scripts can be added as needed.
Several convenient scripts are available in the devcontainer's `PATH` (`.devcontainer/bin`) for starting, stopping, and interacting with Triton and Kafka. More scripts can be added as needed.
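Scripts like these are found because the devcontainer puts `.devcontainer/bin` on `PATH`; the mechanism is plain shell. A minimal sketch (the `dev-demo-start` script name and all paths here are illustrative, not part of the repository):

```shell
# Create a throwaway "bin" directory holding a hypothetical dev script,
# prepend it to PATH, and invoke the script by bare name.
tmp=$(mktemp -d)
mkdir -p "$tmp/bin"
printf '#!/bin/sh\necho started\n' > "$tmp/bin/dev-demo-start"  # hypothetical script
chmod +x "$tmp/bin/dev-demo-start"
PATH="$tmp/bin:$PATH"
dev-demo-start   # prints: started
rm -rf "$tmp"
```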

### Interacting with Triton
To start Triton and connect it to the devcontainer network, the `dev-triton-start` script can be used. The following example starts _or restarts_ Triton with the `abp-pcap-xgb` model loaded.
@@ -54,7 +54,7 @@ To start Kafka and connect it to the devcontainer network, the `dev-kafka-start` script can be used.
```
dev-kafka-start
```
Kafka should now be started and DNS resolveable as `kafka`.
Kafka should now be started and DNS resolvable as `kafka`.
```
ping kafka
```
29 changes: 29 additions & 0 deletions .vale.ini
@@ -0,0 +1,29 @@
StylesPath = ci/vale/styles

MinAlertLevel = error

Vocab = morpheus

Packages = Microsoft, write-good

# Configs for markdown and reStructuredText files
[*{.md,.rst}]

BasedOnStyles = Vale, write-good, Microsoft

# Lower these checks to just 'suggestion' level.

# This check enforces usage of contractions (for example: "it is" -> "it's"); lowering to suggestion to allow it
Microsoft.Contractions = suggestion

# This check disallows the use of "there is" and "there are" at the start of a sentence; I tried looking this up to
# determine the reasoning behind the rule but could not find one. Lowering to suggestion to allow it
write-good.ThereIs = suggestion

# Allow writing dates in numeric form 02/10/2022
Microsoft.DateOrder = suggestion

# reStructuredText specific configs
[*.rst]
# Ignore template items inside of curly braces
TokenIgnores = ({.*})
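The `TokenIgnores` pattern `({.*})` is an ordinary regular expression: anything between curly braces is skipped by Vale. As a quick sketch of the matching behavior, `grep -oE` accepts an equivalent pattern (braces escaped for grep's ERE syntax; the sample text is made up):

```shell
# Extract the brace-delimited span that TokenIgnores would skip.
echo 'Install {{ morpheus_version }} before continuing' | grep -oE '\{.*\}'
# prints: {{ morpheus_version }}
```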
12 changes: 6 additions & 6 deletions ci/conda/channel/README.md
@@ -15,13 +15,13 @@ See the License for the specific language governing permissions and
limitations under the License.
-->

Creates a local conda channel using docker-compose and nginx. Can be helpful when testing new conda packages
Creates a local Conda channel using Docker Compose and nginx. Can be helpful when testing new Conda packages

To Use:
1. Ensure `docker-compose` is installed
2. Set the location of the conda-bld folder to host as a conda channel to the variable `$CONDA_REPO_DIR`
1. i.e. `export CONDA_REPO_DIR=$CONDA_PREFIX/conda-bld`
3. Launch docker-compose
1. Ensure Docker Compose is installed
2. Set the location of the `conda-bld` folder to host as a Conda channel to the variable `$CONDA_REPO_DIR`
1. For example, `export CONDA_REPO_DIR=$CONDA_PREFIX/conda-bld`
3. Launch Docker Compose
1. `docker compose up -d`
4. Install conda packages using the local channel
4. Install Conda packages using the local channel
1. `conda install -c http://localhost:8080 <my_package>`
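Step 2 above is ordinary shell variable expansion; assuming a Conda installation under `/opt/conda` (an illustrative path, not prescribed by this README), the channel directory resolves like this:

```shell
# CONDA_PREFIX is normally set by `conda activate`; set it explicitly
# here (illustrative path) to show how CONDA_REPO_DIR is built.
CONDA_PREFIX=/opt/conda
export CONDA_REPO_DIR=$CONDA_PREFIX/conda-bld
echo "$CONDA_REPO_DIR"   # prints: /opt/conda/conda-bld
```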
27 changes: 27 additions & 0 deletions ci/scripts/documentation_checks.sh
@@ -0,0 +1,27 @@
#!/bin/bash
# SPDX-FileCopyrightText: Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
source ${SCRIPT_DIR}/common.sh

set +e

# Intentionally excluding CHANGELOG.md as it is immutable
DOC_FILES=$(git ls-files "*.md" "*.rst" | grep -v -E '^CHANGELOG\.md$')

vale ${DOC_FILES}
RETVAL=$?
exit $RETVAL
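The `grep -v -E '^CHANGELOG\.md$'` filter in the script above drops only an exact top-level `CHANGELOG.md` entry; a nested file such as `docs/CHANGELOG.md` (a hypothetical path, used only for illustration) would still be checked. A quick demonstration with a fixed file list standing in for `git ls-files`:

```shell
# Simulate `git ls-files` output with printf, then apply the same
# anchored exclusion filter used by documentation_checks.sh.
printf 'README.md\nCHANGELOG.md\ndocs/CHANGELOG.md\n' \
  | grep -v -E '^CHANGELOG\.md$'
# prints:
# README.md
# docs/CHANGELOG.md
```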
3 changes: 3 additions & 0 deletions ci/scripts/github/checks.sh
Expand Up @@ -60,3 +60,6 @@ ${MORPHEUS_ROOT}/ci/scripts/version_checks.sh

rapids-logger "Running C++ style checks"
${MORPHEUS_ROOT}/ci/scripts/cpp_checks.sh

rapids-logger "Running Documentation checks"
${MORPHEUS_ROOT}/ci/scripts/documentation_checks.sh
3 changes: 3 additions & 0 deletions ci/scripts/github/docs.sh
@@ -44,6 +44,9 @@ rapids-logger "Building docs"
cmake --build ${BUILD_DIR} --parallel ${PARALLEL_LEVEL} --target install
cmake --build ${BUILD_DIR} --parallel ${PARALLEL_LEVEL} --target morpheus_docs

rapids-logger "Checking documentation links"
cmake --build ${BUILD_DIR} --parallel ${PARALLEL_LEVEL} --target morpheus_docs_linkcheck

rapids-logger "Archiving the docs"
tar cfj "${WORKSPACE_TMP}/docs.tar.bz" ${BUILD_DIR}/docs/html

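The archiving step's `tar cfj` bundles `c`reate, `f`ile, and bzip2 compression (`j`). A self-contained sketch of the same invocation against a throwaway directory (all paths here are temporary stand-ins for the real `${BUILD_DIR}/docs/html` tree):

```shell
# Build a tiny html tree, archive it the way docs.sh does,
# then list the archive contents to confirm what was stored.
tmp=$(mktemp -d)
mkdir -p "$tmp/html"
echo '<html></html>' > "$tmp/html/index.html"
tar cfj "$tmp/docs.tar.bz" -C "$tmp" html
tar tfj "$tmp/docs.tar.bz"
# prints:
# html/
# html/index.html
rm -rf "$tmp"
```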
77 changes: 77 additions & 0 deletions ci/vale/styles/config/vocabularies/morpheus/accept.txt
@@ -0,0 +1,77 @@
# List of case-sensitive regular expressions matching words that should be accepted by Vale. For product names like
# "cuDF" or "cuML", we want to ensure that they are capitalized the same way they're written by the product owners.
# Regular expressions are parsed according to the Go syntax: https://golang.org/pkg/regexp/syntax/

API(s?)
[Aa]utoencoder
[Aa]nonymize(d?)
[Bb]ackpressure
[Bb]atcher
[Bb]oolean
# Documentation for ccache only capitalizes the name at the start of a sentence https://ccache.dev/
[Cc]cache
[Cc]hatbot(s?)
# clangd is never capitalized even at the start of a sentence https://clangd.llvm.org/
clangd
CMake
[Cc]omposable
Conda
CPython
[Cc]ryptocurrenc(y|ies)
[Cc]yber
[Cc]ybersecurity
Cython
Dask
Databricks
[Dd]eserialize
[Dd]ev
[Dd]ocstring(s?)
[Ee]ngineerable
[Ee]xplainability
[Gg]eneratable
glog
GPU(s?)
Grafana
[Gg]ranularities
[Hh]ashable
[Hh]yperparameter(s?)
[Ii]nferencing
jsonlines
# libcudf isn't styled in the way that cuDF is https://docs.rapids.ai/api/libcudf/stable/
libcudf
LLM(s?)
# https://github.com/logpai/loghub/
Loghub
Milvus
[Mm]ixin
MLflow
Morpheus
[Nn]amespace(s?)
NeMo
nginx
NIC
NIM(s?)
NVIDIA
[Pp]arallelization
[Pp]arsable
PCIe
PDF(s?)
[Pp]reprocess
[Pp]retrained
pytest
[Rr]epo
[Rr]etarget(ed)?
[Ss]erializable
[Ss]ubclassing
[Ss]ubcard(s?)
[Ss]ubgraph(s?)
[Ss]ubword(s?)
[Tt]imestamp(s?)
[Tt]okenization
[Tt]okenizer(s?)
triages
[Uu]nencrypted
[Uu]nittest(s?)
[Uu]ploader
XGBoost
zsh
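Entries in this file are Go-syntax regular expressions, so alternation needs a group, `(y|ies)`, rather than a character class: `[y|ies]` matches only a single character (`y`, `|`, `i`, `e`, or `s`), never the whole suffix `ies`. `grep -E` accepts the same alternation syntax, so the grouped form can be checked directly (`-x` forces full-line matches, `-c` counts them):

```shell
# Both inflections should match the grouped pattern in full.
printf 'cryptocurrency\ncryptocurrencies\n' \
  | grep -cxE '[Cc]ryptocurrenc(y|ies)'
# prints: 2
```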
3 changes: 3 additions & 0 deletions ci/vale/styles/config/vocabularies/morpheus/reject.txt
@@ -0,0 +1,3 @@
# List of regular expressions matching words we want to reject. Even though we don't have any words listed, this
# file needs to exist in order for Vale to pick up our accept.txt file
# Regular expressions are parsed according to the Go syntax: https://golang.org/pkg/regexp/syntax/
3 changes: 3 additions & 0 deletions conda/environments/all_cuda-121_arch-x86_64.yaml
@@ -110,6 +110,9 @@ dependencies:
- transformers=4.36.2
- tritonclient=2.34
- typing_utils=0.1
- vale-styles-microsoft
- vale-styles-write-good
- vale=3.7
- versioneer
- versioneer-518
- watchdog=3.0
3 changes: 3 additions & 0 deletions conda/environments/dev_cuda-121_arch-x86_64.yaml
@@ -90,6 +90,9 @@ dependencies:
- tqdm=4
- tritonclient=2.34
- typing_utils=0.1
- vale-styles-microsoft
- vale-styles-write-good
- vale=3.7
- versioneer
- versioneer-518
- watchdog=3.0
3 changes: 3 additions & 0 deletions dependencies.yaml
@@ -284,6 +284,9 @@ dependencies:
- include-what-you-use=0.20
- isort
- pylint=3.0.3
- vale=3.7
- vale-styles-microsoft
- vale-styles-write-good
- versioneer
- yapf=0.40.1

15 changes: 13 additions & 2 deletions docs/CMakeLists.txt
@@ -20,14 +20,25 @@ find_package(Sphinx REQUIRED)

set(SPHINX_SOURCE ${CMAKE_CURRENT_SOURCE_DIR}/source)
set(SPHINX_BUILD ${CMAKE_CURRENT_BINARY_DIR}/html)
set(SPHINX_ARGS -b html -j auto -T -W)
set(SPHINX_LINKCHECK_OUT ${CMAKE_CURRENT_BINARY_DIR}/linkcheck)
set(SPHINX_ARGS -j auto -T -W)
set(SPHINX_HTML_ARGS -b html ${SPHINX_ARGS})
set(SPHINX_LINKCHECK_ARGS -b linkcheck ${SPHINX_ARGS})

add_custom_target(${PROJECT_NAME}_docs
COMMAND
BUILD_DIR=${CMAKE_CURRENT_BINARY_DIR} ${SPHINX_EXECUTABLE} ${SPHINX_ARGS} ${SPHINX_SOURCE} ${SPHINX_BUILD}
BUILD_DIR=${CMAKE_CURRENT_BINARY_DIR} ${SPHINX_EXECUTABLE} ${SPHINX_HTML_ARGS} ${SPHINX_SOURCE} ${SPHINX_BUILD}
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMENT "Generating documentation with Sphinx"
DEPENDS morpheus-package-outputs
)

add_custom_target(${PROJECT_NAME}_docs_linkcheck
COMMAND
BUILD_DIR=${CMAKE_CURRENT_BINARY_DIR} ${SPHINX_EXECUTABLE} ${SPHINX_LINKCHECK_ARGS} ${SPHINX_SOURCE} ${SPHINX_LINKCHECK_OUT}
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMENT "Checking documentation links with Sphinx"
DEPENDS morpheus-package-outputs
)

list(POP_BACK CMAKE_MESSAGE_CONTEXT)
2 changes: 1 addition & 1 deletion docs/source/basics/building_a_pipeline.md
@@ -142,7 +142,7 @@ morpheus run pipeline-nlp --help
## Basic Usage Examples

### Remove Fields from JSON Objects
This example only copies the fields 'timestamp', 'src_ip' and 'dest_ip' from `examples/data/pcap_dump.jsonlines` to
This example only copies the fields `timestamp`, `src_ip` and `dest_ip` from `examples/data/pcap_dump.jsonlines` to
`out.jsonlines`.

![../img/remove_fields_from_json_objects.png](../img/remove_fields_from_json_objects.png)
2 changes: 1 addition & 1 deletion docs/source/basics/overview.rst
@@ -106,7 +106,7 @@ queried in the same manner:
AutoComplete
------------

The Morpheus CLI supports bash, fish, zsh, and powershell autocompletion. To set up autocomplete, it must first be
The Morpheus CLI supports bash, fish, zsh, and PowerShell autocompletion. To set up autocomplete, it must first be
installed. Morpheus comes with a tool to assist with this:

.. code-block:: console
16 changes: 8 additions & 8 deletions docs/source/cloud_deployment_guide.md
@@ -75,15 +75,15 @@ Continue with the setup steps below once the host system is installed, configure

### Set up NGC API Key and Install NGC Registry CLI

First, you will need to set up your NGC API Key to access all the Morpheus components, using the linked instructions from the [NGC Registry CLI User Guide](https://docs.nvidia.com/dgx/ngc-registry-cli-user-guide/index.html#topic_4_1).
First, you will need to set up your NGC API Key to access all the Morpheus components, using the linked instructions from the [NGC Registry CLI User Guide](https://docs.nvidia.com/ngc/gpu-cloud/ngc-private-registry-user-guide/index.html#generating-personal-api-key).

Once you've created your API key, create an environment variable containing your API key for use by the commands used further in this document:

```bash
export API_KEY="<NGC_API_KEY>"
```

Next, install and configure the NGC Registry CLI on your system using the linked instructions from the [NGC Registry CLI User Guide](https://docs.nvidia.com/dgx/ngc-registry-cli-user-guide/index.html#topic_4_1).
Next, install and configure the NGC Registry CLI on your system using the linked instructions from the [NGC Registry CLI User Guide](https://docs.nvidia.com/ngc/gpu-cloud/ngc-private-registry-user-guide/index.html#generating-personal-api-key).

### Create Namespace for Morpheus

@@ -221,7 +221,7 @@ kubectl -n $NAMESPACE exec -it deploy/mlflow -- bash
(mlflow) root@mlflow-6d98:/mlflow#
```

`Important`: When (mlflow) is present, commands are directly within the container.
`Important`: When `(mlflow)` is present, commands are directly within the container.

First, let's examine the syntax of the commands we will be using to communicate with the MLflow Triton plugin before we start deploying models.
Publishing models to the MLflow server takes the following form:
@@ -427,9 +427,9 @@ helm install --set ngc.apiKey="$API_KEY" \

### Run AutoEncoder Digital Fingerprinting Pipeline
The following AutoEncoder pipeline example shows how to train and validate the AutoEncoder model and write the inference results to a specified location. Digital fingerprinting has also been referred to as **HAMMAH (Human as Machine <> Machine as Human)**.
These use cases are currently implemented to detect user behavior changes that indicate a change from a human to a machine or a machine to a human, thus leaving a "digital fingerprint". The model is an ensemble of an autoencoder and fast fourier transform reconstruction.
These use cases are currently implemented to detect user behavior changes that indicate a change from a human to a machine or a machine to a human, thus leaving a "digital fingerprint." The model is an ensemble of an autoencoder and fast Fourier transform reconstruction.

Inference and training based on a userid (`user123`). The model is trained once and inference is conducted on the supplied input entries in the example pipeline below. The `--train_data_glob` parameter must be removed for continuous training.
Inference and training based on a user ID (`user123`). The model is trained once and inference is conducted on the supplied input entries in the example pipeline below. The `--train_data_glob` parameter must be removed for continuous training.

```bash
helm install --set ngc.apiKey="$API_KEY" \
@@ -620,7 +620,7 @@ kubectl -n $NAMESPACE exec -it deploy/broker -c broker -- kafka-console-producer
> **Note**: This should be used for development purposes only via this developer kit. Loading from the file into Kafka should not be used in production deployments of Morpheus.
### Run FIL Anomalous Behavior Profiling Pipeline
The following Anomalous Behavior Profiling pipeline examples use a pre-trained FIL model to ingest and analyze NVIDIA System Management Interface (nvidia-smi) logs, like the example below, as input sample data to identify crypto mining activity on GPU devices.
The following Anomalous Behavior Profiling pipeline examples use a pre-trained FIL model to ingest and analyze NVIDIA System Management Interface (`nvidia-smi`) logs, like the example below, as input sample data to identify cryptocurrency mining activity on GPU devices.

```json
{"nvidia_smi_log.gpu.pci.tx_util": "0 KB/s", "nvidia_smi_log.gpu.pci.rx_util": "0 KB/s", "nvidia_smi_log.gpu.fb_memory_usage.used": "3980 MiB", "nvidia_smi_log.gpu.fb_memory_usage.free": "12180 MiB", "nvidia_smi_log.gpu.bar1_memory_usage.total": "16384 MiB", "nvidia_smi_log.gpu.bar1_memory_usage.used": "11 MiB", "nvidia_smi_log.gpu.bar1_memory_usage.free": "16373 MiB", "nvidia_smi_log.gpu.utilization.gpu_util": "0 %", "nvidia_smi_log.gpu.utilization.memory_util": "0 %", "nvidia_smi_log.gpu.temperature.gpu_temp": "61 C", "nvidia_smi_log.gpu.temperature.gpu_temp_max_threshold": "90 C", "nvidia_smi_log.gpu.temperature.gpu_temp_slow_threshold": "87 C", "nvidia_smi_log.gpu.temperature.gpu_temp_max_gpu_threshold": "83 C", "nvidia_smi_log.gpu.temperature.memory_temp": "57 C", "nvidia_smi_log.gpu.temperature.gpu_temp_max_mem_threshold": "85 C", "nvidia_smi_log.gpu.power_readings.power_draw": "61.77 W", "nvidia_smi_log.gpu.clocks.graphics_clock": "1530 MHz", "nvidia_smi_log.gpu.clocks.sm_clock": "1530 MHz", "nvidia_smi_log.gpu.clocks.mem_clock": "877 MHz", "nvidia_smi_log.gpu.clocks.video_clock": "1372 MHz", "nvidia_smi_log.gpu.applications_clocks.graphics_clock": "1312 MHz", "nvidia_smi_log.gpu.applications_clocks.mem_clock": "877 MHz", "nvidia_smi_log.gpu.default_applications_clocks.graphics_clock": "1312 MHz", "nvidia_smi_log.gpu.default_applications_clocks.mem_clock": "877 MHz", "nvidia_smi_log.gpu.max_clocks.graphics_clock": "1530 MHz", "nvidia_smi_log.gpu.max_clocks.sm_clock": "1530 MHz", "nvidia_smi_log.gpu.max_clocks.mem_clock": "877 MHz", "nvidia_smi_log.gpu.max_clocks.video_clock": "1372 MHz", "nvidia_smi_log.gpu.max_customer_boost_clocks.graphics_clock": "1530 MHz", "nvidia_smi_log.gpu.processes.process_info.0.process_name": "python", "nvidia_smi_log.gpu.processes.process_info.1.process_name": "tritonserver", "hostname": "ip-10-100-8-98", "timestamp": 1615542360.9566503}
@@ -794,7 +794,7 @@ This section lists solutions to problems you might encounter with Morpheus or fr
- Models Unloaded After Reboot
- When the pod is restarted, K8s will not automatically load the models. Since models are deployed to *ai-engine* in explicit mode using MLflow, we'd have to manually deploy them again using the [Model Deployment](#model-deployment) process.
- AI Engine CPU Only Mode
- After a server restart, the ai-engine pod on k8s can start up before the GPU operator infrastructure is available, making it "think" there is no driver installed (i.e., CPU -only mode).
- After a server restart, the ai-engine pod on k8s can start up before the GPU operator infrastructure is available, making it "think" there is no driver installed (that is, CPU-only mode).
- Improve Pipeline Message Processing Rate
- The following settings need to be considered:
- Provide the workflow with the optimal number of threads (`--num_threads`), as having more or fewer threads can have an impact on pipeline performance.
@@ -804,6 +804,6 @@ This section lists solutions to problems you might encounter with Morpheus or fr
```console
1649207839.253|COMMITFAIL|rdkafka#consumer-2| [thrd:main]: Offset commit (manual) failed for 1/1 partition(s) in join-state wait-unassign-call: Broker: Unknown member: topic[0]@112071(Broker: Unknown member)
```
- Problem: If the standalone kafka cluster is receiving significant message throughput from the producer, this error may happen.
- Problem: If the standalone Kafka cluster is receiving significant message throughput from the producer, this error may happen.

- Solution: Reinstall the Morpheus workflow and reduce the Kafka topic's message retention time and message producing rate.
