Update the DFP example container to use the morpheus-dfp conda package #1970

Closed
2 changes: 1 addition & 1 deletion .devcontainer/Dockerfile
@@ -96,7 +96,7 @@ ENV PYTHON_PACKAGE_MANAGER="${PYTHON_PACKAGE_MANAGER}"

ENV SCCACHE_REGION="us-east-2"
ENV SCCACHE_BUCKET="rapids-sccache-devs"
ENV VAULT_HOST="https://vault.ops.k8s.rapids.ai"
ENV AWS_ROLE_ARN="arn:aws:iam::279114543810:role/nv-gha-token-sccache-devs"
ENV HISTFILE="/home/coder/.cache/._bash_history"

ENV MORPHEUS_SUPPORT_DOCA=ON
2 changes: 1 addition & 1 deletion .devcontainer/cuda12.5-conda/devcontainer.json
@@ -5,7 +5,7 @@
"args": {
"CUDA": "12.5",
"PYTHON_PACKAGE_MANAGER": "conda",
"BASE": "rapidsai/devcontainers:24.10-cpp-mambaforge-ubuntu22.04"
"BASE": "rapidsai/devcontainers:24.12-cpp-mambaforge-ubuntu22.04"
}
},
"privileged": true,
1 change: 0 additions & 1 deletion ci/release/update-version.sh
@@ -91,7 +91,6 @@ sed_runner "s/v${CURRENT_FULL_VERSION}-runtime/v${NEXT_FULL_VERSION}-runtime/g"
examples/digital_fingerprinting/production/docker-compose.yml \
examples/digital_fingerprinting/production/Dockerfile
sed_runner "s/v${CURRENT_FULL_VERSION}-runtime/v${NEXT_FULL_VERSION}-runtime/g" examples/digital_fingerprinting/production/Dockerfile
sed_runner "s|blob/branch-${CURRENT_SHORT_TAG}|blob/branch-${NEXT_SHORT_TAG}|g" examples/digital_fingerprinting/starter/README.md

# examples/developer_guide
sed_runner 's/'"VERSION ${CURRENT_FULL_VERSION}.*"'/'"VERSION ${NEXT_FULL_VERSION}"'/g' \
2 changes: 2 additions & 0 deletions ci/vale/styles/config/vocabularies/morpheus/accept.txt
@@ -46,6 +46,7 @@ LLM(s?)
# https://github.com/logpai/loghub/
Loghub
Milvus
PyPI
[Mm]ixin
MLflow
Morpheus
@@ -71,6 +72,7 @@ pytest
[Ss]ubcard(s?)
[Ss]ubgraph(s?)
[Ss]ubword(s?)
[Ss]uperset(s?)
[Tt]imestamp(s?)
[Tt]okenization
[Tt]okenizer(s?)
4 changes: 2 additions & 2 deletions docs/CMakeLists.txt
@@ -30,15 +30,15 @@ add_custom_target(${PROJECT_NAME}_docs
BUILD_DIR=${CMAKE_CURRENT_BINARY_DIR} ${SPHINX_EXECUTABLE} ${SPHINX_HTML_ARGS} ${SPHINX_SOURCE} ${SPHINX_BUILD}
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMENT "Generating documentation with Sphinx"
DEPENDS morpheus-package-outputs morpheus_llm-package-outputs
DEPENDS morpheus-package-outputs morpheus_llm-package-outputs morpheus_dfp-package-outputs
)

add_custom_target(${PROJECT_NAME}_docs_linkcheck
COMMAND
BUILD_DIR=${CMAKE_CURRENT_BINARY_DIR} ${SPHINX_EXECUTABLE} ${SPHINX_LINKCHECK_ARGS} ${SPHINX_SOURCE} ${SPHINX_LINKCHECK_OUT}
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMENT "Checking documentation links with Sphinx"
DEPENDS morpheus-package-outputs morpheus_llm-package-outputs
DEPENDS morpheus-package-outputs morpheus_llm-package-outputs morpheus_dfp-package-outputs
)

list(POP_BACK CMAKE_MESSAGE_CONTEXT)
5 changes: 1 addition & 4 deletions docs/source/basics/overview.rst
@@ -27,7 +27,7 @@ The Morpheus CLI is built on the Click Python package which allows for nested commands and chaining multiple commands
together. At a high level, the CLI is broken up into two main sections:

* ``run``
* For running AE, FIL, NLP or OTHER pipelines.
* For running FIL, NLP or OTHER pipelines.
* ``tools``
* Tools/Utilities to help set up, configure and run pipelines and external resources.

@@ -58,16 +58,13 @@ run:
--help Show this message and exit.

Commands:
pipeline-ae Run the inference pipeline with an AutoEncoder model
pipeline-fil Run the inference pipeline with a FIL model
pipeline-nlp Run the inference pipeline with a NLP model
pipeline-other Run a custom inference pipeline without a specific model type


Currently, Morpheus pipeline can be operated in four different modes.

* ``pipeline-ae``
* This pipeline mode is used to run training/inference on the AutoEncoder model.
* ``pipeline-fil``
* This pipeline mode is used to run inference on FIL (Forest Inference Library) models such as XGBoost, RandomForestClassifier, etc.
* ``pipeline-nlp``
46 changes: 3 additions & 43 deletions docs/source/cloud_deployment_guide.md
@@ -32,7 +32,6 @@ limitations under the License.
- [Verify Model Deployment](#verify-model-deployment)
- [Create Kafka Topics](#create-kafka-topics)
- [Example Workflows](#example-workflows)
- [Run AutoEncoder Digital Fingerprinting Pipeline](#run-autoencoder-digital-fingerprinting-pipeline)
- [Run NLP Phishing Detection Pipeline](#run-nlp-phishing-detection-pipeline)
- [Run NLP Sensitive Information Detection Pipeline](#run-nlp-sensitive-information-detection-pipeline)
- [Run FIL Anomalous Behavior Profiling Pipeline](#run-fil-anomalous-behavior-profiling-pipeline)
@@ -383,10 +382,9 @@ kubectl -n $NAMESPACE exec deploy/broker -c broker -- kafka-topics.sh \

This section describes example workflows to run on Morpheus. Four sample pipelines are provided.

1. AutoEncoder pipeline performing Digital Fingerprinting (DFP).
2. NLP pipeline performing Phishing Detection (PD).
3. NLP pipeline performing Sensitive Information Detection (SID).
4. FIL pipeline performing Anomalous Behavior Profiling (ABP).
1. NLP pipeline performing Phishing Detection (PD).
2. NLP pipeline performing Sensitive Information Detection (SID).
3. FIL pipeline performing Anomalous Behavior Profiling (ABP).

Multiple command options are given for each pipeline, with varying data input/output methods, ranging from local files to Kafka Topics.

@@ -424,44 +422,6 @@ helm install --set ngc.apiKey="$API_KEY" \
morpheus-sdk-client
```


### Run AutoEncoder Digital Fingerprinting Pipeline
The following AutoEncoder pipeline example shows how to train and validate the AutoEncoder model and write the inference results to a specified location. Digital fingerprinting has also been referred to as **HAMMAH (Human as Machine <> Machine as Human)**.
These use cases are currently implemented to detect user behavior changes that indicate a change from a human to a machine or a machine to a human, thus leaving a "digital fingerprint." The model is an ensemble of an autoencoder and fast Fourier transform reconstruction.

Inference and training based on a user ID (`user123`). The model is trained once and inference is conducted on the supplied input entries in the example pipeline below. The `--train_data_glob` parameter must be removed for continuous training.

```bash
helm install --set ngc.apiKey="$API_KEY" \
--set sdk.args="morpheus --log_level=DEBUG run \
--edge_buffer_size=4 \
--pipeline_batch_size=1024 \
--model_max_batch_size=1024 \
pipeline-ae \
--columns_file=data/columns_ae_cloudtrail.txt \
--userid_filter=user123 \
--feature_scaler=standard \
--userid_column_name=userIdentitysessionContextsessionIssueruserName \
--timestamp_column_name=event_dt \
from-cloudtrail --input_glob=/common/models/datasets/validation-data/dfp-cloudtrail-*-input.csv \
--max_files=200 \
train-ae --train_data_glob=/common/models/datasets/training-data/dfp-cloudtrail-*.csv \
--source_stage_class=morpheus.stages.input.cloud_trail_source_stage.CloudTrailSourceStage \
--seed 42 \
preprocess \
inf-pytorch \
add-scores \
timeseries --resolution=1m --zscore_threshold=8.0 --hot_start \
monitor --description 'Inference Rate' --smoothing=0.001 --unit inf \
serialize \
to-file --filename=/common/data/<YOUR_OUTPUT_DIR>/cloudtrail-dfp-detections.csv --overwrite" \
--namespace $NAMESPACE \
<YOUR_RELEASE_NAME> \
morpheus-sdk-client
```

For more information on the Digital Fingerprint use cases, refer to the starter example and a more production-ready example that can be found in the `examples` source directory.

### Run NLP Phishing Detection Pipeline

The following Phishing Detection pipeline examples use a pre-trained NLP model to analyze emails (body) and determine whether they are phishing or benign. The sample data shown below is used as input to the pipeline.
128 changes: 128 additions & 0 deletions docs/source/conda_packages.md
@@ -0,0 +1,128 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Morpheus Conda Packages
The Morpheus stages are the building blocks for creating pipelines. The stages are organized into libraries by use case. The current libraries are:
- `morpheus-core`
- `morpheus-dfp`
- `morpheus-llm`

The libraries are hosted as Conda packages on the [`nvidia`](https://anaconda.org/nvidia/) channel.

The split into multiple libraries allows for a more modular approach to using the Morpheus stages. For example, if you are building an application for Digital Fingerprinting, you can install just the `morpheus-dfp` library. This reduces the size of the installed package. It also limits the dependencies, eliminating unnecessary version conflicts.
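
To see which Morpheus packages and versions are currently published on the channel, a quick query such as the following can be used (the glob pattern is just a convenience; `conda search` also accepts plain package names):
```bash
conda search -c nvidia "morpheus-*"
```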


## Morpheus Core
The `morpheus-core` library contains the core stages that are common across all use cases. The Morpheus core library is built from the source code in the `python/morpheus` directory of the Morpheus repository. The core library is installed as a dependency when you install any of the other Morpheus libraries.
To set up a Conda environment with the [`morpheus-core`](https://anaconda.org/nvidia/morpheus-core) library, run the following commands:
### Create a Conda environment
```bash
export CONDA_ENV_NAME=morpheus
conda create -n ${CONDA_ENV_NAME} python=3.10
conda activate ${CONDA_ENV_NAME}
```
### Add Conda channels
These channels are required for installing the runtime dependencies:
```bash
conda config --env --add channels conda-forge &&\
conda config --env --add channels nvidia &&\
conda config --env --add channels rapidsai &&\
conda config --env --add channels pytorch
```
### Install the `morpheus-core` library
```bash
conda install -c nvidia morpheus-core
```
The `morpheus-core` Conda package installs the `morpheus` python package. It also pulls down all the necessary Conda runtime dependencies for the core stages including [`mrc`](https://anaconda.org/nvidia/mrc) and [`libmrc`](https://anaconda.org/nvidia/libmrc).
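
As an optional sanity check after the install, you can confirm that the `morpheus` package resolves from the new environment (assuming the package exposes the usual `__version__` attribute):
```bash
python -c "import morpheus; print(morpheus.__version__)"
```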
### Install additional PyPI dependencies
Some of the stages in the core library require additional dependencies that are hosted on PyPI. These dependencies are included as a requirements file in the `morpheus` python package. The requirements files can be located and installed by running the following command:
```bash
MORPHEUS_CORE_PKG_DIR=$(dirname $(python -c "import morpheus; print(morpheus.__file__)"))
pip install -r ${MORPHEUS_CORE_PKG_DIR}/requirements_morpheus_core.txt
```

## Morpheus DFP
Digital Fingerprinting (DFP) is a technique used to identify anomalous behavior and uncover potential threats in the environment. The `morpheus-dfp` library contains stages for DFP. It is built from the source code in the `python/morpheus_dfp` directory of the Morpheus repository. To set up a Conda environment with the [`morpheus-dfp`](https://anaconda.org/nvidia/morpheus-dfp) library, run the following commands:
### Create a Conda environment
```bash
export CONDA_ENV_NAME=morpheus-dfp
conda create -n ${CONDA_ENV_NAME} python=3.10
conda activate ${CONDA_ENV_NAME}
```
### Add Conda channels
These channels are required for installing the runtime dependencies:
```bash
conda config --env --add channels conda-forge &&\
conda config --env --add channels nvidia &&\
conda config --env --add channels rapidsai &&\
conda config --env --add channels pytorch
```
### Install the `morpheus-dfp` library
```bash
conda install -c nvidia morpheus-dfp
```
The `morpheus-dfp` Conda package installs the `morpheus_dfp` python package. It also pulls down all the necessary Conda runtime dependencies including [`morpheus-core`](https://anaconda.org/nvidia/morpheus-core).
### Install additional PyPI dependencies
Some of the DFP stages in the library require additional dependencies that are hosted on PyPI. These dependencies are included as a requirements file in the `morpheus_dfp` python package and can be installed by running the following command:
```bash
MORPHEUS_DFP_PKG_DIR=$(dirname $(python -c "import morpheus_dfp; print(morpheus_dfp.__file__)"))
pip install -r ${MORPHEUS_DFP_PKG_DIR}/requirements_morpheus_dfp.txt
```

## Morpheus LLM
The `morpheus-llm` library contains stages for Large Language Models (LLM) and Vector Databases. These stages are used for setting up Retrieval Augmented Generation (RAG) pipelines. The `morpheus-llm` library is built from the source code in the `python/morpheus_llm` directory of the Morpheus repository.
To set up a Conda environment with the [`morpheus-llm`](https://anaconda.org/nvidia/morpheus-llm) library, run the following commands:
### Create a Conda environment
```bash
export CONDA_ENV_NAME=morpheus-llm
conda create -n ${CONDA_ENV_NAME} python=3.10
conda activate ${CONDA_ENV_NAME}
```
### Add Conda channels
These channels are required for installing the runtime dependencies:
```bash
conda config --env --add channels conda-forge &&\
conda config --env --add channels nvidia &&\
conda config --env --add channels rapidsai &&\
conda config --env --add channels pytorch
```
### Install the `morpheus-llm` library
```bash
conda install -c nvidia morpheus-llm
```
The `morpheus-llm` Conda package installs the `morpheus_llm` python package. It also pulls down all the necessary Conda packages including [`morpheus-core`](https://anaconda.org/nvidia/morpheus-core).
### Install additional PyPI dependencies
Some of the stages in the library require additional dependencies that are hosted on PyPI. These dependencies are included as a requirements file in the `morpheus_llm` python package and can be installed by running the following command:
```bash
MORPHEUS_LLM_PKG_DIR=$(dirname $(python -c "import morpheus_llm; print(morpheus_llm.__file__)"))
pip install -r ${MORPHEUS_LLM_PKG_DIR}/requirements_morpheus_llm.txt
```

## Miscellaneous
### Morpheus Examples
The Morpheus examples are not included in the Morpheus Conda packages. To use them, clone the Morpheus repository and run the examples from source. For details, refer to the [Morpheus Examples](./examples.md).
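
For example, a typical way to get the examples is to clone the repository and check out the branch matching your installed release (the branch name below is an assumption; adjust it to your version):
```bash
git clone https://github.com/nv-morpheus/Morpheus.git
cd Morpheus
git checkout branch-24.10   # assumed release branch; pick the one matching your install
```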

### Namespace Updates
If you were using a Morpheus release prior to 24.10, you may need to update the namespaces used by the DFP, LLM, and vector database stages.

A script, `scripts/morpheus_namespace_update.py`, has been provided to help with this; it can be run as follows:
```bash
python scripts/morpheus_namespace_update.py --directory <directory> --dfp
```
```bash
python scripts/morpheus_namespace_update.py --directory <directory> --llm
```
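
For illustration, the rename amounts to import rewrites of the following kind; the exact module paths in your code may differ, and the project directory below is hypothetical:
```bash
# Illustrative only -- the namespace update rewrites imports such as:
#   before 24.10:  from morpheus.llm import LLMEngine
#   24.10 onward:  from morpheus_llm.llm import LLMEngine
# Run the script against a copy of your sources if you want to review the changes first:
cp -r my_pipeline my_pipeline_backup   # hypothetical project directory
python scripts/morpheus_namespace_update.py --directory my_pipeline --llm
```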
44 changes: 37 additions & 7 deletions docs/source/developer_guide/contributing.md
@@ -151,7 +151,35 @@ This workflow utilizes a Docker container to set up most dependencies ensuring a

### Build in a Conda Environment

If a Conda environment on the host machine is preferred over Docker, it is relatively easy to install the necessary dependencies (In reality, the Docker workflow creates a Conda environment inside the container).
If a [Conda](https://docs.conda.io/projects/conda/en/latest/) environment on the host machine is preferred over Docker, it is relatively easy to install the necessary dependencies (In reality, the Docker workflow creates a Conda environment inside the container).

#### Conda Environment YAML Files
Morpheus provides multiple Conda environment files to support different workflows. Morpheus utilizes [rapids-dependency-file-generator](https://pypi.org/project/rapids-dependency-file-generator/) to manage these multiple environment files. All of Morpheus' Conda and [pip](https://pip.pypa.io/en/stable/) dependencies, along with the different environments, are defined in the `dependencies.yaml` file.

The following Conda environment files are available. All are located in the `conda/environments` directory and follow the naming convention `<environment>_<cuda_version>_arch-<architecture>.yaml`; an example of creating an environment from one of these files follows the table.

| Environment | File | Description |
| --- | --- | --- |
| `all` | `all_cuda-125_arch-x86_64.yaml` | All dependencies required to build, run and test Morpheus, along with all of the examples. This is a superset of the `dev`, `runtime` and `examples` environments. |
| `dev` | `dev_cuda-125_arch-x86_64.yaml` | Dependencies required to build, run and test Morpheus. This is a superset of the `runtime` environment. |
| `examples` | `examples_cuda-125_arch-x86_64.yaml` | Dependencies required to run all examples. This is a superset of the `runtime` environment. |
| `model-utils` | `model-utils_cuda-125_arch-x86_64.yaml` | Dependencies required to train models independent of Morpheus. |
| `runtime` | `runtime_cuda-125_arch-x86_64.yaml` | Minimal set of dependencies strictly required to run Morpheus. |
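
For example, a minimal runtime-only environment can be created directly from the corresponding file (the environment name is arbitrary):
```bash
conda env create --solver=libmamba -n morpheus-runtime --file conda/environments/runtime_cuda-125_arch-x86_64.yaml
conda activate morpheus-runtime
```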


##### Updating Morpheus Dependencies
Changes to Morpheus dependencies should be made in the `dependencies.yaml` file; then run `rapids-dependency-file-generator` to update the individual environment files in the `conda/environments` directory.

Install `rapids-dependency-file-generator` into the base Conda environment:
```bash
conda run -n base --live-stream pip install rapids-dependency-file-generator
```

Then, to regenerate the individual environment files, run:
```bash
conda run -n base --live-stream rapids-dependency-file-generator
```

When ready, commit both the changes to the `dependencies.yaml` file and the updated environment files into the repo.
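
A minimal sketch of that last step, assuming the files were regenerated from the repository root:
```bash
git status conda/environments/                       # review which environment files changed
git add dependencies.yaml conda/environments/*.yaml
git commit -m "Update Morpheus dependencies"
```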

#### Prerequisites

@@ -170,19 +198,21 @@ If a Conda environment on the host machine is preferred over Docker, it is relat
```bash
git submodule update --init --recursive
```
1. Create the Morpheus Conda environment
1. Create the Morpheus Conda environment using either the `dev` or `all` environment file. Refer to the [Conda Environment YAML Files](#conda-environment-yaml-files) section for more information.
```bash
conda env create --solver=libmamba -n morpheus --file conda/environments/dev_cuda-125_arch-x86_64.yaml
conda activate morpheus
```
or
```bash
conda env create --solver=libmamba -n morpheus --file conda/environments/all_cuda-125_arch-x86_64.yaml
```

This creates a new environment named `morpheus`, and activates that environment.

> **Note**: The `dev_cuda-121_arch-x86_64.yaml` Conda environment file specifies all of the dependencies required to build Morpheus and run Morpheus. However many of the examples, and optional packages such as `morpheus_llm` require additional dependencies. Alternately the following command can be used to create the Conda environment:
This creates a new environment named `morpheus`. Activate the environment with:
```bash
conda env create --solver=libmamba -n morpheus --file conda/environments/all_cuda-121_arch-x86_64.yaml
conda activate morpheus
```

1. Build Morpheus
```bash
./scripts/compile.sh
@@ -345,7 +375,7 @@ Launching a full production Kafka cluster is outside the scope of this project;

### Pipeline Validation

To verify that all pipelines are working correctly, validation scripts have been added at `${MORPHEUS_ROOT}/scripts/validation`. There are scripts for each of the main workflows: Anomalous Behavior Profiling (ABP), Humans-as-Machines-Machines-as-Humans (HAMMAH), Phishing Detection (Phishing), and Sensitive Information Detection (SID).
To verify that all pipelines are working correctly, validation scripts have been added at `${MORPHEUS_ROOT}/scripts/validation`. There are scripts for each of the main workflows: Anomalous Behavior Profiling (ABP), Phishing Detection (Phishing), and Sensitive Information Detection (SID).

To run all of the validation workflow scripts, use the following commands:
