Update examples to execute from the root of the repo (#1674)
* Update top-level examples and Triton start-up commands to execute from the root of the repo
* Where possible, set default values for CLI flags, removing the need to set them for the common use case
* Where possible, remove the need to define `MORPHEUS_ROOT`
* Ensure C++ Triton pipelines use port 8000 to avoid the warning about the gRPC port.
* Optionally cast types in the C++ impl of the Triton stage when `force_convert_inputs=true` and the input and model types don't match (previously types were always cast; see the sketch below)
* Remove `--num_threads=1` restriction and configure logging for the `log_parsing` example
* Remove `--num_threads=8` restriction from `nlp_si_detection` since the pipeline has more than 8 stages.
* Don't invoke the C++ impl of preallocate if the type being requested isn't supported on the C++ side (strings)
* Don't use the C++ impl of the Triton stage if `use_shared_memory` is requested as this isn't supported in C++.
* Add missing `gnn-fraud-classification` stage to CLI alternative for `gnn_fraud_detection_pipeline` example

Closes #1671
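
The two Triton-stage flags mentioned above are easiest to see in context. The following is a minimal, illustrative sketch — not part of this PR's diff — assuming the current `TritonInferenceStage` keyword names (`force_convert_inputs`, `use_shared_memory`) and reusing the `abp-pcap-xgb` model and `localhost:8001` URL from the examples touched by this PR:

```python
# Minimal sketch, not taken from this PR's diff; assumes the current
# TritonInferenceStage keyword names and the abp-pcap-xgb example model.
from morpheus.config import Config
from morpheus.config import PipelineModes
from morpheus.stages.inference.triton_inference_stage import TritonInferenceStage

config = Config()
config.mode = PipelineModes.FIL

triton_stage = TritonInferenceStage(
    config,
    model_name="abp-pcap-xgb",
    server_url="localhost:8001",
    # With this change the C++ implementation casts inputs to the model's
    # dtype only when this flag is set; a dtype mismatch without it is an
    # error rather than a silent cast.
    force_convert_inputs=True,
    # Shared memory is only supported by the Python implementation, so
    # requesting it makes the stage fall back to the Python code path.
    use_shared_memory=False,
)
```

A real pipeline would still wrap this stage with source, preprocessing, and output stages; only the stage construction is shown here to locate the two flags.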

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/nv-morpheus/Morpheus/blob/main/docs/source/developer_guide/contributing.md).
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - David Gardner (https://github.com/dagardner-nv)

Approvers:
  - Eli Fajardo (https://github.com/efajardo-nv)
  - Michael Demoret (https://github.com/mdemoret-nv)

URL: #1674
dagardner-nv authored May 2, 2024
1 parent 9d3de8a commit 808c52c
Showing 28 changed files with 448 additions and 144 deletions.
2 changes: 1 addition & 1 deletion docs/source/examples.md
@@ -24,7 +24,7 @@ limitations under the License.
* [Example Ransomware Detection Morpheus Pipeline for AppShield Data](../../examples/ransomware_detection/README.md)
* [Root Cause Analysis Acceleration & Predictive Maintenance Example](../../examples/root_cause_analysis/README.md)
* [SID Visualization Example](../../examples/sid_visualization/README.md)
* [Large Language Models (LLMs)](../../examples/llm/README.md)
* Large Language Models (LLMs)
* [Agents](../../examples/llm/agents/README.md)
* [Completion](../../examples/llm/completion/README.md)
* [VDB Upload](../../examples/llm/vdb_upload/README.md)
22 changes: 15 additions & 7 deletions examples/README.md
@@ -15,10 +15,18 @@ See the License for the specific language governing permissions and
limitations under the License.
-->

## Morpheus CLI Examples

Examples run with the Morpheus CLI (`morpheus ...`) should be run from the repository root; otherwise, some filepath arguments may need to be changed.

## Morpheus run.py Examples

Examples run with python (`python run.py`) should be run from the example's directory; otherwise, relative Python imports may be broken.
# Examples
* [Anomalous Behavior Profiling with Forest Inference Library (FIL) Example](./abp_nvsmi_detection/README.md)
* [ABP Detection Example Using Morpheus](./abp_pcap_detection/README.md)
* [Digital Fingerprinting (DFP)](./digital_fingerprinting/README.md)
* [GNN Fraud Detection Pipeline](./gnn_fraud_detection_pipeline/README.md)
* [Example cyBERT Morpheus Pipeline for Apache Log Parsing](./log_parsing/README.md)
* [Sensitive Information Detection with Natural Language Processing (NLP) Example](./nlp_si_detection/README.md)
* [Example Ransomware Detection Morpheus Pipeline for AppShield Data](./ransomware_detection/README.md)
* [Root Cause Analysis Acceleration & Predictive Maintenance Example](./root_cause_analysis/README.md)
* [SID Visualization Example](./sid_visualization/README.md)
* Large Language Models (LLMs)
* [Agents](./llm/agents/README.md)
* [Completion](./llm/completion/README.md)
* [VDB Upload](./llm/vdb_upload/README.md)
* [Retrieval Augmented Generation (RAG)](./llm/rag/README.md)
41 changes: 15 additions & 26 deletions examples/abp_pcap_detection/README.md
@@ -27,14 +27,9 @@ docker pull nvcr.io/nvidia/tritonserver:23.06-py3
```

##### Deploy Triton Inference Server
From the root of the Morpheus repo, navigate to the anomalous behavior profiling example directory:
From the root of the Morpheus repo, run the following to launch Triton and load the `abp-pcap-xgb` model:
```bash
cd examples/abp_pcap_detection
```

The following creates the Triton container, mounts the `abp-pcap-xgb` directory to `/models/abp-pcap-xgb` in the Triton container, and starts the Triton server:
```bash
docker run --rm --gpus=all -p 8000:8000 -p 8001:8001 -p 8002:8002 -v $PWD/abp-pcap-xgb:/models/abp-pcap-xgb --name tritonserver nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models --exit-on-error=false
docker run --rm --gpus=all -p 8000:8000 -p 8001:8001 -p 8002:8002 -v $PWD/examples/abp_pcap_detection/abp-pcap-xgb:/models/abp-pcap-xgb --name tritonserver nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models --exit-on-error=false
```

##### Verify Model Deployment
@@ -53,53 +48,49 @@ Use Morpheus to run the Anomalous Behavior Profiling Detection Pipeline with the

From the root of the Morpheus repo, run:
```bash
cd examples/abp_pcap_detection
python run.py --help
python examples/abp_pcap_detection/run.py --help
```

Output:
```
Usage: run.py [OPTIONS]
Options:
--num_threads INTEGER RANGE Number of internal pipeline threads to use
--num_threads INTEGER RANGE Number of internal pipeline threads to use.
[x>=1]
--pipeline_batch_size INTEGER RANGE
Internal batch size for the pipeline. Can be
much larger than the model batch size. Also
used for Kafka consumers [x>=1]
used for Kafka consumers. [x>=1]
--model_max_batch_size INTEGER RANGE
Max batch size to use for the model [x>=1]
--input_file PATH Input filepath [required]
Max batch size to use for the model. [x>=1]
--input_file PATH Input filepath. [required]
--output_file TEXT The path to the file where the inference
output will be saved.
--model_fea_length INTEGER RANGE
Features length to use for the model [x>=1]
Features length to use for the model.
[x>=1]
--model_name TEXT The name of the model that is deployed on
Tritonserver
Tritonserver.
--iterative Iterative mode will emit dataframes one at a
time. Otherwise a list of dataframes is
emitted. Iterative mode is good for
interleaving source stages.
--server_url TEXT Tritonserver url [required]
--file_type [auto|json|csv] Indicates what type of file to read.
--server_url TEXT Tritonserver url. [required]
--file_type [auto|csv|json] Indicates what type of file to read.
Specifying 'auto' will determine the file
type from the extension.
--help Show this message and exit.
```

To launch the configured Morpheus pipeline with the sample data that is provided in `examples/data`, from the `examples/abp_pcap_detection` directory run the following:
To launch the configured Morpheus pipeline with the sample data that is provided in `examples/data`, run the following:

```bash
python run.py \
--input_file ../data/abp_pcap_dump.jsonlines \
--output_file ./pcap_out.jsonlines \
--model_name 'abp-pcap-xgb' \
--server_url localhost:8001
python examples/abp_pcap_detection/run.py
```
Note: Both Morpheus and Triton Inference Server containers must have access to the same GPUs in order for this example to work.

The pipeline will process the input `pcap_dump.jsonlines` sample data and write it to `pcap_out.jsonlines`.
The pipeline will process the input `abp_pcap_dump.jsonlines` sample data and write it to `pcap_out.jsonlines`.

### CLI Example
The above example is illustrative of using the Python API to build a custom Morpheus Pipeline.
@@ -123,5 +114,3 @@ morpheus --log_level INFO --plugin "examples/abp_pcap_detection/abp_pcap_preproc
to-file --filename "pcap_out.jsonlines" --overwrite \
monitor --description "Write to file rate" --unit "to-file"
```

Note: Triton is still needed to be launched from the `examples/abp_pcap_detection` directory.
7 changes: 5 additions & 2 deletions examples/abp_pcap_detection/run.py
@@ -33,6 +33,9 @@
from morpheus.stages.preprocess.deserialize_stage import DeserializeStage
from morpheus.utils.logger import configure_logging

CUR_DIR = os.path.dirname(__file__)
EX_DATA_DIR = os.path.join(CUR_DIR, "../data")


@click.command()
@click.option(
@@ -57,7 +60,7 @@
@click.option(
"--input_file",
type=click.Path(exists=True, readable=True),
default="pcap.jsonlines",
default=os.path.join(EX_DATA_DIR, "abp_pcap_dump.jsonlines"),
required=True,
help="Input filepath.",
)
@@ -84,7 +87,7 @@
help=("Iterative mode will emit dataframes one at a time. Otherwise a list of dataframes is emitted. "
"Iterative mode is good for interleaving source stages."),
)
@click.option("--server_url", required=True, help="Tritonserver url.")
@click.option("--server_url", required=True, help="Tritonserver url.", default="localhost:8001")
@click.option(
"--file_type",
type=click.Choice(FILE_TYPE_NAMES, case_sensitive=False),
15 changes: 4 additions & 11 deletions examples/gnn_fraud_detection_pipeline/README.md
@@ -28,17 +28,10 @@ mamba env update \
```

## Running

##### Setup Env Variable
```bash
export MORPHEUS_ROOT=$(pwd)
```

Use Morpheus to run the GNN fraud detection Pipeline with the transaction data. A pipeline has been configured in `run.py` with several command line options:

```bash
cd ${MORPHEUS_ROOT}/examples/gnn_fraud_detection_pipeline
python run.py --help
python examples/gnn_fraud_detection_pipeline/run.py --help
```
```
Usage: run.py [OPTIONS]
@@ -63,11 +56,10 @@ Options:
--help Show this message and exit.
```

To launch the configured Morpheus pipeline with the sample data that is provided at `$MORPHEUS_ROOT/models/dataset`, run the following:
To launch the configured Morpheus pipeline, run the following:

```bash
cd ${MORPHEUS_ROOT}/examples/gnn_fraud_detection_pipeline
python run.py
python examples/gnn_fraud_detection_pipeline/run.py
```
```
====Registering Pipeline====
@@ -125,6 +117,7 @@ morpheus --log_level INFO \
monitor --description "Graph construction rate" \
gnn-fraud-sage --model_dir examples/gnn_fraud_detection_pipeline/model/ \
monitor --description "Inference rate" \
gnn-fraud-classification --model_xgb_file examples/gnn_fraud_detection_pipeline/model/xgb.pt \
monitor --description "Add classification rate" \
serialize \
to-file --filename "output.csv" --overwrite
8 changes: 5 additions & 3 deletions examples/gnn_fraud_detection_pipeline/run.py
@@ -32,6 +32,8 @@
from stages.graph_construction_stage import FraudGraphConstructionStage
from stages.graph_sage_stage import GraphSAGEStage

CUR_DIR = os.path.dirname(__file__)


@click.command()
@click.option(
@@ -62,21 +64,21 @@
@click.option(
"--input_file",
type=click.Path(exists=True, readable=True, dir_okay=False),
default="validation.csv",
default=os.path.join(CUR_DIR, "validation.csv"),
required=True,
help="Input data filepath.",
)
@click.option(
"--training_file",
type=click.Path(exists=True, readable=True, dir_okay=False),
default="training.csv",
default=os.path.join(CUR_DIR, "training.csv"),
required=True,
help="Training data filepath.",
)
@click.option(
"--model_dir",
type=click.Path(exists=True, readable=True, file_okay=False, dir_okay=True),
default="model",
default=os.path.join(CUR_DIR, "model"),
required=True,
help="Path to trained Hinsage & XGB models.",
)
21 changes: 6 additions & 15 deletions examples/log_parsing/README.md
@@ -29,11 +29,6 @@ Example:
docker pull nvcr.io/nvidia/tritonserver:23.06-py3
```

##### Setup Env Variable
```bash
export MORPHEUS_ROOT=$(pwd)
```

##### Start Triton Inference Server Container
From the Morpheus repo root directory, run the following to launch Triton and load the `log-parsing-onnx` model:

@@ -56,19 +51,15 @@ Once Triton server finishes starting up, it will display the status of all loade
### Run Log Parsing Pipeline

Run the following from the `examples/log_parsing` directory to start the log parsing pipeline:
Run the following from the root of the Morpheus repo to start the log parsing pipeline:

```bash
python run.py \
--num_threads 1 \
--input_file ${MORPHEUS_ROOT}/models/datasets/validation-data/log-parsing-validation-data-input.csv \
--output_file ./log-parsing-output.jsonlines \
python examples/log_parsing/run.py \
--input_file=./models/datasets/validation-data/log-parsing-validation-data-input.csv \
--model_vocab_hash_file=data/bert-base-cased-hash.txt \
--model_vocab_file=${MORPHEUS_ROOT}/models/training-tuning-scripts/sid-models/resources/bert-base-cased-vocab.txt \
--model_seq_length=256 \
--model_vocab_file=./models/training-tuning-scripts/sid-models/resources/bert-base-cased-vocab.txt \
--model_name log-parsing-onnx \
--model_config_file=${MORPHEUS_ROOT}/models/log-parsing-models/log-parsing-config-20220418.json \
--server_url localhost:8001
--model_config_file=./models/log-parsing-models/log-parsing-config-20220418.json
```

Use `--help` to display information about the command line options:
@@ -110,7 +101,7 @@ PYTHONPATH="examples/log_parsing" \
morpheus --log_level INFO \
--plugin "inference" \
--plugin "postprocessing" \
run --num_threads 1 --pipeline_batch_size 1024 --model_max_batch_size 32 \
run --pipeline_batch_size 1024 --model_max_batch_size 32 \
pipeline-nlp \
from-file --filename ./models/datasets/validation-data/log-parsing-validation-data-input.csv \
deserialize \
8 changes: 7 additions & 1 deletion examples/log_parsing/run.py
@@ -12,6 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import logging
import os

import click
@@ -28,6 +29,7 @@
from morpheus.stages.output.write_to_file_stage import WriteToFileStage
from morpheus.stages.preprocess.deserialize_stage import DeserializeStage
from morpheus.stages.preprocess.preprocess_nlp_stage import PreprocessNLPStage
from morpheus.utils.logger import configure_logging


@click.command()
@@ -79,7 +81,7 @@
help="The name of the model that is deployed on Tritonserver.",
)
@click.option("--model_config_file", required=True, help="Model config file.")
@click.option("--server_url", required=True, help="Tritonserver url.")
@click.option("--server_url", required=True, help="Tritonserver url.", default="localhost:8001")
def run_pipeline(
num_threads,
pipeline_batch_size,
@@ -93,6 +95,10 @@ def run_pipeline(
model_config_file,
server_url,
):

# Enable the default logger.
configure_logging(log_level=logging.INFO)

config = Config()
config.mode = PipelineModes.NLP
config.num_threads = num_threads
5 changes: 2 additions & 3 deletions examples/nlp_si_detection/README.md
@@ -103,11 +103,10 @@ The following command line is the entire command to build and launch the pipelin

From the Morpheus repo root directory, run:
```bash
export MORPHEUS_ROOT=$(pwd)
# Launch Morpheus printing debug messages
morpheus --log_level=DEBUG \
`# Run a pipeline with 8 threads and a model batch size of 32 (Must match Triton config)` \
run --num_threads=8 --pipeline_batch_size=1024 --model_max_batch_size=32 \
`# Run a pipeline with a model batch size of 32 (Must match Triton config)` \
run --pipeline_batch_size=1024 --model_max_batch_size=32 \
`# Specify a NLP pipeline with 256 sequence length (Must match Triton config)` \
pipeline-nlp --model_seq_length=256 \
`# 1st Stage: Read from file` \
2 changes: 1 addition & 1 deletion examples/nlp_si_detection/run.sh
@@ -19,7 +19,7 @@ SCRIPT_DIR=${SCRIPT_DIR:-"$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null
export MORPHEUS_ROOT=${MORPHEUS_ROOT:-"$(realpath ${SCRIPT_DIR}/../..)"}

morpheus --log_level=DEBUG \
run --num_threads=8 --pipeline_batch_size=1024 --model_max_batch_size=32 \
run --pipeline_batch_size=1024 --model_max_batch_size=32 \
pipeline-nlp --model_seq_length=256 \
from-file --filename=${MORPHEUS_ROOT}/examples/data/pcap_dump.jsonlines \
deserialize \
21 changes: 10 additions & 11 deletions examples/ransomware_detection/README.md
@@ -35,15 +35,15 @@ export MORPHEUS_ROOT=$(pwd)
```

##### Start Triton Inference Server Container
Run the following from the `examples/ransomware_detection` directory to launch Triton and load the `ransomw-model-short-rf` model:

From the Morpheus repo root directory, run the following to launch Triton and load the `ransomw-model-short-rf` model:
```bash
# Run Triton in explicit mode
docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models/triton-model-repo nvcr.io/nvidia/tritonserver:23.06-py3 \
tritonserver --model-repository=/models/triton-model-repo \
--exit-on-error=false \
--model-control-mode=explicit \
--load-model ransomw-model-short-rf
docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 \
-v $PWD/examples/ransomware_detection/models:/models/triton-model-repo nvcr.io/nvidia/tritonserver:23.06-py3 \
tritonserver --model-repository=/models/triton-model-repo \
--exit-on-error=false \
--model-control-mode=explicit \
--load-model ransomw-model-short-rf
```

##### Verify Model Deployment
@@ -67,14 +67,13 @@ mamba install 'dask>=2023.1.1' 'distributed>=2023.1.1'
```

## Run Ransomware Detection Pipeline
Run the following from the `examples/ransomware_detection` directory to start the ransomware detection pipeline:
Run the following from the root of the Morpheus repo to start the ransomware detection pipeline:

```bash
python run.py --server_url=localhost:8001 \
python examples/ransomware_detection/run.py --server_url=localhost:8001 \
--sliding_window=3 \
--model_name=ransomw-model-short-rf \
--conf_file=./config/ransomware_detection.yaml \
--input_glob=${MORPHEUS_ROOT}/examples/data/appshield/*/snapshot-*/*.json \
--input_glob=./examples/data/appshield/*/snapshot-*/*.json \
--output_file=./ransomware_detection_output.jsonlines
```

4 changes: 3 additions & 1 deletion examples/ransomware_detection/run.py
@@ -33,6 +33,8 @@
from stages.create_features import CreateFeaturesRWStage
from stages.preprocessing import PreprocessingRWStage

CUR_DIR = os.path.dirname(__file__)


@click.command()
@click.option('--debug', default=False)
@@ -64,7 +66,7 @@
@click.option(
"--conf_file",
type=click.STRING,
default="./config/ransomware_detection.yaml",
default=os.path.join(CUR_DIR, "config/ransomware_detection.yaml"),
help="Ransomware detection configuration filepath.",
)
@click.option(