Skip to content

Commit

Permalink
Stage documentation improvements (#1362)
Browse files Browse the repository at this point in the history
* Populate stub stages document 
* Excluded from the list are abstract, deprecated and stages primarily used for testing.
* Fix minor docstring type-os in `morpheus/stages/input/rss_source_stage.py` and `morpheus/stages/postprocess/ml_flow_drift_stage.py`

Closes #1361

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/nv-morpheus/Morpheus/blob/main/docs/source/developer_guide/contributing.md).
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - David Gardner (https://github.com/dagardner-nv)

Approvers:
  - Devin Robison (https://github.com/drobison00)

URL: #1362
  • Loading branch information
dagardner-nv authored Nov 22, 2023
1 parent afce09f commit 24052ec
Show file tree
Hide file tree
Showing 6 changed files with 67 additions and 47 deletions.
1 change: 1 addition & 0 deletions docs/source/examples/doca/README.md
1 change: 1 addition & 0 deletions docs/source/examples/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ Examples

abp_nvsmi_detection/README.md
abp_pcap_detection/README.md
doca/README.md
gnn_fraud_detection_pipeline/README.md
llm/index
log_parsing/README.md
Expand Down
4 changes: 3 additions & 1 deletion docs/source/getting_started.md
Original file line number Diff line number Diff line change
Expand Up @@ -272,7 +272,9 @@ This indicates that the `to-file` stage cannot accept the input type of `morpheu
#### Pipeline Stages
A complete list of the pipeline stages will be added in the future. For now, you can query the available stages for each pipeline type via:
The CLI allows for an easy way to query the available stages for each pipeline type. Refer to [Morpheus Stages](./stages/morpheus_stages.md) for more extensive documentation of the stages included in Morpheus.
> **Note**: While most stages are available from the CLI, a small subset of stages, and configuration options for stages are unavailable from the CLI and can only be used via the Python API.
```
$ morpheus run pipeline-nlp --help
Expand Down
104 changes: 60 additions & 44 deletions docs/source/stages/morpheus_stages.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,65 +17,81 @@ limitations under the License.

# Stages Documentation

## Boundary
Stages are the building blocks of Morpheus pipelines. Below is a list of the most commonly used stages. For a full list of stages, refer to the stages API {py:mod}`morpheus.stages`. In addition to this there are several custom stages contained in the [Examples](../examples.md) and [Developer Guides](../developer_guide/guides.md).

- Linear Boundary Stage
## Table of Contents
- [Doca](#doca)
- [General](#general)
- [Inference](#inference)
- [Input](#input)
- [LLM](#llm)
- [Output](#output)
- [Post-process](#post-process)
- [Pre-process](#pre-process)


## Doca

- Doca Stage {py:class}`~morpheus.stages.doca.doca_source_stage.DocaSourceStage` A source stage used to receive raw packet data from a ConnectX-6 Dx NIC. This stage is not compiled by default refer to the [Doca Example](../../../examples/doca/README.md) for details on building this stage.

## General

- Buffer Stage
- Delay Stage
- Linear Modules Stage
- Monitor Stage
- Multi Port Module Stage
- Trigger Stage
- Linear Modules Stage {py:class}`~morpheus.stages.general.linear_modules_stage.LinearModulesStage` Loads an existing, registered, module and wraps it as a Morpheus stage. Refer to [Morpheus Modules](../developer_guide/guides.md#morpheus-modules) for details on modules.
- Monitor Stage {py:class}`~morpheus.stages.general.monitor_stage.MonitorStage` Display throughput numbers at a specific point in the pipeline.
- Multi Port Module Stage {py:class}`~morpheus.stages.general.multi_port_modules_stage.MultiPortModulesStage` Loads an existing, registered, multi-port module and wraps it as a multi-port Morpheus stage. Refer to [Morpheus Modules](../developer_guide/guides.md#morpheus-modules) for details on modules.
- Trigger Stage {py:class}`~morpheus.stages.general.trigger_stage.TriggerStage` Buffer data until the previous stage has completed, useful for testing performance of one stage at a time.

## Inference

- Auto Encoder Inference Stage
- Identity Inference Stage
- Inference Stage
- PyTorch Inference Stage
- Triton Inference Stage
- Auto Encoder Inference Stage {py:class}`~morpheus.stages.inference.auto_encoder_inference_stage.AutoEncoderInferenceStage` PyTorch inference stage used for Auto Encoder pipeline mode.
- PyTorch Inference Stage {py:class}`~morpheus.stages.inference.pytorch_inference_stage.PyTorchInferenceStage` PyTorch inference stage used for most pipeline modes with the exception of Auto Encoder.
- Triton Inference Stage {py:class}`~morpheus.stages.inference.triton_inference_stage.TritonInferenceStage` Inference stage which utilizes a [Triton Inference Server](https://developer.nvidia.com/nvidia-triton-inference-server).

## Input

- AppShield Source Stage
- Autoencoder Source Stage
- Azure Source Stage
- Cloud Trail Source Stage
- Control Message File Source Stage
- Control Message Kafka Source Stage
- Duo Source Stage
- File Source Stage
- Kafka Source Stage
- AppShield Source Stage {py:class}`~morpheus.stages.input.appshield_source_stage.AppShieldSourceStage` Load Appshield messages from one or more plugins into a dataframe.
- Azure Source Stage {py:class}`~morpheus.stages.input.azure_source_stage.AzureSourceStage` Load Azure Active Directory messages.
- Cloud Trail Source Stage {py:class}`~morpheus.stages.input.cloud_trail_source_stage.CloudTrailSourceStage` Load messages from a Cloudtrail directory.
- Control Message File Source Stage {py:class}`~morpheus.stages.input.control_message_file_source_stage.ControlMessageFileSourceStage` Recieves control messages from different sources specified by a list of (fsspec)[https://filesystem-spec.readthedocs.io/en/latest/api.html?highlight=open_files#fsspec.open_files] strings.
- Control Message Kafka Source Stage {py:class}`~morpheus.stages.input.control_message_kafka_source_stage.ControlMessageKafkaSourceStage` Load control messages from a Kafka cluster.
- Databricks Delta Lake Source Stage {py:class}`~morpheus.stages.input.databricks_deltalake_source_stage.DataBricksDeltaLakeSourceStage` Source stage used to load messages from a DeltaLake table.
- Duo Source Stage {py:class}`~morpheus.stages.input.duo_source_stage.DuoSourceStage` Load Duo Authentication messages.
- File Source Stage {py:class}`~morpheus.stages.input.file_source_stage.FileSourceStage` Load messages from a file.
- HTTP Client Source Stage {py:class}`~morpheus.stages.input.http_client_source_stage.HttpClientSourceStage` Poll a remote HTTP server for incoming data.
- HTTP Server Source Stage {py:class}`~morpheus.stages.input.http_server_source_stage.HttpServerSourceStage` Start an HTTP server and listens for incoming requests on a specified endpoint.
- In Memory Source Stage {py:class}`~morpheus.stages.input.in_memory_source_stage.InMemorySourceStage` Input source that emits a pre-defined list of dataframes.
- Kafka Source Stage {py:class}`~morpheus.stages.input.kafka_source_stage.KafkaSourceStage` Load messages from a Kafka cluster.
- RSS Source Stage {py:class}`~morpheus.stages.input.rss_source_stage.RSSSourceStage` Load RSS feed items into a pandas DataFrame.

## Output
## LLM

- Write To File Stage
- Write To Kafka Stage
- LLM Engine Stage {py:class}`~morpheus.stages.llm.llm_engine_stage.LLMEngineStage` Execute an LLM engine within a Morpheus pipeline.

## Output
- HTTP Client Sink Stage {py:class}`~morpheus.stages.output.http_client_sink_stage.HttpClientSinkStage` Write all messages to an HTTP endpoint.
- HTTP Server Sink Stage {py:class}`~morpheus.stages.output.http_server_sink_stage.HttpServerSinkStage` Start an HTTP server and listens for incoming requests on a specified endpoint.
- In Memory Sink Stage {py:class}`~morpheus.stages.output.in_memory_sink_stage.InMemorySinkStage` Collect incoming messages into a list that can be accessed after the pipeline is complete.
- Databricks Delta Lake Sink Stage {py:class}`~morpheus.stages.output.write_to_databricks_deltalake_stage.DataBricksDeltaLakeSinkStage` Write messages to a DeltaLake table.
- Write To Elastic Search Stage {py:class}`~morpheus.stages.output.write_to_elasticsearch_stage.WriteToElasticsearchStage` Write the messages as documents to Elasticsearch.
- Write To File Stage {py:class}`~morpheus.stages.output.write_to_file_stage.WriteToFileStage` Write all messages to a file.
- Write To Kafka Stage {py:class}`~morpheus.stages.output.write_to_kafka_stage.WriteToKafkaStage` Write all messages to a Kafka cluster.
- Write To Vector DB Stage {py:class}`~morpheus.stages.output.write_to_vector_db.WriteToVectorDBStage` Write all messages to a Vector Database.

## Post-process

- Add Classifications Stage
- Add Scores Stage
- Filter Detections Stage
- Generate Viz Frames Stage
- ML Flow Drift Stage
- Serialize Stage
- Timeseries Stage
- Validation Stage
- Add Classifications Stage {py:class}`~morpheus.stages.postprocess.add_classifications_stage.AddClassificationsStage` Add detected classifications to each message.
- Add Scores Stage {py:class}`~morpheus.stages.postprocess.add_scores_stage.AddScoresStage` Add probability scores to each message.
- Filter Detections Stage {py:class}`~morpheus.stages.postprocess.filter_detections_stage.FilterDetectionsStage` Filter message by a classification threshold.
- Generate Viz Frames Stage {py:class}`~morpheus.stages.postprocess.generate_viz_frames_stage.GenerateVizFramesStage` Write out visualization DataFrames.
- MLflow Drift Stage {py:class}`~morpheus.stages.postprocess.ml_flow_drift_stage.MLFlowDriftStage` Report model drift statistics to MLflow.
- Serialize Stage {py:class}`~morpheus.stages.postprocess.serialize_stage.SerializeStage` Include & exclude columns from messages.
- Timeseries Stage {py:class}`~morpheus.stages.postprocess.timeseries_stage.TimeSeriesStage` Perform time series anomaly detection and add prediction.

## Pre-process

- Deserialize Stage
- Drop Null Stage
- Preprocess AE Stage
- Preprocess Base Stage
- Preprocess FIL Stage
- Preprocess NLP Stage
- Train AE Stage

## Training

- Training Stage
- Deserialize Stage {py:class}`~morpheus.stages.preprocess.deserialize_stage.DeserializeStage` Partition messages based on the pipeline config's `pipeline_batch_size` parameter.
- Drop Null Stage {py:class}`~morpheus.stages.preprocess.drop_null_stage.DropNullStage` Drop null data entries from a DataFrame.
- Preprocess AE Stage {py:class}`~morpheus.stages.preprocess.preprocess_ae_stage.PreprocessAEStage` Prepare Autoencoder input DataFrames for inference.
- Preprocess FIL Stage {py:class}`~morpheus.stages.preprocess.preprocess_fil_stage.PreprocessFILStage` Prepare FIL input DataFrames for inference.
- Preprocess NLP Stage {py:class}`~morpheus.stages.preprocess.preprocess_nlp_stage.PreprocessNLPStage` Prepare NLP input DataFrames for inference.
- Train AE Stage {py:class}`~morpheus.stages.preprocess.train_ae_stage.TrainAEStage` Train an Autoencoder model on incoming data.
2 changes: 1 addition & 1 deletion morpheus/stages/input/rss_source_stage.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@
@register_stage("from-rss")
class RSSSourceStage(PreallocatorMixin, SingleOutputSource):
"""
Load RSS feed items into a pandas DataFrame.
Load RSS feed items into a DataFrame.
Parameters
----------
Expand Down
2 changes: 1 addition & 1 deletion morpheus/stages/postprocess/ml_flow_drift_stage.py
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@
@register_stage("mlflow-drift", modes=[PipelineModes.FIL, PipelineModes.NLP, PipelineModes.OTHER])
class MLFlowDriftStage(PassThruTypeMixin, SinglePortStage):
"""
Report model drift statistics to ML Flow.
Report model drift statistics to MLflow.
Caculates model drift over time and reports the information to MLflow.
Expand Down

0 comments on commit 24052ec

Please sign in to comment.