From fefa1cc5319217daf058e6bcd4a4ca2850cdba33 Mon Sep 17 00:00:00 2001
From: HesAnEasyCoder <105108698+HesAnEasyCoder@users.noreply.github.com>
Date: Thu, 2 May 2024 22:46:06 -0700
Subject: [PATCH 1/7] Update dfp-model-card.md
Update fields and their presentation for the Model Card++ 3.0 release.
---
models/model-cards/dfp-model-card.md | 56 ++++------------------------
1 file changed, 7 insertions(+), 49 deletions(-)
diff --git a/models/model-cards/dfp-model-card.md b/models/model-cards/dfp-model-card.md
index 420ceabfe0..d3d3a381f0 100644
--- a/models/model-cards/dfp-model-card.md
+++ b/models/model-cards/dfp-model-card.md
@@ -106,36 +106,6 @@ The evaluation dataset consists of AWS CloudTrail logs. It contains logs from tw
## Model Card ++ Bias Subcard
-### What is the gender balance of the model validation data?
-* Not Applicable
-
-### What is the racial/ethnicity balance of the model validation data?
-* Not Applicable
-
-### What is the age balance of the model validation data?
-* Not Applicable
-
-### What is the language balance of the model validation data?
-* English (cloudtrail logs): 100%
-
-### What is the geographic origin language balance of the model validation data?
-* Not Applicable
-
-### What is the educational background balance of the model validation data?
-* Not Applicable
-
-### What is the accent balance of the model validation data?
-* Not Applicable
-
-### What is the face/key point balance of the model validation data?
-* Not Applicable
-
-### What is the skin/tone balance of the model validation data?
-* Not Applicable
-
-### What is the religion balance of the model validation data?
-* Not Applicable
-
### Individuals from the following adversely impacted (protected classes) groups participate in model design and testing.
* Not Applicable
@@ -147,7 +117,7 @@ The evaluation dataset consists of AWS CloudTrail logs. It contains logs from tw
### Name example applications and use cases for this model.
* The model is primarily designed for testing purposes and serves as a small pretrained model specifically used to evaluate and validate the DFP pipeline. Its application is focused on assessing the effectiveness of the pipeline rather than being intended for broader use cases or specific applications beyond testing.
-### Fill in the blank for the model technique.
+### Intended Users.
* This model is designed for developers seeking to test the DFP pipeline with a small pretrained model trained on a synthetic dataset.
### Name who is intended to benefit from this model.
@@ -157,16 +127,16 @@ The evaluation dataset consists of AWS CloudTrail logs. It contains logs from tw
* The model calculates an anomaly score for each input based on the reconstruction loss obtained from the trained Autoencoder. This score represents the level of anomaly detected in the input data. Higher scores indicate a higher likelihood of anomalous behavior.
* The model provides the reconstruction loss of each feature to facilitate further testing and debugging of the pipeline.
-### List the steps explaining how this model works.
+### Describe how this model works.
* The model works by training on baseline behaviors and subsequently detecting deviations from the established baseline, triggering alerts accordingly.
* [Training notebook](https://github.com/nv-morpheus/Morpheus/blob/branch-24.06/models/training-tuning-scripts/dfp-models/hammah-20211017.ipynb)
-### Name the adversely impacted groups (protected classes) this has been tested to deliver comparable outcomes regardless of:
-* Not Applicable
-
### List the technical limitations of the model.
* The model expects cloudtrail logs with specific features that match the training dataset. Data lacking the required features or requiring a different feature set may not be compatible with the model.
+### Has this been verified to have met prescribed NVIDIA quality standards?
+* Yes
+
### What performance metrics were used to affirm the model's performance?
* The model's performance was evaluated based on its ability to correctly identify anomalous behavior in the synthetic dataset during testing.
@@ -181,10 +151,7 @@ The evaluation dataset consists of AWS CloudTrail logs. It contains logs from tw
### Link the location of the training dataset's repository (if able to share).
* https://github.com/nv-morpheus/Morpheus/tree/branch-24.06/models/datasets/training-data/cloudtrail
-### Is the model used in an application with physical safety impact?
-* No
-
-### Describe physical safety impact (if present).
+### Describe the life critical impact (if present).
* None
### Was model and dataset assessed for vulnerability for potential form of attack?
@@ -196,12 +163,6 @@ The evaluation dataset consists of AWS CloudTrail logs. It contains logs from tw
### Name use case restrictions for the model.
* The model's use case is restricted to testing the Morpheus pipeline and may not be suitable for other applications.
-### Has this been verified to have met prescribed quality standards?
-* No
-
-### Name target quality Key Performance Indicators (KPIs) for which this has been tested.
-* None
-
### Is the model and dataset compliant with National Classification Management Society (NCMS)?
* No
@@ -236,10 +197,7 @@ The evaluation dataset consists of AWS CloudTrail logs. It contains logs from tw
### If PII collected for the development of this AI model, was it minimized to only what was required?
* Not Applicable (no PII collected)
-### Is data in dataset traceable?
-* No
-
-### Are we able to identify and trace source of dataset?
+### Is there data provenance?
* Yes ([fully synthetic dataset](https://github.com/nv-morpheus/Morpheus/tree/branch-24.06/models/datasets/training-data/cloudtrail))
### Does data labeling (annotation, metadata) comply with privacy laws?
From 33599b8077424ae8ddce03d228fb74dfb93cc58e Mon Sep 17 00:00:00 2001
From: HesAnEasyCoder <105108698+HesAnEasyCoder@users.noreply.github.com>
Date: Thu, 2 May 2024 22:46:17 -0700
Subject: [PATCH 2/7] Update root-cause-analysis-model-card.md
Update fields and their presentation for the Model Card++ 3.0 release.
---
.../root-cause-analysis-model-card.md | 80 ++-----------------
1 file changed, 8 insertions(+), 72 deletions(-)
diff --git a/models/model-cards/root-cause-analysis-model-card.md b/models/model-cards/root-cause-analysis-model-card.md
index 0f6a332f52..bd8c301faf 100644
--- a/models/model-cards/root-cause-analysis-model-card.md
+++ b/models/model-cards/root-cause-analysis-model-card.md
@@ -21,63 +21,49 @@ limitations under the License.
# Model Overview
## Description:
-
* Root cause analysis is a binary classifier differentiating between ordinary logs and errors/problems/root causes in the log files.
## References(s):
-
* Devlin J. et al. (2018), BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding https://arxiv.org/abs/1810.04805
## Model Architecture:
-
**Architecture Type:**
-
* Transformers
**Network Architecture:**
-
* BERT
## Input: (Enter "None" As Needed)
**Input Format:**
-
* CSV
**Input Parameters:**
-
* kern.log file contents
**Other Properties Related to Output:**
-
* N/A
## Output: (Enter "None" As Needed)
**Output Format:**
-
* Binary Results, Root Cause or Ordinary
**Output Parameters:**
-
* N/A
**Other Properties Related to Output:**
-
* N/A
## Software Integration:
**Runtime(s):**
-
* Morpheus
**Supported Hardware Platform(s):**
-
* Ampere/Turing
**Supported Operating System(s):**
-
* Linux
## Model Version(s):
@@ -88,67 +74,31 @@ limitations under the License.
## Training Dataset:
**Link:**
-
* https://github.com/nv-morpheus/Morpheus/blob/branch-24.06/models/datasets/training-data/root-cause-training-data.csv
**Properties (Quantity, Dataset Descriptions, Sensor(s)):**
-
* kern.log files from DGX machines
## Evaluation Dataset:
**Link:**
-
* https://github.com/nv-morpheus/Morpheus/blob/branch-24.06/models/datasets/validation-data/root-cause-validation-data-input.jsonlines
**Properties (Quantity, Dataset Descriptions, Sensor(s)):**
-
* kern.log files from DGX machines
## Inference:
**Engine:**
-
* Triton
**Test Hardware:**
-
* Other
# Subcards
## Model Card ++ Bias Subcard
-### What is the gender balance of the model validation data?
-* Not Applicable
-
-### What is the racial/ethnicity balance of the model validation data?
-* Not Applicable
-
-### What is the age balance of the model validation data?
-* Not Applicable
-
-### What is the language balance of the model validation data?
-* Not Applicable
-
-### What is the geographic origin language balance of the model validation data?
-* Not Applicable
-
-### What is the educational background balance of the model validation data?
-* Not Applicable
-
-### What is the accent balance of the model validation data?
-* Not Applicable
-
-### What is the face/key point balance of the model validation data?
-* Not Applicable
-
-### What is the skin/tone balance of the model validation data?
-* Not Applicable
-
-### What is the religion balance of the model validation data?
-* Not Applicable
-
### Individuals from the following adversely impacted (protected classes) groups participate in model design and testing.
* Not Applicable
@@ -160,26 +110,24 @@ limitations under the License.
### Name example applications and use cases for this model.
* The model is primarily designed for testing purposes and serves as a small pre-trained model specifically used to evaluate and validate the Root Cause Analysis pipeline. This model is an example of customized transformer-based root cause analysis. It can be used for pipeline testing purposes. It needs to be re-trained for specific root cause analysis or predictive maintenance needs with the fine-tuning scripts in the repo. The hyperparameters can be optimised to adjust to get the best results with another dataset. The aim is to get the model to predict some false positives that could be previously unknown error types. Users can use this root cause analysis approach with other log types too. If they have known failures in their logs, they can use them to train along with ordinary logs and can detect other root causes they weren't aware of before.
-### Fill in the blank for the model technique.
-
+### Intended Users.
* This model is designed for developers seeking to test the root cause analysis pipeline with a small pre-trained model trained on a very small `kern.log` file from a DGX.
### Name who is intended to benefit from this model.
-
* The intended beneficiaries of this model are developers who aim to test the functionality of the DFP pipeline using synthetic datasets
### Describe the model output.
* This model output can be used as a binary result, Root cause or Ordinary
-### List the steps explaining how this model works.
+### Describe how this model works.
* A BERT model gets fine-tuned with the kern.log dataset and in the inference it predicts one of the binary classes. Root cause or Ordinary.
-### Name the adversely impacted groups (protected classes) this has been tested to deliver comparable outcomes regardless of:
-* Not Applicable
-
### List the technical limitations of the model.
* For different log types and content, different models need to be trained.
+### Has this been verified to have met prescribed NVIDIA quality standards?
+* Yes
+
### What performance metrics were used to affirm the model's performance?
* F1
@@ -195,10 +143,7 @@ limitations under the License.
### Link the location of the training dataset's repository.
* https://github.com/nv-morpheus/Morpheus/blob/branch-24.06/models/datasets/training-data/root-cause-training-data.csv
-### Is the model used in an application with physical safety impact?
-* No
-
-### Describe physical safety impact (if present).
+### Describe the life critical impact (if present).
* None
### Was model and dataset assessed for vulnerability for potential form of attack?
@@ -210,12 +155,6 @@ limitations under the License.
### Name use case restrictions for the model.
* Different models need to be trained depending on the log types.
-### Has this been verified to have met prescribed quality standards?
-* No
-
-### Name target quality Key Performance Indicators (KPIs) for which this has been tested.
-* N/A
-
### Is the model and dataset compliant with National Classification Management Society (NCMS)?
* No
@@ -232,7 +171,7 @@ limitations under the License.
### Generatable or reverse engineerable personally-identifiable information (PII)?
-* Neither
+* None
### Was consent obtained for any PII used?
* N/A
@@ -249,12 +188,9 @@ limitations under the License.
### If PII collected for the development of this AI model, was it minimized to only what was required?
* N/A
-### Is data in dataset traceable?
+### Is there data provenance?
* Original raw logs are not saved. The small sample in the repo is saved for testing the pipeline.
-### Are we able to identify and trace source of dataset?
-* N/A
-
### Does data labeling (annotation, metadata) comply with privacy laws?
* N/A
From eb8036f0577ec8aa4aa5579fd886162b7b4624bc Mon Sep 17 00:00:00 2001
From: HesAnEasyCoder <105108698+HesAnEasyCoder@users.noreply.github.com>
Date: Mon, 6 May 2024 16:45:11 -0700
Subject: [PATCH 3/7] Update abp-model-card.md
Adding "### Describe access restrictions
* The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to."
---
models/model-cards/abp-model-card.md | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/models/model-cards/abp-model-card.md b/models/model-cards/abp-model-card.md
index 3f9043db86..f7e49eed37 100644
--- a/models/model-cards/abp-model-card.md
+++ b/models/model-cards/abp-model-card.md
@@ -210,13 +210,9 @@ NVIDIA believes Trustworthy AI is a shared responsibility and we have establishe
* No
-### Are there explicit model and dataset restrictions?
+### Describe access restrictions
-* No
-
-### Are there access restrictions to systems, model, and data?
-
-* No
+* The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to.
### Is there a digital signature?
From c010017b8a217ca6738704221100bc61cbd5fcf3 Mon Sep 17 00:00:00 2001
From: HesAnEasyCoder <105108698+HesAnEasyCoder@users.noreply.github.com>
Date: Mon, 6 May 2024 16:45:32 -0700
Subject: [PATCH 4/7] Update dfp-model-card.md
### Describe access restrictions
* The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to.
---
models/model-cards/dfp-model-card.md | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/models/model-cards/dfp-model-card.md b/models/model-cards/dfp-model-card.md
index 420ceabfe0..a049cf18fa 100644
--- a/models/model-cards/dfp-model-card.md
+++ b/models/model-cards/dfp-model-card.md
@@ -205,11 +205,9 @@ The evaluation dataset consists of AWS CloudTrail logs. It contains logs from tw
### Is the model and dataset compliant with National Classification Management Society (NCMS)?
* No
-### Are there explicit model and dataset restrictions?
-* No
+### Describe access restrictions
-### Are there access restrictions to systems, model, and data?
-* No
+* The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to.
### Is there a digital signature?
* No
From 9108041fbbf51c3ee790de76b547a8e19cf0cf6c Mon Sep 17 00:00:00 2001
From: HesAnEasyCoder <105108698+HesAnEasyCoder@users.noreply.github.com>
Date: Mon, 6 May 2024 16:45:59 -0700
Subject: [PATCH 5/7] Update gnn-fsi-model-card.md
### Describe access restrictions
* The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to.
---
models/model-cards/gnn-fsi-model-card.md | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/models/model-cards/gnn-fsi-model-card.md b/models/model-cards/gnn-fsi-model-card.md
index ae76cd8edd..84ce630c55 100644
--- a/models/model-cards/gnn-fsi-model-card.md
+++ b/models/model-cards/gnn-fsi-model-card.md
@@ -169,11 +169,9 @@ NVIDIA believes Trustworthy AI is a shared responsibility and we have establishe
### Is the model and dataset compliant with National Classification Management Society (NCMS)?
* Not Applicable
-### Are there explicit model and dataset restrictions?
-* No
+### Describe access restrictions
-### Are there access restrictions to systems, model, and data?
-* No
+* The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to.
### Is there a digital signature?
* No
From 4e962d0b21cb4a712744e2ed052f065c3e848642 Mon Sep 17 00:00:00 2001
From: HesAnEasyCoder <105108698+HesAnEasyCoder@users.noreply.github.com>
Date: Mon, 6 May 2024 16:46:20 -0700
Subject: [PATCH 6/7] Update phishing-model-card.md
### Describe access restrictions
* The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to.
---
models/model-cards/phishing-model-card.md | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/models/model-cards/phishing-model-card.md b/models/model-cards/phishing-model-card.md
index 7699c256b2..a902a3fde5 100644
--- a/models/model-cards/phishing-model-card.md
+++ b/models/model-cards/phishing-model-card.md
@@ -204,11 +204,9 @@ NVIDIA believes Trustworthy AI is a shared responsibility and we have establishe
### Is the model and dataset compliant with National Classification Management Society (NCMS)?
* No
-### Are there explicit model and dataset restrictions?
-* No
+### Describe access restrictions
-### Are there access restrictions to systems, model, and data?
-* No
+* The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to.
### Is there a digital signature?
From 4f2b6c88f645ccb89b4531546ee0d8e0e4611a2b Mon Sep 17 00:00:00 2001
From: HesAnEasyCoder <105108698+HesAnEasyCoder@users.noreply.github.com>
Date: Mon, 6 May 2024 16:46:50 -0700
Subject: [PATCH 7/7] Update root-cause-analysis-model-card.md
### Describe access restrictions
* The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to.
---
models/model-cards/root-cause-analysis-model-card.md | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/models/model-cards/root-cause-analysis-model-card.md b/models/model-cards/root-cause-analysis-model-card.md
index bd8c301faf..064019756b 100644
--- a/models/model-cards/root-cause-analysis-model-card.md
+++ b/models/model-cards/root-cause-analysis-model-card.md
@@ -158,11 +158,9 @@ limitations under the License.
### Is the model and dataset compliant with National Classification Management Society (NCMS)?
* No
-### Are there explicit model and dataset restrictions?
-* It is for pipeline testing purposes.
+### Describe access restrictions
-### Are there access restrictions to systems, model, and data?
-* No
+* The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to.
### Is there a digital signature?
* No