Merge branch 'branch-24.06' into david-group-by-column_stage
dagardner-nv authored May 17, 2024
2 parents faa973d + ee9d932 commit 76a9856
Showing 5 changed files with 21 additions and 101 deletions.
6 changes: 2 additions & 4 deletions models/model-cards/abp-model-card.md
@@ -160,11 +160,9 @@ NVIDIA believes Trustworthy AI is a shared responsibility and we have establishe
### Is the model and dataset compliant with National Classification Management Society (NCMS)?
* No

### Are there explicit model and dataset restrictions?
* No
### Describe access restrictions

### Are there access restrictions to systems, model, and data?
* No
* The Principle of least privilege (PoLP) is applied, limiting access for dataset generation and model development. Restrictions are enforced on dataset access during training, and dataset license constraints are adhered to.

### Is there a digital signature?
* No
18 changes: 5 additions & 13 deletions models/model-cards/dfp-model-card.md
@@ -107,21 +107,18 @@ NVIDIA believes Trustworthy AI is a shared responsibility and we have establishe

## Model Card ++ Bias Subcard

### What is the language balance of the model validation data?
* English (CloudTrail logs): 100%

### Individuals from the following adversely impacted (protected classes) groups participate in model design and testing.
* None of the Above.

### Describe measures taken to mitigate against unwanted bias.
* None of the Above.

## Model Card ++ Explainability Subcard

### Name example applications and use cases for this model.
* The model is primarily designed for testing purposes and serves as a small pretrained model specifically used to evaluate and validate the DFP pipeline. Its application is focused on assessing the effectiveness of the pipeline rather than being intended for broader use cases or specific applications beyond testing.

### Fill in the blank for the model technique.
### Intended Users.
* This model is designed for developers seeking to test the DFP pipeline with a small pretrained model trained on a synthetic dataset.

### Name who is intended to benefit from this model.
@@ -131,14 +128,14 @@ NVIDIA believes Trustworthy AI is a shared responsibility and we have establishe
* The model calculates an anomaly score for each input based on the reconstruction loss obtained from the trained Autoencoder. This score represents the level of anomaly detected in the input data. Higher scores indicate a higher likelihood of anomalous behavior.
* The model provides the reconstruction loss of each feature to facilitate further testing and debugging of the pipeline.
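As an illustrative sketch only (not the actual Morpheus DFP implementation — the `model.predict` interface and the per-row mean-squared-error loss are assumptions), the scoring step described above might look like:

```python
import numpy as np

def anomaly_scores(model, inputs: np.ndarray) -> np.ndarray:
    """Score each row of `inputs` by its autoencoder reconstruction loss.

    `model` is assumed to expose a Keras-style `predict` method that
    returns the reconstruction of its input. Higher scores indicate a
    higher likelihood of anomalous behavior.
    """
    reconstructed = model.predict(inputs)
    # Mean squared error per row: the per-input reconstruction loss.
    return np.mean((inputs - reconstructed) ** 2, axis=1)
```

The per-feature loss mentioned above would simply omit the `axis=1` reduction, returning one loss value per feature for debugging.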

### List the steps explaining how this model works.
### Describe how this model works.
* The model works by training on baseline behaviors and subsequently detecting deviations from the established baseline, triggering alerts accordingly.
* [Training notebook](https://github.com/nv-morpheus/Morpheus/blob/branch-24.06/models/training-tuning-scripts/dfp-models/hammah-20211017.ipynb)

### List the technical limitations of the model.
* The model expects CloudTrail logs with specific features that match the training dataset. Data lacking the required features or requiring a different feature set may not be compatible with the model.
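A hypothetical schema check along these lines can catch incompatible input early; `REQUIRED_FEATURES` below is an invented placeholder, not the model's actual feature list:

```python
import pandas as pd

# Hypothetical feature list; the real model defines its own schema.
REQUIRED_FEATURES = ["eventName", "sourceIPAddress", "userAgent"]

def check_features(df: pd.DataFrame) -> list[str]:
    """Return the required feature columns missing from a CloudTrail log frame."""
    return [col for col in REQUIRED_FEATURES if col not in df.columns]
```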

### Has this been verified to have met prescribed quality standards?
### Has this been verified to have met prescribed NVIDIA quality standards?
* Yes

### What performance metrics were used to affirm the model's performance?
@@ -170,13 +167,8 @@ NVIDIA believes Trustworthy AI is a shared responsibility and we have establishe
### Name explicit model and/or dataset restrictions.
* The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development.

### Are there access restrictions to systems, model, and data?
* No


## Model Card ++ Privacy Subcard


### Generatable or reverse engineerable personally-identifiable information (PII)?
* None

6 changes: 2 additions & 4 deletions models/model-cards/gnn-fsi-model-card.md
@@ -159,11 +159,9 @@ NVIDIA believes Trustworthy AI is a shared responsibility and we have establishe
### Is the model and dataset compliant with National Classification Management Society (NCMS)?
* Not Applicable

### Are there explicit model and dataset restrictions?
* No
### Describe access restrictions

### Are there access restrictions to systems, model, and data?
* No
* The Principle of least privilege (PoLP) is applied, limiting access for dataset generation and model development. Restrictions are enforced on dataset access during training, and dataset license constraints are adhered to.

### Is there a digital signature?
* No
6 changes: 2 additions & 4 deletions models/model-cards/phishing-model-card.md
@@ -168,11 +168,9 @@ NVIDIA believes Trustworthy AI is a shared responsibility and we have establishe
### Is the model and dataset compliant with National Classification Management Society (NCMS)?
* No

### Are there explicit model and dataset restrictions?
* No
### Describe access restrictions

### Are there access restrictions to systems, model, and data?
* No
* The Principle of least privilege (PoLP) is applied, limiting access for dataset generation and model development. Restrictions are enforced on dataset access during training, and dataset license constraints are adhered to.

### Is there a digital signature?

86 changes: 10 additions & 76 deletions models/model-cards/root-cause-analysis-model-card.md
@@ -21,63 +21,49 @@ limitations under the License.
# Model Overview

## Description:

* The root cause analysis model is a binary classifier that differentiates between ordinary logs and errors/problems/root causes in log files. <br>

## Reference(s):

* Devlin, J. et al. (2018), BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/abs/1810.04805 <br>

## Model Architecture:

**Architecture Type:**

* Transformers <br>

**Network Architecture:**

* BERT <br>

## Input: (Enter "None" As Needed)

**Input Format:**

* CSV <br>

**Input Parameters:**

* kern.log file contents <br>

**Other Properties Related to Input:**

* N/A <br>

## Output: (Enter "None" As Needed)

**Output Format:**

* Binary Results, Root Cause or Ordinary <br>

**Output Parameters:**

* N/A <br>

**Other Properties Related to Output:**

* N/A <br>

## Software Integration:

**Runtime(s):**

* Morpheus <br>

**Supported Hardware Platform(s):** <br>

* Ampere/Turing <br>

**Supported Operating System(s):** <br>

* Linux <br>

## Model Version(s):
@@ -88,67 +74,31 @@ limitations under the License.
## Training Dataset:

**Link:**

* https://github.com/nv-morpheus/Morpheus/blob/branch-24.06/models/datasets/training-data/root-cause-training-data.csv <br>

**Properties (Quantity, Dataset Descriptions, Sensor(s)):**

* kern.log files from DGX machines <br>

## Evaluation Dataset:

**Link:**

* https://github.com/nv-morpheus/Morpheus/blob/branch-24.06/models/datasets/validation-data/root-cause-validation-data-input.jsonlines <br>

**Properties (Quantity, Dataset Descriptions, Sensor(s)):**

* kern.log files from DGX machines <br>

## Inference:

**Engine:**

* Triton <br>

**Test Hardware:** <br>

* Other <br>

# Subcards

## Model Card ++ Bias Subcard

### What is the gender balance of the model validation data?
* Not Applicable

### What is the racial/ethnicity balance of the model validation data?
* Not Applicable

### What is the age balance of the model validation data?
* Not Applicable

### What is the language balance of the model validation data?
* Not Applicable

### What is the geographic origin language balance of the model validation data?
* Not Applicable

### What is the educational background balance of the model validation data?
* Not Applicable

### What is the accent balance of the model validation data?
* Not Applicable

### What is the face/key point balance of the model validation data?
* Not Applicable

### What is the skin/tone balance of the model validation data?
* Not Applicable

### What is the religion balance of the model validation data?
* Not Applicable

### Individuals from the following adversely impacted (protected classes) groups participate in model design and testing.
* Not Applicable

@@ -160,26 +110,24 @@ limitations under the License.
### Name example applications and use cases for this model.
* The model is primarily designed for testing purposes and serves as a small pre-trained model specifically used to evaluate and validate the Root Cause Analysis pipeline. It is an example of customized transformer-based root cause analysis and can be used for pipeline testing purposes. It needs to be re-trained for specific root cause analysis or predictive maintenance needs with the fine-tuning scripts in the repo, and the hyperparameters can be optimized to obtain the best results with another dataset. The aim is for the model to predict some false positives that could be previously unknown error types. This root cause analysis approach can also be applied to other log types: known failures can be included in training alongside ordinary logs to detect other, previously unknown root causes.

### Fill in the blank for the model technique.

### Intended Users.
* This model is designed for developers seeking to test the root cause analysis pipeline with a small pre-trained model trained on a very small `kern.log` file from a DGX.

### Name who is intended to benefit from this model.

* The intended beneficiaries of this model are developers who aim to test the functionality of the Root Cause Analysis pipeline using sample datasets.

### Describe the model output.
* The model output can be used as a binary result: Root Cause or Ordinary.

### List the steps explaining how this model works.
### Describe how this model works.
* A BERT model is fine-tuned on the kern.log dataset; at inference time it predicts one of two classes: Root Cause or Ordinary.
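The final classification step can be sketched as a softmax-plus-argmax over the classifier's two output logits; this is an illustrative sketch only, and the `LABELS` ordering below is an assumption, not the fine-tuned model's actual label mapping:

```python
import numpy as np

# Hypothetical label order; the actual fine-tuned model defines its own mapping.
LABELS = ["Ordinary", "Root Cause"]

def classify(logits: np.ndarray) -> str:
    """Map a binary classifier's two output logits to a label.

    Uses a numerically stable softmax followed by argmax; since softmax
    is monotonic, argmax over the logits gives the same answer, but the
    probabilities are useful when thresholding uncertain predictions.
    """
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return LABELS[int(np.argmax(probs))]
```

In practice the logits would come from running the fine-tuned BERT model (e.g. via Triton, as named under Inference above) on a tokenized log line.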

### Name the adversely impacted groups (protected classes) this has been tested to deliver comparable outcomes regardless of:
* Not Applicable

### List the technical limitations of the model.
* For different log types and content, different models need to be trained.

### Has this been verified to have met prescribed NVIDIA quality standards?
* Yes

### What performance metrics were used to affirm the model's performance?
* F1

@@ -195,10 +143,7 @@ limitations under the License.
### Link the location of the training dataset's repository.
* https://github.com/nv-morpheus/Morpheus/blob/branch-24.06/models/datasets/training-data/root-cause-training-data.csv

### Is the model used in an application with physical safety impact?
* No

### Describe physical safety impact (if present).
### Describe the life-critical impact (if present).
* None

### Was model and dataset assessed for vulnerability for potential form of attack?
@@ -210,20 +155,12 @@ limitations under the License.
### Name use case restrictions for the model.
* Different models need to be trained depending on the log types.

### Has this been verified to have met prescribed quality standards?
* No

### Name target quality Key Performance Indicators (KPIs) for which this has been tested.
* N/A

### Is the model and dataset compliant with National Classification Management Society (NCMS)?
* No

### Are there explicit model and dataset restrictions?
* It is for pipeline testing purposes.
### Describe access restrictions

### Are there access restrictions to systems, model, and data?
* No
* The Principle of least privilege (PoLP) is applied, limiting access for dataset generation and model development. Restrictions are enforced on dataset access during training, and dataset license constraints are adhered to.

### Is there a digital signature?
* No
@@ -232,7 +169,7 @@ limitations under the License.


### Generatable or reverse engineerable personally-identifiable information (PII)?
* Neither
* None

### Was consent obtained for any PII used?
* N/A
@@ -249,12 +186,9 @@ limitations under the License.
### If PII collected for the development of this AI model, was it minimized to only what was required?
* N/A

### Is data in dataset traceable?
### Is there data provenance?
* Original raw logs are not saved. The small sample in the repo is saved for testing the pipeline.

### Are we able to identify and trace source of dataset?
* N/A

### Does data labeling (annotation, metadata) comply with privacy laws?
* N/A

