
Cyber Foundation

Model Overview

Description:

This model is a GPT model trained to generate synthetic Azure AD logs. Realistic synthetic logs can support downstream tasks, e.g. generating baseline training data or generating attack behavior to test detectors.

Requirements:

To run this example, additional requirements must be installed into your environment. A supplementary requirements file has been provided in this example directory.

pip install -r requirements.txt

Reference(s):

https://github.com/karpathy/nanoGPT

Model Architecture:

Architecture Type:

Transformer

Network Architecture:

Input: (Enter "None" As Needed)

Input Format:

JSON

Input Parameters:

Azure AD Logs

Other Properties Related to Input:

Output: (Enter "None" As Needed)

Output Format:

Text file with synthetic logs

Output Parameters:

Other Properties Related to Output:

Software Integration:

Runtime(s):

Morpheus

Supported Hardware Platform(s):

Ampere/Turing

Supported Operating System(s):

Linux

Model Version(s):

Training & Evaluation:

Training Dataset:

Link:

https://github.com/nv-morpheus/Morpheus/blob/main/models/datasets/training-data/azure/azure-ad-logs-sample-training-data.json

Properties (Quantity, Dataset Descriptions, Sensor(s)):

3239 Azure AD logs

Dataset License:

Apache 2.0

Evaluation Dataset:

Link:

Properties (Quantity, Dataset Descriptions, Sensor(s)):

Dataset License:

Inference:

Engine:

Test Hardware:

A100

Subcards

Model Card ++ Bias Subcard

What is the gender balance of the model validation data?

Not Applicable

What is the racial/ethnicity balance of the model validation data?

Not Applicable

What is the age balance of the model validation data?

Not Applicable

What is the language balance of the model validation data?

English: 100%

What is the geographic origin language balance of the model validation data?

Not Applicable

What is the educational background balance of the model validation data?

Not Applicable

What is the accent balance of the model validation data?

Not Applicable

What is the face/key point balance of the model validation data?

Not Applicable

What is the skin/tone balance of the model validation data?

Not Applicable

What is the religion balance of the model validation data?

Not Applicable

Individuals from the following adversely impacted (protected classes) groups participate in model design and testing.

Not Applicable

Describe measures taken to mitigate against unwanted bias.

Not Applicable

Model Card ++ Explainability Subcard

Name example applications and use cases for this model.

The model is primarily designed for testing purposes and serves as a small pre-trained model used to generate Azure AD logs.

Fill in the blank for the model technique.

This model is intended for developers who want to build a GPT-based synthetic log generator.

Name who is intended to benefit from this model.

The intended beneficiaries of this model are developers who aim to generate synthetic Azure logs.

Describe the model output.

This model output is synthetic Azure AD logs.

List the steps explaining how this model works.

This model is an example of a GPT model. This model requires raw log messages as input for training and a prompt for inference. The model is trained as in the training notebook. During inference, the trained model is prompted with the first key of the log type and generates synthetic logs.
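The prompting step described above can be sketched as follows. This is a minimal illustration, not the code from the training or inference notebooks: the helper name `first_key_prompt`, the exact prompt shape (opening brace plus quoted first key), and the sample field names are all assumptions made for this example.

```python
import json


def first_key_prompt(sample_log: str) -> str:
    """Build a prompt from the first key of a JSON log record.

    Hypothetical helper: the actual prompt format used by the
    notebooks may differ.
    """
    record = json.loads(sample_log)  # key order is preserved
    first_key = next(iter(record))   # first key of the log record
    # json.dumps quotes and escapes the key as it would appear in a raw log
    return "{" + json.dumps(first_key) + ":"


# Illustrative record only; field names are not taken from the dataset.
sample = '{"category": "SignInLogs", "resultType": "0"}'
prompt = first_key_prompt(sample)
print(prompt)
```

The resulting string would then be passed to the trained model as the starting context, and the model would continue it into a full synthetic log.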

Name the adversely impacted groups (protected classes) this has been tested to deliver comparable outcomes regardless of:

Not Applicable

List the technical limitations of the model.

This model is trained with synthetic logs for demonstration purposes. Separate training is needed for other log types.

What performance metrics were used to affirm the model's performance?

Intact raw logs
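One way to approximate this metric is to count how many generated logs are still well-formed. The sketch below assumes that "intact" means a generated line parses as a JSON object and that the output file holds one log per line; both are assumptions for illustration, and `intact_fraction` is a hypothetical helper, not part of the example code.

```python
import json


def intact_fraction(generated_lines):
    """Return the fraction of non-empty lines that parse as a JSON object."""
    total = 0
    intact = 0
    for line in generated_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between logs
        total += 1
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or malformed log
        if isinstance(parsed, dict):
            intact += 1
    return intact / total if total else 0.0


# Illustrative input: one intact log, one truncated log, one non-object value.
lines = ['{"a": 1}', '{"a": 1', '', '[1, 2]']
print(intact_fraction(lines))
```

A stricter check could also compare the keys of each generated log against the keys seen in the training data.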

What are the potential known risks to users and stakeholders?

What training is recommended for developers working with this model?

None

Link the relevant end user license agreement

Apache 2.0

Model Card ++ Safety & Security Subcard

Link the location of the training dataset's repository.

https://github.com/nv-morpheus/Morpheus/blob/main/models/datasets/training-data/azure/azure-ad-logs-sample-training-data.json

Is the model used in an application with physical safety impact?

Describe physical safety impact (if present).

Were the model and dataset assessed for vulnerability to potential forms of attack?

Name applications for the model.

This model is provided as an example of synthetic log generation. Users can create their own models for their use cases and downstream tasks.

Name use case restrictions for the model.

It has been trained on a small dataset, mainly for demonstration purposes.

Has this been verified to have met prescribed quality standards?

Name target quality Key Performance Indicators (KPIs) for which this has been tested.

Technical robustness and model security validated?

Are the model and dataset compliant with National Classification Management Society (NCMS)?

Are there explicit model and dataset restrictions?

Are there access restrictions to systems, model, and data?

Is there a digital signature?

Model Card ++ Privacy Subcard

Generatable or reverse engineerable personally-identifiable information (PII)?

Neither

Was consent obtained for any PII used?

Protected classes used to create this model? (The following were used in the model's training:)

How often is dataset reviewed?

The dataset is initially reviewed upon addition, and subsequent reviews are conducted as needed or upon request for any changes.
