Remove Starter Digital Fingerprinting (DFP) (#1903)
- Remove all references to the Starter DFP in docs
- Remove classes which only exist for the Starter DFP.
- Remove tests and associated test data for the Starter DFP
- Remove Starter DFP from CLI

Closes #1715
Closes #1713
Closes #1641

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/nv-morpheus/Morpheus/blob/main/docs/source/developer_guide/contributing.md).
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
- Eli Fajardo (https://github.com/efajardo-nv)

Approvers:
- David Gardner (https://github.com/dagardner-nv)

URL: #1903
1 parent a2949af, commit 1003ce1
Showing 40 changed files with 33 additions and 2,683 deletions.
@@ -23,7 +23,7 @@ Every account, user, service, and machine has a digital fingerprint that represe
 To construct this digital fingerprint, we will be training unsupervised behavioral models at various granularities, including a generic model for all users in the organization along with fine-grained models for each user to monitor their behavior. These models are continuously updated and retrained over time, and alerts are triggered when deviations from normality occur for any user.

 ## Training Sources
-The data we will want to use for the training and inference will be any sensitive system that the user interacts with, such as VPN, authentication and cloud services. The digital fingerprinting example (`examples/digital_fingerprinting/README.md`) included in Morpheus ingests logs from [AWS CloudTrail](https://docs.aws.amazon.com/cloudtrail/index.html), [Azure Active Directory](https://docs.microsoft.com/en-us/azure/active-directory/reports-monitoring/concept-sign-ins), and [Duo Authentication](https://duo.com/docs/adminapi).
+The data we will want to use for the training and inference will be any sensitive system that the user interacts with, such as VPN, authentication and cloud services. The digital fingerprinting example (`examples/digital_fingerprinting/README.md`) included in Morpheus ingests logs from [Azure Active Directory](https://docs.microsoft.com/en-us/azure/active-directory/reports-monitoring/concept-sign-ins), and [Duo Authentication](https://duo.com/docs/adminapi).

 The location of these logs could be either local to the machine running Morpheus, a shared file system like NFS, or on a remote store such as [Amazon S3](https://aws.amazon.com/s3/).

@@ -44,54 +44,23 @@ Adding a new source for the DFP pipeline requires defining five critical pieces:
 1. A [`DataFrameInputSchema`](6_digital_fingerprinting_reference.md#dataframe-input-schema-dataframeinputschema) for the [`DFPFileToDataFrameStage`](6_digital_fingerprinting_reference.md#file-to-dataframe-stage-dfpfiletodataframestage) stage.
 1. A [`DataFrameInputSchema`](6_digital_fingerprinting_reference.md#dataframe-input-schema-dataframeinputschema) for the [`DFPPreprocessingStage`](6_digital_fingerprinting_reference.md#preprocessing-stage-dfppreprocessingstage).

-## DFP Examples
-The DFP workflow is provided as two separate examples: a simple, "starter" pipeline for new users and a complex, "production" pipeline for full scale deployments. While these two examples both perform the same general tasks, they do so in very different ways. The following is a breakdown of the differences between the two examples.
-
-### The "Starter" Example
-
-This example is designed to simplify the number of stages and components and provide a fully contained workflow in a single pipeline.
-
-Key Differences:
-* A single pipeline which performs both training and inference
-* Requires no external services
-* Can be run from the Morpheus CLI
-
-This example is described in more detail in `examples/digital_fingerprinting/starter/README.md`.
-
-### The "Production" Example
+## Production Deployment Example

 This example is designed to illustrate a full-scale, production-ready, DFP deployment in Morpheus. It contains all of the necessary components (such as a model store), to allow multiple Morpheus pipelines to communicate at a scale that can handle the workload of an entire company.

-Key Differences:
+Key Features:
 * Multiple pipelines are specialized to perform either training or inference
-* Requires setting up a model store to allow the training and inference pipelines to communicate
+* Uses a model store to allow the training and inference pipelines to communicate
 * Organized into a docker-compose deployment for easy startup
 * Contains a Jupyter notebook service to ease development and debugging
 * Can be deployed to Kubernetes using provided Helm charts
 * Uses many customized stages to maximize performance.

 This example is described in `examples/digital_fingerprinting/production/README.md` as well as the rest of this document.

-### DFP Features
+## DFP Features

-#### AWS CloudTrail
-| Feature | Description |
-| ------- | ----------- |
-| `userIdentityaccessKeyId` | for example, `ACPOSBUM5JG5BOW7B2TR`, `ABTHWOIIC0L5POZJM2FF`, `AYI2CM8JC3NCFM4VMMB4` |
-| `userAgent` | for example, `Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 10.0; Trident/5.1)`, `Mozilla/5.0 (Linux; Android 4.3.1) AppleWebKit/536.1 (KHTML, like Gecko) Chrome/62.0.822.0 Safari/536.1`, `Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10 7_0; rv:1.9.4.20) Gecko/2012-06-10 12:09:43 Firefox/3.8` |
-| `userIdentitysessionContextsessionIssueruserName` | for example, `role-g` |
-| `sourceIPAddress` | for example, `208.49.113.40`, `123.79.131.26`, `128.170.173.123` |
-| `userIdentityaccountId` | for example, `Account-123456789` |
-| `errorMessage` | for example, `The input fails to satisfy the constraints specified by an AWS service.`, `The specified subnet cannot be found in the VPN with which the Client VPN endpoint is associated.`, `Your account is currently blocked. Contact [email protected] if you have questions.` |
-| `userIdentitytype` | for example, `FederatedUser` |
-| `eventName` | for example, `GetSendQuota`, `ListTagsForResource`, `DescribeManagedPrefixLists` |
-| `userIdentityprincipalId` | for example, `39c71b3a-ad54-4c28-916b-3da010b92564`, `0baf594e-28c1-46cf-b261-f60b4c4790d1`, `7f8a985f-df3b-4c5c-92c0-e8bffd68abbf` |
-| `errorCode` | for example, success, `MissingAction`, `ValidationError` |
-| `eventSource` | for example, `lopez-byrd.info`, `robinson.com`, `lin.com` |
-| `userIdentityarn` | for example, `arn:aws:4a40df8e-c56a-4e6c-acff-f24eebbc4512`, `arn:aws:573fd2d9-4345-487a-9673-87de888e4e10`, `arn:aws:c8c23266-13bb-4d89-bce9-a6eef8989214` |
-| `apiVersion` | for example, `1984-11-26`, `1990-05-27`, `2001-06-09` |
-
-#### Azure Active Directory
+### Azure Active Directory
 | Feature | Description |
 | ------- | ----------- |
 | `appDisplayName` | for example, `Windows sign in`, `MS Teams`, `Office 365` |
@@ -104,14 +73,14 @@ This example is described in `examples/digital_fingerprinting/production/README.
 | `location.countryOrRegion` | country or region name |
 | `location.city` | city name |

-##### Derived Features
+#### Derived Features
 | Feature | Description |
 | ------- | ----------- |
 | `logcount` | tracks the number of logs generated by a user within that day (increments with every log) |
 | `locincrement` | increments every time we observe a new city (`location.city`) in a user's logs within that day |
 | `appincrement` | increments every time we observe a new app (`appDisplayName`) in a user's logs within that day |

-#### Duo Authentication
+### Duo Authentication
 | Feature | Description |
 | ------- | ----------- |
 | `auth_device.name` | phone number |
@@ -121,7 +90,7 @@ This example is described in `examples/digital_fingerprinting/production/README.
 | `reason` | reason for the results, for example, `User Cancelled`, `User Approved`, `User Mistake`, `No Response` |
 | `access_device.location.city` | city name |

-##### Derived Features
+#### Derived Features
 | Feature | Description |
 | ------- | ----------- |
 | `logcount` | tracks the number of logs generated by a user within that day (increments with every log) |
@@ -133,16 +102,16 @@ DFP in Morpheus is accomplished via two independent pipelines: training and infe

 ![High Level Architecture](img/dfp_high_level_arch.png)

-#### Training Pipeline
+### Training Pipeline
 * Trains user models and uploads to the model store
 * Capable of training individual user models or a fallback generic model for all users

-#### Inference Pipeline
+### Inference Pipeline
 * Downloads user models from the model store
 * Generates anomaly scores per log
 * Sends detected anomalies to monitoring services

-#### Monitoring
+### Monitoring
 * Detected anomalies are published to an S3 bucket, directory or a Kafka topic.
 * Output can be integrated with a monitoring tool.

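The derived features listed in the tables in this diff (`logcount`, `locincrement`, `appincrement`) are running counters kept per user and per day. The following is a minimal Python sketch of that logic, not the Morpheus implementation. It assumes each log is a plain dict, that the `username` and `timestamp` keys exist (both names are hypothetical and do not come from the diff), that timestamps are ISO 8601 strings in one consistent format, and that the first city or app seen in a day counts as 1.

```python
from collections import defaultdict
from datetime import datetime


def add_derived_features(logs):
    """Return copies of `logs` with logcount, locincrement, and appincrement added.

    Assumed (hypothetical) input keys: `username`, `timestamp` (ISO 8601),
    `location.city`, and `appDisplayName`. Counters reset per user per day.
    """
    counts = defaultdict(int)  # (user, day) -> number of logs seen so far
    cities = defaultdict(set)  # (user, day) -> distinct cities seen so far
    apps = defaultdict(set)    # (user, day) -> distinct apps seen so far

    enriched = []
    # Process logs in time order so the counters grow chronologically.
    for log in sorted(logs, key=lambda record: record["timestamp"]):
        day = datetime.fromisoformat(log["timestamp"]).date()
        key = (log["username"], day)

        counts[key] += 1
        cities[key].add(log["location.city"])
        apps[key].add(log["appDisplayName"])

        enriched.append({
            **log,
            "logcount": counts[key],            # increments with every log
            "locincrement": len(cities[key]),   # grows only on a new city
            "appincrement": len(apps[key]),     # grows only on a new app
        })
    return enriched
```

Keeping a set of seen values per `(user, day)` key means a repeated city or app leaves the counter unchanged, which matches the "increments every time we observe a new ..." wording in the feature tables.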