[DOC]: DFP Starter Example README.md Needs to be Updated #1713
Comments
@oguzhancelik2425 Thank you for submitting this issue. The Starter DFP implementation and documentation are somewhat stale, as we decided to focus on a single DFP implementation, Production DFP. The two implementations have diverged significantly, so we have been encouraging everyone to start with Production DFP, which comes with a bit more complexity but incorporates new Morpheus features and is more scalable. We plan on having Starter DFP removed in the next release (#1715).
Thanks for the quick response @efajardo-nv. I have a question about the part that returns the anomaly scores in hammah_inference.py; it shows index …
@oguzhancelik2425 You are correct. That inference script, which is run independently of Morpheus, has not been kept in sync with the changes in the autoencoder and the data file paths. We'll get that updated. Thanks!
How would you describe the priority of this documentation request
High
Please provide a link or source to the relevant docs
https://github.com/nv-morpheus/Morpheus/blob/branch-24.06/examples/digital_fingerprinting/starter/README.md
Describe the problems in the documentation
In this file there are three given examples for the DFP pipeline, covering the CloudTrail, Azure AD, and Duo message log use cases. However, when I dive into the documentation for running the examples, the explanation is rather poor. Each of the use cases above comes with training and validation dataset samples. When I check the CloudTrail input validation sample dataset, it already has `ae_anomaly_score` and `ts_anomaly` values, but those are calculated by the model and, I assume, should not be in the training/validation dataset. Moreover, when I check the train AE model stage, I cannot see the `ae_anomaly_score` and `ts_anomaly` calculation; it lives instead in the hammah-inference.py module in the `/models/validation-scripts/dfp-models/` directory. From this information there is no easy way to understand what the order is for running the pipeline and scripts, which files are for training, validation, and inference, and how those anomaly scores are actually calculated.
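For reference, a quick way to confirm that the sample validation file already ships with those model-produced columns (the CSV path below is a placeholder; substitute the CloudTrail validation sample referenced in the README):

```bash
# Print the header columns of the sample validation CSV and flag the ones
# that should only appear after inference. Adjust the path to the actual
# CloudTrail validation sample from the README.
head -n 1 path/to/cloudtrail-validation-sample.csv | tr ',' '\n' \
  | grep -nE 'ae_anomaly_score|ts_anomaly'
```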
On the other hand, I realized that some directory paths have not been updated yet, which causes errors in the CLI. For example, the CLI example here shows that the `--columns_file=morpheus/data/columns_ae_cloudtrail.txt` option should read the feature columns txt file, but there is no such path under `morpheus/`; instead it should be `models/data/columns_ae_cloudtrail.txt`. Similar problems exist in the rest of the README.md file.
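A quick check from the repository root illustrates the path problem (assuming a standard checkout; the comments reflect the behaviour described above):

```bash
# Run from the root of a Morpheus checkout.
ls morpheus/data/columns_ae_cloudtrail.txt   # path used in the README; not found, per the issue above
ls models/data/columns_ae_cloudtrail.txt     # corrected path where the feature columns file lives
```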
(Optional) Propose a correction
I think this starter example needs to be reviewed in detail and expanded so that users can understand how the model works, which resources each example needs, and in what order to run everything. For the case I tried to explain above, my understanding is that the user should first run the hammah-20211017.ipynb notebook, doing feature selection and training the model on the training dataset, and then run the hammah-inference.py module to obtain the `ae_anomaly_score` and `ts_anomaly` values. Then the user needs to run the DFP pipeline CLI with the use case's features on the file overwritten by the previous hammah-inference.py run, in order to get the `reconstruction_loss`, `z_loss`, scores values, etc. If I am wrong, it is unfortunately because of the poor documentation of the example.
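To make that assumed order concrete, here is a rough shell sketch of the workflow as I currently understand it; the notebook path, the omitted arguments, and the exact `morpheus run pipeline-ae` invocation are my assumptions rather than anything stated in the README:

```bash
# 1. Feature selection and autoencoder training on the training dataset
#    (notebook location is a placeholder).
jupyter nbconvert --to notebook --execute path/to/hammah-20211017.ipynb

# 2. Stand-alone inference that writes ae_anomaly_score / ts_anomaly into the
#    validation file (arguments omitted).
python models/validation-scripts/dfp-models/hammah-inference.py

# 3. DFP pipeline CLI on the resulting file to get reconstruction_loss, z_loss,
#    etc. (remaining per-use-case source and stage options omitted).
morpheus run pipeline-ae --columns_file=models/data/columns_ae_cloudtrail.txt
```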